Slashdot Mirror


Automated Tiered Storage Coming to Desktops?

roj3 writes "Tiered storage has been the scourge of administrators because the vendors tell us to hold meetings with all departments and then classify data to storage tier based on its type or relative importance. eWeek has a story about a new approach to tiered storage — sorting it all by usage patterns. Regularly used data goes on high-performance storage, idle data goes on slower/cheaper storage. Volumes and files even span several types of drives or RAID levels. Is automated tiered storage headed to desktops?"

30 of 110 comments

  1. Networks, sure. by celardore · · Score: 5, Insightful

    I can see the usefulness of this technology over a busy network with multiple users and masses of files and storage... I just can't see needing anything more than a mirror&stripe RAID array on a PC with only one user. Even that could be considered excessive.

    1. Re:Networks, sure. by dsginter · · Score: 4, Interesting

      I think we'll actually see the opposite:

      With multiple PCs per household, it makes sense to get rid of the hard drives at the PC level and put them in a RAID enclosure that is secured into a wall.

      This, however, is a threat to Microsoft because you'll be able to PXE-boot any image of your choice (just think that perhaps your employer or bank supplies their own secure image in order to connect to their resources). Someone needs to get Windows to PXE boot at the hardware level (emulate IDE or something).

      This will be huge, but we've got to squeeze Microsoft into it first. Then everyone will be free to try Linux and see what we've all been jabbering about.

    2. Re:Networks, sure. by 0racle · · Score: 4, Insightful

      That only makes sense if the people in the household wish to learn how to use what you've mentioned. Since current evidence suggests that most people look at computers as a magical box that cannot be understood, the chances of them learning how to do a fraction of what you suggest are about as good as your chances of winning the lottery.

      The XP file sharing wizard is too much for a lot of people, and you think a RAID array serving OS images over a network via PXE makes sense?

      --
      "I use a Mac because I'm just better than you are."
    3. Re:Networks, sure. by jwjcmw · · Score: 4, Insightful

      "Life is changing to the digital a bit more every day. And just as we have cardboard boxes in our attic holding the things we don't use, file cabinets in our office alphabetized, firesafes for important documents, and Safe Deposit boxes for wills, the average home user will need to know and use the digital equivalents."

      Or, if you are like many people, you have documents on your desk and in piles on the floor that you will never use, your kid's birth certificate is in a stack of papers from when you had to take it to school for registration, your file cabinets have partially labeled folders that are in chronological order...as in the order that you stuffed them in the filing cabinet, your will is in the "to be filed" folder in the bottom of said filing cabinet and you could fill the bathtub with your old phone and electric bills.

      Hopefully the digital equivalents will be better for the organizationally challenged.

  2. Great Idea by Jazz-Masta · · Score: 5, Insightful

    This is exactly what everyone is looking for. People defrag their hard drives in the hope of increasing performance. There is no reason why data that is accessed more often shouldn't be on the high-performance drives. Or there could at least be some sort of class rating that defines which data needs high performance: for example, automatically installing and saving 3D Max to RAID 0 media, and saving Word documents to the lesser-performing drives.

    I try to follow this idea all the time with my system. Fast stuff goes on RAID 0; slow stuff and backup stuff go on the ole' 200 GB backup drive.

    1. Re:Great Idea by mollog · · Score: 4, Informative

      Hewlett-Packard Company developed a product that did this automagically. It was an external RAID system that connected via one or two SCSI buses to a host. All incoming data was stored in RAID 0/1, striped and mirrored (aka RAID 10). As the storage filled up, unused data was automagically migrated to more space-efficient RAID 5. Data that had been accessed recently remained in RAID 0/1. You could add disk drives and it would automagically include the drives (but you would have to use LVM or other utilities in the OS to increase its file system). You could mix two drive sizes, say, 18GB and 36GB, without trouble. If a drive failed, the array would rebuild redundancy. If another drive failed, ditto. It was fast, it was fully redundant.
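
      The migration behaviour described above can be sketched roughly as follows. This is only a toy model, not HP's actual algorithm: the class name, the block-count capacity, and the least-recently-used demotion rule are all assumptions made for illustration.

```python
from collections import OrderedDict


class TwoTierArray:
    """Toy model: recent blocks live in a mirrored tier, cold blocks in a parity tier."""

    def __init__(self, mirrored_capacity):
        self.mirrored_capacity = mirrored_capacity  # blocks the fast tier can hold
        self.mirrored = OrderedDict()  # block_id -> data, least recently used first
        self.parity = {}               # block_id -> data (space-efficient tier)

    def write(self, block_id, data):
        # All incoming data lands in the mirrored tier.
        self.parity.pop(block_id, None)
        self.mirrored[block_id] = data
        self.mirrored.move_to_end(block_id)
        self._demote_if_full()

    def read(self, block_id):
        if block_id in self.mirrored:
            self.mirrored.move_to_end(block_id)  # mark as recently accessed
            return self.mirrored[block_id]
        return self.parity[block_id]             # cold data is served from the parity tier

    def _demote_if_full(self):
        # Assumed policy: migrate least recently used blocks until the fast tier fits.
        while len(self.mirrored) > self.mirrored_capacity:
            cold_id, cold_data = self.mirrored.popitem(last=False)
            self.parity[cold_id] = cold_data
```

      With a capacity of two blocks, writing a third block pushes whichever of the first two was touched least recently into the parity tier, while reads keep working from either tier.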

      But it was a lot smarter than the admins who had to use it so it wasn't very popular.

      --
      Best regards.
    2. Re:Great Idea by pla · · Score: 3, Insightful

      This is exactly what everyone is looking for.

      No.

      You (and a number of other posters on this topic) have described what we look for - Geeks who want to get the most out of their systems with the least expense. If I could get killer performance with a RAID0 of tiny but fast drives (think Raptors, or even Cheetahs if you don't mind dealing with SCSI), while still having the capacity of a cheap 400GB IDE drive - Of course I'd have such a setup (and in fact, many of us already do, we just manually transfer things to/from the big-n'-slow).

      Most people, however, do not want this. For starters, most people don't even need the huge drives they already have - If you gave them just the pair of RAID0 36GBs, they'd never use even half that capacity, so no need for ever moving files to the slow storage. Then failing that, the members of the Sixpack family that manage to store hundreds of GB only fill it with downloaded porn, music, and movies - Uses that really don't need fast drives, just tons of space.


      So while it sounds useful in theory - in practice, such a setup would just add cost and complexity without providing any tangible benefit to most users. I suspect even most Geek users would rarely notice the difference (aside from OS load times), and would only make such a setup for bragging rights.

  3. Already have tiers... by Kaenneth · · Score: 3, Insightful

    Registers, CPU cache, on-chip cache, RAM, local disk, Network/Removable Media, Paper/Human memory...

    It's all about feeding that data hungry CPU, as quickly as possible.

  4. Not so new... by Duncan3 · · Score: 4, Interesting

    I was using systems that did this 10 years ago. Granted, back then it was disk+tape not different speed disks, but it's the exact same thing.

    Looks to me like an excuse to charge 8-10x what you should be paying for storage of that size.

    --
    - Adam L. Beberg - The Cosm Project - http://www.mithral.com/
    1. Re:Not so new... by hotrodman · · Score: 3, Insightful


      No kidding. So they find a way to put less-used data on slower disks, that still COST NEARLY AS MUCH. The entry price is still listed as $50,000. Big fuckin' deal. Let me know when you take a bunch of garden-variety servers, and do this, with the super cheap clone RAID server with 40 terabytes of SATA as the 'last tier' for slowest files, where I can build 100 terabytes for $50,000.

      And yet, managers will get a woody over this buzzword compliance and want to give these guys millions to have the 'latest and greatest'.

      And have it still work with tape, too, and not tied up in some cumbersome, proprietary protocol owned by one little company that could go out of business.

    2. Re:Not so new... by dpilot · · Score: 4, Informative

      It was called HSM (Hierarchical Storage Management). It ran on IBM's MVS on mainframes, and it moved your less-used data to cheaper storage, in several stages. IIRC, the first stage was just compression on a different disk, the second stage was tapes in a jukebox-type thing, and the third stage was tapes that an operator fetched and loaded. Somewhere way back there, data never used for 5 years fell off the end of the belt, but you got warned first.

      The day after vacation, when you kept getting the message, "DFHSM is recalling dataset xyz for user jkl" as it pulled all of your storage back online was a pain, and we all thought it would be neat to get rid of, as we migrated to workstations. But in retrospect, HSM was great, never having to worry about your data quantity. That's compared with having to root through $HOME every few months to take care of quota problems.

      --
      The living have better things to do than to continue hating the dead.
    3. Re:Not so new... by Doctor+Memory · · Score: 3, Informative
      you had something automated that determined where the files should go and moved them appropriately? It analyzed usage patterns?


      Oh yeah. BITD, there was the archiver, a job that ran every night and moved files that hadn't been accessed in the last N time periods to tape. It left the VTOC entry (kind of like an inode), just marked it "archived" and the label of the tape. Then, the next time that file was accessed, a hook in the open() call would send a message to the console operator telling them to mount tape such-and-such. When the tape was mounted, the archiver would automatically copy the file back into place, the open() call would complete normally, and life was good. Basically transparent to the user (they'd look at their directory and all their files would be there), except for the fact that the file open would take two-three minutes. Then again, since they were paying for disk storage by the block-day, they were generally pretty happy to only pay for a fifty-cent tape mount every quarter instead of keeping that 1200-block file on-line for three months when they weren't using it.
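
      For the young-uns, the archive-and-recall cycle described above might look something like this on a modern filesystem. It is a minimal sketch under stated assumptions: a second directory stands in for the tape, the `.archived` stub suffix and both function names are made up for illustration, and the access time is trusted (many systems mount with options that make it unreliable).

```python
import shutil
import time
from pathlib import Path

STUB_SUFFIX = ".archived"  # hypothetical marker, standing in for the "archived" catalog flag


def archive_idle_files(online_dir, archive_dir, max_idle_seconds, now=None):
    """Move files not accessed within max_idle_seconds to archive_dir, leaving a stub."""
    now = time.time() if now is None else now
    online_dir, archive_dir = Path(online_dir), Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for path in sorted(online_dir.iterdir()):
        if not path.is_file() or path.name.endswith(STUB_SUFFIX):
            continue
        if now - path.stat().st_atime > max_idle_seconds:
            target = archive_dir / path.name
            shutil.move(str(path), str(target))
            # The stub records where the data went, like the tape label in the catalog.
            Path(str(path) + STUB_SUFFIX).write_text(str(target))
            moved.append(path.name)
    return moved


def open_with_recall(path, mode="r"):
    """Open a file, first moving it back from the archive if only its stub is online."""
    path = Path(path)
    stub = Path(str(path) + STUB_SUFFIX)
    if not path.exists() and stub.exists():
        shutil.move(stub.read_text(), str(path))  # the "mount tape and copy back" step
        stub.unlink()
    return open(path, mode)
```

      The caller only ever uses `open_with_recall`, so the recall stays transparent apart from the delay, which matches the experience described above.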
      --
      Just junk food for thought...
  5. Oh....good.. by JerBear0 · · Score: 5, Insightful

    "idle data goes on slower/cheaper storage"

    So that special little something that you need once a year, but when you need it, you need it RIGHT NOW is tied to the foot of a pigeon fluttering around the warehouse somewhere. Frequency of use does NOT denote importance.

    --
    Bad experience is a school that only fools keep going to.
    1. Re:Oh....good.. by Red+Flayer · · Score: 3, Informative

      That's what metatagging is for. Tag files that are not to be moved to slow storage no matter how infrequently they are accessed. RTFA.
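
      As a rough illustration of how a tag can override a usage-based rule (the tag name `pinned`, the function name, and the threshold are hypothetical and not taken from the article):

```python
def choose_tier(access_count, tags, hot_threshold=10):
    """Return "fast" or "slow" for a file, given its recent access count and its tags."""
    if "pinned" in tags:
        return "fast"  # tagged files are never demoted, however rarely they are used
    # Untagged files are placed purely by how often they were accessed.
    return "fast" if access_count >= hot_threshold else "slow"
```

      The once-a-year file that is needed RIGHT NOW gets the tag; everything else is sorted by usage.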

      --
      "Trolls they were, but filled with the evil will of their master: a fell race..." -- J.R.R. Tolkien on Olog-hai
    2. Re:Oh....good.. by Kadin2048 · · Score: 3, Insightful

      Frequency of use doesn't denote importance, but it might denote how quickly you need to be able to recall it. Similarly, importance doesn't imply that quick recall is necessary. If you don't use something frequently, it might be okay to store it somewhere that takes a while to recall from, even if it is "important," as long as you know where it is so that you can get it back.

      As an example, financial records for past years might be very important, but you don't need to be able to access them in a tenth of a second. As long as you can get to them if you really want to (sacrificing a few seconds), then it's all right.

      The way I see this translating to reality is that you'd keep all your old documents in slow-speed storage, but then keep an index in high-speed storage, so that you could easily search (both by name and by content) and decide when to pull stuff out of your archives.

      This is no different than what people have been doing for centuries with paper. Just because the card catalog is located in the center of the library doesn't mean its contents are inherently more valuable than the actual books (which might be in the basement, back shelves, wherever); it just means that the catalog gets accessed much more often.

      Actually, in the physical world, people often exchange speed of recall for certainty of recall. You put important documents in a safe-deposit box, rather than your kitchen counter, because even though it'll take you longer to get them out of the box, they're guaranteed to be there when you need them. Likewise, a system which traded off speed for redundancy would probably be appropriate for "important" but infrequently-accessed electronic documents.

      --
      "Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."
  6. No but it is correlated by davidwr · · Score: 3, Insightful

    Apply "frequency of use = urgency" to BIGNUM pieces of data and you will have a very useful albeit sub-optimal algorithm.

    Yes, there are exceptional cases, like the President's access to the Nuclear Briefcase. It hasn't been used for real in a long time if ever but when he needs it it had better be close at hand. However, these special cases can be treated as the special cases they are.

    --
    Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
    1. Re:No but it is correlated by cperciva · · Score: 4, Insightful

      Yes, there are exceptional cases, like the President's access to the Nuclear Briefcase. It hasn't been used for real in a long time if ever but when he needs it it had better be close at hand.

      Oddly enough, I think most people in the world would prefer that it wasn't close at hand when Bush decides he wants it.

      A better example is fire extinguishers -- most of them will literally never be used, but there's a very good reason to ensure that they are readily available.

    2. Re:No but it is correlated by CosmeticLobotamy · · Score: 3, Funny

      A better example is fire extinguishers -- most of them will literally never be used

      If that's the case for you, then I feel sorry for you. You've apparently never known the snowy, probably-toxic joy of Fire Extinguisher Expiration Day. It's the happiest day of the decade.

  7. IDE Neutrality? by sseaman · · Score: 5, Funny

    From its beginnings, the Hard Drive has leveled the playing field for all files. Everyday files can have their content read by thousands, even millions of processes.

    The Coalition of Unused Files believes that the desktop is a crucial engine for personal and economic growth. They are working together to urge System Admins to preserve IDE Neutrality, the First Amendment for the Desktop Hard Drive that ensures that the Desktop remains open to innovation and progress.

  8. This is "new"? by Medievalist · · Score: 3, Insightful


    IBM mainframes that literally pumped water were doing this decades ago.

    What, you say water cooling is coming back too?

  9. It already is by malraid · · Score: 3, Insightful

    That's why you have HDD with cache. That's the whole concept of "virtual memory". The next step might be hybrid hdds (solid state / mag platters). But I don't think it will go much farther than that. Multiple raids is overkill for the average desktop.

    --
    please excuse my apathy
  10. Just read TFA: by Ant+P. · · Score: 4, Insightful

    $50k for a 6TB fileserver? What's that extra $40000 paying for that a normal fileserver loaded with RAM can't do just as fast?

    1. Re:Just read TFA: by Anonymous Coward · · Score: 3, Interesting

      Apples and pomegranates you compare;
      Channels of Fiber come not cheap.
      Terabytes 6 with connection of light for less than $50k you will not find.
      Terabytes 6 with connections of wire you may.
      SATA drives, untested are delivered.
      SATA drives with fewer bearings.
      SATA drives with short life.
      Enterprise storage is not easy.

  11. Just like my kitchen by Red+Flayer · · Score: 5, Funny

    Cheetos go in the easy-to-reach cabinet next to the fridge.

    Beer goes in the front on the top shelf of the fridge, milk (eventually cheese, typically) goes on the bottom shelf in the back.

    This is automated, since I simply shove things onto the shelves when I get home from the supermarket. Anything I consume and replace ends up at the front. Anything I buy because I 'should' be eating it (like fiber biscuits, or whatever) ends up pushed to the back.

    It's automated via metatag, too. Anything tagged 'ice cream' goes in the door of the freezer, anything tagged 'vegetable' gets relegated somewhere in the back, where it quickly develops an inch of ice crystals, to slowly dry out to a freezer-burnt state of suspended animation until I buy a new fridge unit.

    This costs no more than regular kitchen storage space, but if you'd like a custom design for you and your loved ones, my consulting fee is $75/hr, or a bag of chips and a six-pack.

    --
    "Trolls they were, but filled with the evil will of their master: a fell race..." -- J.R.R. Tolkien on Olog-hai
  12. Yes, Kinda... by ThinkFr33ly · · Score: 4, Informative

    Automatic tiered storage is definitely coming, but probably not in the form of multiple disks that run at different speeds or RAID levels.

    Microsoft announced a while back that Windows Vista would support three technologies designed to improve disk speed called SuperFetch, ReadyBoost, and ReadyDrive. SuperFetch is simply a way of preloading applications and data when the OS anticipates that you'll be loading those soon.

    ReadyBoost and ReadyDrive both utilize persistent memory caches to speed up access to the disk.

    ReadyBoost treats normal USB keys and flash disks like temporary caching locations for data from the disk.

    ReadyDrive is essentially the term Microsoft uses to describe their support for hybrid hard drives, which are disks that have a built-in flash memory module that's used as a persistent cache.

    Not only do hybrid disks dramatically increase performance, but they also result in huge power savings for mobile devices like laptops and media players.

  13. Liars, damn liars, and statistics by Medievalist · · Score: 4, Informative
    Decades ago, we used to laugh at the mainframers and their automated hierarchical storage systems because they'd make exactly these kinds of statements.

    Frequency of use DOES denote importance, at the very least STATISTICALLY.
    No. Absent other data, it only denotes frequency of use, period. Playboy.com gets more hits than the general ledger webapp if you unblock your company firewall, but the general ledger is more important to the company.

    Just because you want "that special little something" once a year; does not mean you can degrade the speed of information which is instantly needed.
    There is actually very little correlation between what the average user wants and what s/he needs, as is empirically obvious. If the image from the "fly-fishing.com" website that they've set to come up as their background image every morning fails to load, they can still work, but if the once-a-year corporate audit checklist gets put on slow, old storage and then gets lost in a hardware failure, the company stock price may flutter and certainly heads will roll in the corporate IS department.

    This is an obvious fact
    I don't think that word means what you think it means.
  14. Hot File Adaptive Clustering? by Kadin2048 · · Score: 3, Informative
    I have heard mention of this as well, but I'd never seen any details. I tried to dig up some information; here's what I found.

    Apple's "About disk optimization with Mac OS X" (basically telling you that you don't need to defrag), says "Mac OS X 10.2 and later includes delayed allocation for Mac OS X Extended-formatted volumes. This allows a number of small allocations to be combined into a single large allocation in one area of the disk." ... "Mac OS X 10.3 Panther can also automatically defragment such slow-growing files [that data is continually appended to]. This process is sometimes known as "Hot-File-Adaptive-Clustering.""

    There's also a reference to a "hot band," a region of the drive where data is written that's used during startup, in order to increase performance and I assume lessen boot times.

    There's also reference to some automatic defragging in this macosxhints article on HFAC:
    There are 2 separate file optimizations going on here.

    The first is automatic file defragmentation. When a file is opened, if it is highly fragmented (8+ fragments) and under 20MB in size, it is defragmented. This works by just moving the file to a new, arbitrary, location. This only happens on Journaled HFS+ volumes.

    The second is the "Adaptive Hot File Clustering". Over a period of days, the OS keeps track of files that are read frequently - these are files under 10MB, and which are never written to. At the end of each tracking cycle, the "hottest" files (the files that have been read the most times) are moved to a "hotband" on the disk - this is a part of the disk which is particularly fast given the physical disk characteristics (currently sized at 5MB per GB). "Cold" files are evicted to make room. As a side effect of being moved into the hotband, files are defragmented. Currently, AHFC only works on the boot volume, and only for Journaled HFS+ volumes over 10GB.
    So that seems to be the deal; if anyone else has more information, I'd be interested to hear about it.
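
    To make the selection rule in that quote concrete, here is a small sketch of how the hottest files might be picked at the end of a tracking cycle. The thresholds (files under 10MB, never written, 5MB of hotband per GB, volumes over 10GB) come from the quoted description; the function name, the tuple layout, and the greedy "skip what doesn't fit" fill are assumptions for illustration, not Apple's code.

```python
MB = 1024 * 1024
GB = 1024 * MB


def pick_hot_files(files, volume_bytes, max_file_bytes=10 * MB,
                   band_bytes_per_gb=5 * MB, min_volume_bytes=10 * GB):
    """Pick file names for the hotband.

    files: iterable of (name, size_bytes, read_count, write_count) tuples.
    """
    if volume_bytes <= min_volume_bytes:
        return []  # the quoted rule only applies to volumes over 10GB
    budget = (volume_bytes // GB) * band_bytes_per_gb
    # Eligible: read at least once, never written to, and under the size cap.
    candidates = [f for f in files if f[1] < max_file_bytes and f[3] == 0 and f[2] > 0]
    candidates.sort(key=lambda f: f[2], reverse=True)  # hottest (most reads) first
    hot, used = [], 0
    for name, size, _reads, _writes in candidates:
        if used + size <= budget:  # assumed greedy fill; files that don't fit are skipped
            hot.append(name)
            used += size
    return hot
```

    Files that fall out of the returned list on a later cycle would be the "cold" ones evicted to make room.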

    There's also a MacSlash article on HFAC and a discussion on Ars that includes a post of the source code.
    --
    "Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."
  15. Re:Coming full circle -- good idea! by ratboy666 · · Score: 3, Informative

    And my favorite commands on the ol' HP-2000 mini:

    SANCTIFY and DESECRATE

    "Sanctify file" moved the file to drum (basically, one-drive RAID 0 for all you young-uns). Desecrate moved it to the regular hard disk.

    YMMV

    Ratboy

    --
    Just another "Cubible(sic) Joe" 2 17 3061
  16. Re:Certainly could be done in a desktop by COMON$ · · Score: 3, Informative
    Because putting two 10K Raptors in RAID 0 isn't worth it for the speed increase. Last time I checked, you might get a 20% increase, along with reduced data integrity. I did some research into this a while ago; check out this article, very informative:

    http://www.anandtech.com/printarticle.aspx?i=2101

    --
    CS: It is all sink or swim...oh and did I mention there are sharks in that water?
  17. Re:I'd love to see it... by mrsbrisby · · Score: 3, Interesting

    Also, I want this functionality on all operating systems. Unless I explicitly request deletion, no file should ever be unlinked, deleted, or whatever you call it when I delete it, whether through the command line or the GUI.

    The problem with this is that it causes a significant reduction in performance.

    Ideally, the operating system chose the best possible spot for that file when it got written. Once that file is deleted, that spot will once again be the fastest best possible spot- for at least something. If the operating system skips that spot for a new file, then this new file isn't going to be accessed quite as quickly.

    Truly automatic tiered storage solves this problem by splitting the directory services from the storage system- that is, the file's _name_ is no longer tied to the volume that the file happens to live on (and no, this isn't the same thing as symlinks or shortcuts). This allows the decision as to what the best spot for a file is to be deferred until later- and even spanned across multiple volumes!
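
    The split between names and storage can be pictured as one extra level of indirection. The sketch below is a toy with made-up class, method, and volume names; it ignores files spanning several volumes and says nothing about how the data itself would be moved atomically.

```python
class Namespace:
    """Toy directory service: names map to file ids, and ids map to volumes."""

    def __init__(self):
        self._ids = {}       # path name -> file id
        self._location = {}  # file id -> volume currently holding the data
        self._next_id = 1

    def create(self, name, volume):
        file_id = self._next_id
        self._next_id += 1
        self._ids[name] = file_id
        self._location[file_id] = volume
        return file_id

    def locate(self, name):
        return self._location[self._ids[name]]

    def migrate(self, name, new_volume):
        # Only the id -> volume mapping changes; the name users see stays put.
        self._location[self._ids[name]] = new_volume
```

    Because the name never changes, the placement decision can be revisited at any time without anything that refers to the file noticing.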

    Unfortunately, such a beast is very difficult to build. If we relax our requirements (say, that performance isn't very important, or that we can stop using our computer for a few hours each evening), then it's probably possible. What we need is a new kind of filesystem that either supports atomic moves between disks or splits the names from the storage.

    A few research projects have focused on these kinds of changes, but they all tend to break UNIX semantics (Amoeba immediately springs to mind), and those UNIX semantics are, in fact, the most widely used and recognized semantics for filesystems anywhere (even Windows uses them!). People who develop a filesystem incapable of supporting them really need to have a very good reason for breaking everyone's hard work.

    While they often do, it hasn't yet been seen as good enough for general purpose stuff.