Slashdot Mirror


Automated Tiered Storage Coming to Desktops?

roj3 writes "Tiered storage has been the scourge of administrators because the vendors tell us to hold meetings with all departments and then classify data to storage tier based on its type or relative importance. eWeek has a story about a new approach to tiered storage — sorting it all by usage patterns. Regularly used data goes on high-performance storage, idle data goes on slower/cheaper storage. Volumes and files even span several types of drives or RAID levels. Is automated tiered storage headed to desktops?"

110 comments

  1. Networks, sure. by celardore · · Score: 5, Insightful

    I can see the usefulness of this technology over a busy network with multiple users and masses of files and storage... I just can't see needing anything more than a mirror&stripe RAID array on a PC with only one user. Even that could be considered excessive.

    1. Re:Networks, sure. by dsginter · · Score: 4, Interesting

      I think we'll actually see the opposite:

      With multiple PCs per household, it makes sense to get rid of the hard drives at the PC level and put them in a RAID enclosure that is secured into a wall.

      This, however, is a threat to Microsoft because you'll be able to PXE-boot any image of your choice (just think that perhaps your employer or bank supplies their own secure image in order to connect to their resources). Someone needs to get Windows to PXE boot at the hardware level (emulate IDE or something).

      This will be huge but we've got to squeeze Microsoft into it, first. Then, everyone will be free to try linux and see what we've all been jabbering about.

      --
      More
    2. Re:Networks, sure. by COMON$ · · Score: 1
      At my home I use tiered storage of a type. I have a 10K sata drive for my main OS and for all video files being worked on. (And of course my battlefield and Oblivion home folders). I also have a couple standard 7200RPM sata drives and 1 IDE drive for mass storage. I also have a shared folder for PCs that connect to my network on a separate server.

      Granted, I am an IT professional who can manage all this, but I think that we will definitely see the average home user get into tiered storage. Think about digital pictures that need to be backed up and games that don't. Eventually, digital movies that need a decent amount of throughput, a music server, and a secure drive for personal documents.

      Life is changing to the digital a bit more every day. And just as we have cardboard boxes in our attic holding the things we don't use, file cabinets in our office alphabetized, firesafes for important documents, and Safe Deposit boxes for wills, the average home user will need to know and use the digital equivalents.

      --
      CS: It is all sink or swim...oh and did I mention there are sharks in that water?
    3. Re:Networks, sure. by 0racle · · Score: 4, Insightful

      That only makes sense if the people in the household wish to learn how to use what you've mentioned. Since current evidence points to the fact that most people look at computers as a magical box that can not be understood, the chances of them learning how to do a fraction of what you suggest is about as likely as you winning the lottery.

      The XP file sharing wizard is too much for a lot of people and you think a raid array sharing up OS images over a network via PXE makes sense?

      --
      "I use a Mac because I'm just better than you are."
    4. Re:Networks, sure. by jwjcmw · · Score: 4, Insightful

      "Life is changing to the digital a bit more every day. And just as we have cardboard boxes in our attic holding the things we don't use, file cabinets in our office alphabetized, firesafes for important documents, and Safe Deposit boxes for wills, the average home user will need to know and use the digital equivalents."

      Or, if you are like many people, you have documents on your desk and in piles on the floor that you will never use, your kid's birth certificate is in a stack of papers from when you had to take it to school for registration, your file cabinets have partially labeled folders that are in chronological order...as in the order that you stuffed them in the filing cabinet, your will is in the "to be filed" folder in the bottom of said filing cabinet and you could fill the bathtub with your old phone and electric bills.

      Hopefully the digital equivalents will be better for the organizationally challenged.

    5. Re:Networks, sure. by Shadow+Of+The+Sun · · Score: 1, Interesting

      Back in Mac OS 8 days, I used to use DiskExpress Pro. I had configured it to put the most used files at the outer cylinder (i.e. fastest part) of the drive, and the less used files on the inner cylinders.

      The software would analyze file usage, and move them around every day. The anecdotal evidence I have that it worked on such small scale was that my girlfriend later asked me how I got the computer to start responding faster.

      I don't know how well this technology would help on newer systems. I suspect at least a little. Perhaps it would really show gains for people who are video editing. Alas, Alsoft never updated their software to work on OS X.

      Essentially, these guys are doing what DiskExpress did, but on a larger scale. I have to wonder if they are stepping on any of Alsoft's patents.

      Call me a power user, but I do think that people should be at least mirroring their drives. I have heard too many people complain about losing something important because of hard drive failure.

    6. Re:Networks, sure. by Tweekster · · Score: 1

      It will be installed by the same people that currently set up file sharing for people: the neighbor kid.

      It doesn't really matter if the common person can install something; they won't be doing it anyway.

      --
      The phrase "more better" is acceptable English. suck it grammar Nazis
    7. Re:Networks, sure. by shokk · · Score: 1

      I think when referring to desktops, they mean user desktops in a corporate setting.

      In our case, we do not back up the desktops and constantly remind people that if they do not sync their data to the file server, they will deserve the pain when (not if) the disk crashes. Everyone gets a quota on their personal area and we tell them not to save crap like MP3s or AVIs to the server or the files will not last long. This saves backup tapes for the actual corporate data.

      Project data gets saved in project specific areas where they can be tiered like all other data on the server. No need to tier data for the desktop. They are free to save all their other stuff to their PC with the understanding that it will not be restored when the drive dies.

      --
      "Beware of he who would deny you access to information, for in his heart, he dreams himself your master."
    8. Re:Networks, sure. by Khabok · · Score: 0

      If it's transparent, where's the issue? I see a pull-down menu on the login screen. The actual login will be to the central box, and then an image will be sent to the host you logged in from. Process is as follows:

      Login to account: Khabok
      Password: ******
      Operating System: Ubuntu

      Want a new image on the box? It'll be on CDs marked "operating system extension." Put it in any computer on the network and it installs it to the central resource and sends an update to every host on the network. To remove, simply go to network/Operating\ Systems/ and right-click the icon. Easy.

    9. Re:Networks, sure. by Silver+Gryphon · · Score: 1

      For the average Joe, true. And at $50k, that's not for the desktop. But extended to a scaled-down version, this tech could save me time and make the entire disk subsystem more efficient.

      I'm a Windows web/database developer by day, and when I have 4 different .Net projects, 3 Visual Foxpro and a Foxpro 2.6 project open at once like I did today, even a gig of RAM gets eaten up. Windows loves to use that swapfile even if you've got a gig free, so that disk was working overtime as I switched between them. I learned to make 2 partitions per disk: fast access in a small outer partition and larger/lower demand files like MP3/AVI/MPG on a larger partition on the inner cylinders. OS on disk0 part0, VFP data on disk0 part1, swap and SQL data disk1 part0, finally backups and rarely used files disk1 part1. As a result, my machine is faster than those of my coworkers, who just use the vanilla setup they're given. [Side note, if you work at a corporation you can request your next PC/disk upgrade come with a partition scheme you choose. If you ask nicely and they have time, they'll often accommodate you.]

      For an enterprise, especially app/database servers, this tech may help smaller shops. My company has experienced network admins so they know how to tune a system if we tell them what it'll be doing. For the daily-accessed files this could go even further. Some of our clients have their 50GB SQL databases on a 9-disk RAID5 and wonder why it's slow. Data, logs, backups, and applications all on one volume - they don't have the experience to separate the files by access pattern. If it's technically feasible, a driver that says MDF and LDF files should be on separate spindles, BAK on another, etc. could double or triple performance on some of these systems. Of course, trained staff is better but sometimes budgets intervene.
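
      A rough sketch of the rule-based placement idea above: route files to a volume by extension so that data, logs, and backups end up on different spindles. The rule table, the drive letters, and the function name `volume_for` are all made up for illustration; this is not how any real driver or SQL Server does it.

```python
import os

# Hypothetical rule table: file extension -> volume (drive letters assumed).
PLACEMENT_RULES = {
    ".mdf": "D:",   # database data files
    ".ldf": "E:",   # transaction logs on a separate spindle
    ".bak": "F:",   # backups on a third
}


def volume_for(filename, default="C:"):
    """Pick a volume for a file based on its extension."""
    extension = os.path.splitext(filename)[1].lower()
    return PLACEMENT_RULES.get(extension, default)
```

      With these rules, `volume_for("sales.mdf")` and `volume_for("sales_log.ldf")` land on different volumes, and anything unrecognized falls back to the default.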

    10. Re:Networks, sure. by cerberusss · · Score: 1
      you have documents on your desk and in piles on the floor that you will never use

      And your wife agrees with this system?

      *dumps gf*

      QUICK WHERE DID YOU GET HER??!

      --
      8 of 13 people found this answer helpful. Did you?
    11. Re:Networks, sure. by turbidostato · · Score: 1

      "but I do think that people should be at least mirroring their drives"

      But people don't seem to think the same. And it's their data, after all.

      "I have heard too many people complain about losing something important because of hard drive failure."

      Did they complain to the point of asking a hardware vendor for an off-the-shelf RAID1 (of course, they wouldn't ask "give me a RAID1", but they'd respond positively to a hardware vendor advertising "no more data loss! our patented 'doubledisk' technology secures your data; priced only $70 over our standard price")? My perception (and obviously, that of the computer vendors) is that users do complain about data loss, as they complain about viruses, spam, piracy... but they won't put either a nickel or a minute where their complaints are.

  2. Great Idea by Jazz-Masta · · Score: 5, Insightful

    This is exactly what everyone is looking for. People defrag their hard drives in the hopes to increase performance. There is no reason why storage that is accessed more shouldn't be on the high performance drives. Or at least some sort of class rating that defines what storage may need high performance. For example, automatically installing and saving 3D Max to a RAID 0 media, and saving word documents to the lesser-performing drives.

    I try to follow this idea all the time with my system. Fast stuff goes on RAID 0, slow stuff, and backup stuff goes on the ole' 200 GB backup drive.

    1. Re:Great Idea by mollog · · Score: 4, Informative

      Hewlett-Packard Company developed a product that did this automagically. It was an external RAID system that connected via one or two SCSI busses to a host. All incoming data was stored in RAID 0/1; striped and mirrored. (aka RAID 6 and RAID 10). As the storage filled up, unused data was automagically migrated to more space-efficient RAID 5. Data that had been accessed recently remained in RAID 0/1. You could add disk drives and it would automagically include the drives (but you would have to use LVM or other utilities in the OS to increase its file system.) You could mix two drive sizes, say, 18GB and 36GB, without trouble. If a drive failed, the array would rebuild redundancy. If another drive failed, ditto. It was fast, it was fully redundant.

      But it was a lot smarter than the admins who had to use it so it wasn't very popular.
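
      The migration policy described above (new writes land in the mirrored tier, and the least recently used data is demoted to RAID 5 as space runs short) can be sketched roughly as follows. This is a toy illustration, not HP's implementation: the class name `TwoTierStore`, the block-count capacity, and the LRU rule are all assumptions.

```python
from collections import OrderedDict


class TwoTierStore:
    """Toy model: a small fast tier backed by a larger slow tier.

    Hypothetical sketch; the names and the LRU policy are assumptions.
    """

    def __init__(self, fast_capacity):
        self.fast_capacity = fast_capacity   # max blocks in the fast tier
        self.fast = OrderedDict()            # least recently used first
        self.slow = {}

    def write(self, key, data):
        # All incoming data lands in the fast (mirrored) tier.
        self.slow.pop(key, None)
        self.fast[key] = data
        self.fast.move_to_end(key)
        self._demote_if_full()

    def read(self, key):
        if key in self.fast:
            self.fast.move_to_end(key)       # mark as recently used
            return self.fast[key]
        # Data read from the slow tier is promoted back to the fast tier.
        data = self.slow.pop(key)
        self.fast[key] = data
        self._demote_if_full()
        return data

    def _demote_if_full(self):
        # Migrate least recently used blocks to the slow (parity) tier.
        while len(self.fast) > self.fast_capacity:
            old_key, old_data = self.fast.popitem(last=False)
            self.slow[old_key] = old_data
```

      With `fast_capacity=2`, writing blocks a, b, c in that order pushes a down to the slow tier; reading a again pulls it back up and demotes b instead.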

      --
      Best regards.
    2. Re:Great Idea by pla · · Score: 3, Insightful

      This is exactly what everyone is looking for.

      No.

      You (and a number of other posters on this topic) have described what we look for - Geeks who want to get the most out of their systems with the least expense. If I could get killer performance with a RAID0 of tiny but fast drives (think Raptors, or even Cheetahs if you don't mind dealing with SCSI), while still having the capacity of a cheap 400GB IDE drive - Of course I'd have such a setup (and in fact, many of us already do, we just manually transfer things to/from the big-n'-slow).

      Most people, however, do not want this. For starters, most people don't even need the huge drives they already have - If you gave them just the pair of RAID0 36GBs, they'd never use even half that capacity, so no need for ever moving files to the slow storage. Then failing that, the members of the Sixpack family that manage to store hundreds of GB only fill it with downloaded porn, music, and movies - Uses that really don't need fast drives, just tons of space.


      So while it sounds useful in theory - in practice, such a setup would just add cost and complexity without providing any tangible benefit to most users. I suspect even most Geek users would rarely notice the difference (aside from OS load times), and would only make such a setup for bragging rights.

    3. Re:Great Idea by Joe+The+Dragon · · Score: 1

      Windows Vista itself uses 15GB+, so people will need huge drives; two 36GB drives in RAID 0 will be tight with Vista.

    4. Re:Great Idea by ottffssent · · Score: 1

      Raid 0+1 and raid 1+0 are subtly different. And raid6 is completely different. There is no mirroring at all in raid6. Raid5 is a special case of an m+n parity scheme where n=1. Raid6 is a special case where n=2. It allows for the simultaneous failure of any 2 drives in the array without data loss. The raid6 algorithm is somewhat more computationally intensive than the raid5 algorithm, but this is typically only of practical importance to embedded systems and software raid arrays running applications as well.
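
      To make the n=1 case concrete: RAID 5 parity is a bytewise XOR across the data blocks in a stripe, so any single missing block can be rebuilt from the survivors. A minimal sketch follows; the helper names are made up for illustration, real arrays rotate parity across the drives, and the second RAID 6 syndrome (which is not a plain XOR) is not shown.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks, parity):
    """Recover the single missing data block from the survivors plus parity."""
    return xor_blocks(list(surviving_blocks) + [parity])


data = [b"AAAA", b"BBBB", b"CCCC"]      # three data drives, one stripe
parity = xor_blocks(data)               # what the parity drive would hold
recovered = rebuild_missing([data[0], data[2]], parity)  # middle drive lost
```

      Here `recovered` equals the lost block `b"BBBB"`, because XORing the parity with the surviving blocks cancels them out.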

    5. Re:Great Idea by Anonymous Coward · · Score: 0

      This sort of tech has been around for 20+ years. I think Cray (now SGI) do something where less recently used data automatically gets migrated off onto slower disks and (eventually) tape, then stored in a tape library and automagically brought back when the users access it.

  3. Already have tiers... by Kaenneth · · Score: 3, Insightful

    Registers, CPU cache, on-chip cache, RAM, local disk, Network/Removable Media, Paper/Human memory...

    It's all about feeding that data hungry CPU, as quickly as possible.

  4. Not so new... by Duncan3 · · Score: 4, Interesting

    I was using systems that did this 10 years ago. Granted, back then it was disk+tape not different speed disks, but it's the exact same thing.

    Looks to me like an excuse to charge 8-10x what you should be paying for storage of that size.

    --
    - Adam L. Beberg - The Cosm Project - http://www.mithral.com/
    1. Re:Not so new... by truthsearch · · Score: 2, Informative

      Ten years ago you had something automated that determined where the files should go and moved them appropriately? It analyzed usage patterns? I'd really like to know what older systems had such features as I've never seen them.

    2. Re:Not so new... by hotrodman · · Score: 3, Insightful


      No kidding. So they find a way to put less-used data on slower disks, that still COST NEARLY AS MUCH. The entry price is still listed as $50,000. Big fuckin' deal. Let me know when you take a bunch of garden-variety servers, and do this, with the super cheap clone raid server with 40 terabytes of SATA as the 'last tier' for slowest files, where I can build 100 terabytes for $50,000.

      And yet, managers will get a woody over this buzzword compliance and want to give these guys millions to have the 'latest and greatest'.

      And have it still work with tape, too, and not tied up in some cumbersome, proprietary protocol owned by one little company that could go out of business.

    3. Re:Not so new... by drinkypoo · · Score: 2, Interesting

      I know bugger all about them so I can't vouch for the accuracy of this information, but I heard it from someone who worked in the basement of the Santa Cruz County Courthouse, where the county's servers (some of those big goofy IBM mainframes that require their own AC system) have been ticking away since time immemorial. According to one of the sysops, they have tiered storage which will automatically put stuff on magtape, and then ask them for the tape again later when the records are accessed. I guess a lot of what they did down there was serve automated tape requests. I hope they're no longer using such a system (disk is cheap) but who knows, it's government.

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    4. Re:Not so new... by Anonymous Coward · · Score: 0

      IBM's ADSM (now known as Tivoli TSM) had automated usage-based hierarchical storage capabilities at least as early as 1996.

    5. Re:Not so new... by dpilot · · Score: 4, Informative

      It was called HSM (Hierarchical Storage Management); it ran on IBM's MVS on mainframes, and it moved your less-used data to cheaper storage, in several stages. IIRC, the first stage was just compression on a different disk, the second stage was tapes in a jukebox-type thing, and the third stage was tapes that an operator fetched and loaded. Somewhere way back there, data never used for 5 years fell off the end of the belt, but you got warned, first.

      The day after vacation, when you kept getting the message, "DFHSM is recalling dataset xyz for user jkl" as it pulled all of your storage back online was a pain, and we all thought it would be neat to get rid of, as we migrated to workstations. But in retrospect, HSM was great, never having to worry about your data quantity. That's compared with having to root through $HOME every few months to take care of quota problems.

      --
      The living have better things to do than to continue hating the dead.
    6. Re:Not so new... by Doctor+Memory · · Score: 3, Informative
      you had something automated that determined where the files should go and moved them appropriately? It analyzed usage patterns?


      Oh yeah. BITD, there was the archiver, a job that ran every night and moved files that hadn't been accessed in the last N time periods to tape. It left the VTOC entry (kind of like an inode), just marked it "archived" and the label of the tape. Then, the next time that file was accessed, a hook in the open() call would send a message to the console operator telling them to mount tape such-and-such. When the tape was mounted, the archiver would automatically copy the file back into place, the open() call would complete normally, and life was good. Basically transparent to the user (they'd look at their directory and all their files would be there), except for the fact that the file open would take two-three minutes. Then again, since they were paying for disk storage by the block-day, they were generally pretty happy to only pay for a fifty-cent tape mount every quarter instead of keeping that 1200-block file on-line for three months when they weren't using it.
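
      The archive-and-recall cycle described above can be modelled in a few lines. This is a toy, in-memory sketch under stated assumptions: the `Archiver` class, its `catalog` (standing in for the VTOC), the `tapes` dictionary, and integer day counters are all invented for illustration, and the operator's tape mount is reduced to a dictionary lookup.

```python
class Archiver:
    """Toy model of a nightly archiver with transparent recall.

    Hypothetical sketch: `catalog` plays the role of the VTOC, `tapes`
    the tape library, and days are plain integers.
    """

    def __init__(self, max_idle_days):
        self.max_idle_days = max_idle_days
        self.catalog = {}   # name -> {"data", "last_access", "tape"}
        self.tapes = {}     # tape label -> {name: data}

    def create(self, name, data, today):
        self.catalog[name] = {"data": data, "last_access": today, "tape": None}

    def nightly_run(self, today, tape_label):
        """Move idle files to tape, leaving the catalog entry marked archived."""
        for name, entry in self.catalog.items():
            idle = today - entry["last_access"]
            if entry["tape"] is None and idle > self.max_idle_days:
                self.tapes.setdefault(tape_label, {})[name] = entry["data"]
                entry["data"] = None          # disk blocks are freed
                entry["tape"] = tape_label    # entry stays visible, marked archived

    def open(self, name, today):
        """Return file contents, recalling from tape first if necessary."""
        entry = self.catalog[name]
        if entry["tape"] is not None:
            # In the system described above, this is where the operator
            # would be asked to mount the tape; here it is instantaneous.
            entry["data"] = self.tapes[entry["tape"]].pop(name)
            entry["tape"] = None
        entry["last_access"] = today
        return entry["data"]
```

      The point of the design is that the catalog entry never disappears: users still see their file listed, and the only visible difference on `open()` is the delay.
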
      --
      Just junk food for thought...
    7. Re:Not so new... by Anonymous Coward · · Score: 1, Informative

      I'd really like to know what older systems had such features as I've never seen them.

      Novell.

    8. Re:Not so new... by SquadBoy · · Score: 1

      That's not the same thing. At all. Read the article again.

      --

      Cypherpunks: Civil Liberty Through Complex Mathematics. Those who live by the sword die by the arrow.
    9. Re:Not so new... by drinkypoo · · Score: 2, Informative
      That's not the same thing. At all. Read the article again.

      You know, I did read it, and what they're talking about is that data that is less used/less critical gets moved to slower/less reliable storage automatically.

      And when you have only two kinds of storage, a DASD bank and mag tape, and your system automatically writes least used data to tape and tells you to file it, and asks you for tapes when it needs them - well, I'd say the two are highly analogous. The fact that the slower storage is offline is nothing more than a detail. It's still storage, it's still available - it just has a wicked long seek time.

      I realize that offline storage is different - but only in some details. The majority is the same concept.

      So it's not the same thing, but the "at all" is bullshit. They're quite similar.

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    10. Re:Not so new... by tzanger · · Score: 1

      Any word on something like that for Linux fileservers? I am envisioning (as a first pass thought) a cron job with find -ctime that replaced the file with a symlink to the online compressed storage, but you may need some kind of hook into Samba or a lower level hook into the FS itself that grabbed it out of "cold storage" so to speak.
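
      A first-pass version of that cron job might look like the sketch below. Assumptions, all of them mine: it keys on last-access time (`st_atime`) rather than `-ctime`, since `find -ctime` tests inode change time; it skips compression, because a plain symlink cannot decompress on the fly (that would need the Samba or filesystem hook mentioned above); and the function name `migrate_idle_files` is made up. Note that filesystems mounted with `noatime` or `relatime` will make access times unreliable or coarse.

```python
import os
import shutil
import time


def migrate_idle_files(live_dir, cold_dir, max_idle_days, now=None):
    """Move idle files to cold_dir and leave symlinks in their place."""
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * 86400
    migrated = []
    for root, _dirs, files in os.walk(live_dir):
        for name in files:
            path = os.path.join(root, name)
            if os.path.islink(path):
                continue                      # already migrated
            if os.stat(path).st_atime >= cutoff:
                continue                      # accessed recently, leave it
            rel = os.path.relpath(path, live_dir)
            target = os.path.join(cold_dir, rel)
            os.makedirs(os.path.dirname(target), exist_ok=True)
            shutil.move(path, target)
            os.symlink(os.path.abspath(target), path)
            migrated.append(rel)
    return sorted(migrated)
```

      Run nightly from cron, this keeps paths stable for users while the bytes move to cheaper disk. It does not bring files back when they become hot again; that would be a second pass in the other direction.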

    11. Re:Not so new... by Anonymous Coward · · Score: 2, Informative

      Guys, guys, guys, you talk about HSM like it is old and gone. On the contrary, it is the future! We use it everyday where I work, using a program called SAMFS. We have a tape robot and a large disk cache. Data that is used often stays on the cache. Less used data goes to tapes. SAMFS sorts out when/how to do that. The system is great not only because the software figures all of this out for you, but also because it works as your backup software. We switched to this system about 4 years ago when we realized that doing a nightly backup of our TB's of storage was taking hours and was no longer a nightly backup but a morning backup as well. Plus, adding additional disk drives for storage was going to be a very costly thing.

      While it is true that getting less used data OFF of the tape takes some time, it is not that bad if it is data you don't use very often. Depending on the file, it seems to take about 8 minutes or so for us. I think that's because the tape has to find the file, but once it has, it can just copy it off.

      I highly recommend SAMFS. Unfortunately, it only runs on Solaris. :( We examined several Linux solutions at the time, but they were too unstable. Perhaps now there are better ones out there?

      Check it out. Even if you don't use it, it's cool stuff.

      John

    12. Re:Not so new... by Anonymous Coward · · Score: 0

      SGI do a Linux solution, I think. Let me check... Yeah - http://www.sgi.com/products/storage/tech/dmf.html It's pretty cool tech. Runs totally transparently.

    13. Re:Not so new... by bill_mcgonigle · · Score: 1

      While it is true that getting less used data OFF of the tape takes some time, it is not that bad if it is data you don't use very often. Depending on the file, it seems to take about 8 minutes or so for us. I think that's because the tape has to find the file, but once it has, it can just copy it off.

      Something to check on - when I was actually looking at these kinds of systems 10 years ago, they helped that issue by using Magneto-Optical disks in between disk and tape. It may be that your installation is less expensive because it lacks it or maybe they discontinued that technology.

      Hey, y'know, today we could have a SATA robot instead of MO cartridges, given that the connectors are now standardized. Anybody seen one?

      --
      My God, it's Full of Source!
      OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
    14. Re:Not so new... by Cicero382 · · Score: 1

      "I was using systems that did this 10 years ago. Granted, back then it was disk+tape not different speed disks, but it's the exact same thing."

      Got you beat there. I was using a similar system nearly 30 years ago at university - again disk and tapes. The O/S was GEORGE III running on an ICL xxxx (can't remember). Very useful in the days when your disk quota was measured in kilobytes, since totally automatic migration to tape tended to keep your disk usage down.

      One problem, though. After the summer break, all your data had been moved to tape. So, you access the data and it starts a restore... trouble is *everyone* else is doing the same thing. Two day restore time, anyone?

  5. Certainly could be done in a desktop by Don853 · · Score: 2, Insightful

    Put two 10k Raptors in Raid 0 for your games and other stuff you need REALLY FAST, and then have a big 250GB 7200RPM drive for everything else. People are doing that already.

    All you would need is some software for automatically moving it around. Though most people with desktop rigs like that probably would rather control what is on which drives themselves.

    1. Re:Certainly could be done in a desktop by mcpkaaos · · Score: 1

      Put two 10k Raptors in Raid 0 for your games and other stuff you need REALLY FAST, and then have a big 250GB 7200RPM drive for everything else. People are doing that already.

      You just described my desktop exactly. :D

      --
      It goes from God, to Jerry, to me.
    2. Re:Certainly could be done in a desktop by mcpkaaos · · Score: 1


      --
      It goes from God, to Jerry, to me.
    3. Re:Certainly could be done in a desktop by COMON$ · · Score: 3, Informative
      Because 2 10K Raptors in Raid 0 isn't worth the speed increase. Last time I checked you may get a 20% increase, and reduced data integrity. I did some research into this a while ago, check out this article, very informative.

      http://www.anandtech.com/printarticle.aspx?i=2101

      --
      CS: It is all sink or swim...oh and did I mention there are sharks in that water?
    4. Re:Certainly could be done in a desktop by kwark · · Score: 1

      Why? For the price of a 36GB Raptor you can get 300GB el cheapo drives. I put 4 of those in a RAID5 on my SATA1 controller and get write speeds of 130MBytes/s (reads at 180MBytes/s according to dd). More diskspace with a higher reliability compared to RAID0 without the need to move stuff around.

      Sure the drives are more likely to fail, but then again so is that single 250GB "for everything else" drive.

    5. Re:Certainly could be done in a desktop by UltimateRobotLover · · Score: 1

      I read through the document you linked to, but couldn't see any detail on whether they aligned the sectors to the disk boundary. Specifically, I understand that Windows XP uses a 63 sector MBR, whereas a 64 sector offset will align I/O to disk boundaries. The disadvantage of Windows's standard configuration is that certain small I/Os will overlap two disks, forcing two hardware I/O operations for one software I/O request. Microsoft does a handy tool called Diskpar.exe (it's included with the Resource Kit). Here is a much more comprehensive description of the problem than I could write in a Slashdot comment.
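
      The arithmetic behind the alignment problem is easy to check. The sketch below counts how many stripe units a single I/O touches for a given partition start offset. The 64 KB stripe unit and 4 KB cluster size are assumptions picked for the example, not figures from the linked article, and `stripe_units_touched` is a made-up helper.

```python
def stripe_units_touched(partition_start, io_start, io_length, stripe_unit):
    """Count the stripe units (and so, potentially, disks) one I/O touches.

    All arguments are in sectors; io_start is relative to the partition.
    """
    first = (partition_start + io_start) // stripe_unit
    last = (partition_start + io_start + io_length - 1) // stripe_unit
    return last - first + 1


STRIPE_UNIT = 128   # 64 KB stripe unit in 512-byte sectors (assumed)
CLUSTER = 8         # 4 KB filesystem cluster in sectors (assumed)

# The same cluster, at partition-relative sector 64, under both offsets:
misaligned = stripe_units_touched(63, 64, CLUSTER, STRIPE_UNIT)
aligned = stripe_units_touched(64, 64, CLUSTER, STRIPE_UNIT)
```

      With the partition starting at sector 63, that cluster covers absolute sectors 127 to 134 and straddles two stripe units; starting at sector 64, it covers 128 to 135 and stays inside one. Under these assumed sizes, one cluster in every sixteen is split by the 63-sector offset.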

    6. Re:Certainly could be done in a desktop by Don853 · · Score: 1

      Because 2 10K Raptors in Raid 0 isn't worth the speed increase. Last time I checked you may get a 20% increase, and reduced data integrity. I did some research into this a while ago, check out this article, very informative.

      --------

      Why? For the price of a 36GB Raptor you can get 300GB el cheapo drives. I put 4 of those in a RAID5 on my SATA1 controller and get write speeds of 130MBytes/s (reads at 180MBytes/s according to dd). More diskspace with a higher reliability compared to RAID0 without the need to move stuff around.
      Sure the drives are more likely to fail, but then again so is that single 250GB "for everything else" drive.


      I never said it was worth it or not worth it. I just said the basic idea of what's in the article could already be done in a desktop, and indeed people do do it. Besides, gamers don't tend to care as much about data integrity, because with the exception of save files if they're playing non-online games, they don't have anything to lose that can't be reinstalled in an afternoon. I think it's obvious that exactly what was stated in the article, a combination of RAID 10 and RAID 5, will never appear in any mainstream desktops. Why or why not is up to the user.

    7. Re:Certainly could be done in a desktop by kwark · · Score: 1

      But this idea is just plain silly for a desktop: added complexity with a diminishing speed gain the more disks you add.

      What the article says is not important since it's about expensive hardware, whereas a desktop raid is cheap disks + some software glue.

      To flood a SATA150 bus you only need 2 high performance disks. So your suggestion would most likely be 2 raptors raid0 and 2 lowends raid1/0 (on a 4 port controller). When the cheap storage is idle one will get max read/write, less when active (hard to guess how much). Also the system needs some LVM layer which migrates data between physical disks when idle. On a busy system that may mean write speeds degrade when no space is left on the fast section during long continuous writes.

      With 4 lowend disks one always gets about the max SATA150 reads and 3/4 that in write speeds (resp. the approx 180 and 130 MB/s I mentioned), the software for this is already being included in popular OSes for years.

      So a RAID5 is not only cheaper per MB storage, it's more reliable. Downside is writes take more CPU time and one loses 1/#drives max speed. On the upside again the technology is already here for many years vs. vapourware.
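
      The "cheaper per MB" part comes down to how much raw capacity each level gives up to redundancy. A small sketch, assuming equal-sized drives and two-way mirroring; the function name and level strings are made up for the example.

```python
def usable_fraction(level, drives):
    """Fraction of raw capacity available for data, for equal-sized drives."""
    if level == "raid0":
        return 1.0                       # no redundancy at all
    if level in ("raid1", "raid10"):
        return 0.5                       # two-way mirroring assumed
    if level == "raid5":
        return (drives - 1) / drives     # one drive's worth of parity
    if level == "raid6":
        return (drives - 2) / drives     # two drives' worth of parity
    raise ValueError(f"unknown RAID level: {level}")
```

      For four 300GB drives, `4 * 300 * usable_fraction("raid5", 4)` gives 900GB usable, against 600GB for the same drives in RAID 10.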

    8. Re:Certainly could be done in a desktop by COMON$ · · Score: 1

      Sorry I didn't reply earlier. I am not aware of how they set up the RAID in the article. But I am not a big fan of striping, too much data loss unless you can afford the mirror. In the case of the mirror then you had better be preparing for some serious disk IO from a SQL cluster or heavily used Exchange box because you are putting a pretty penny into storage...Thanks for the article though, good review for me.

      --
      CS: It is all sink or swim...oh and did I mention there are sharks in that water?
  6. Oh....good.. by JerBear0 · · Score: 5, Insightful

    "idle data goes on slower/cheaper storage"

    So that special little something that you need once a year, but when you need it, you need it RIGHT NOW is tied to the foot of a pigeon fluttering around the warehouse somewhere. Frequency of use does NOT denote importance.

    --
    Bad experience is a school that only fools keep going to.
    1. Re:Oh....good.. by kyc · · Score: 1

      Frequency of use DOES denote importance, at the very least STATISTICALLY. Just because you want "that special little something" once a year does not mean you can degrade the speed of information which is instantly needed. This is an obvious fact.

      --
      There's plenty of room at the bottom! Richard P. Feynman
    2. Re:Oh....good.. by Mage+Powers · · Score: 1

      If the slower storage is still online and accessible at, say, 1998 hd speeds, that'd still be good enough. Without reading the article, it seems like a good idea.

      Hell, if it's a text document you need only once a year then 1970 hd speeds might be good enough (for reading the doc, not MS Word)

    3. Re:Oh....good.. by Red+Flayer · · Score: 3, Informative

      That's what metatagging is for. Tag files that are not to be moved to slow storage no matter how infrequently they are accessed. RTFA.
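
      In policy terms, a tag like that is just an override checked before the usage rule. A minimal sketch, assuming a hypothetical `pinned` flag and a 90-day idle threshold (neither taken from the article):

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    name: str
    days_since_access: int
    pinned: bool = False     # hypothetical "never demote" tag


def choose_tier(info, idle_threshold_days=90):
    """Return "fast" or "slow" for a file; pinned files always stay fast."""
    if info.pinned:
        return "fast"
    return "slow" if info.days_since_access > idle_threshold_days else "fast"
```

      So a disaster-recovery plan untouched for a year stays on the fast tier if it is pinned, while an untagged file of the same age is demoted.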

      --
      "Trolls they were, but filled with the evil will of their master: a fell race..." -- J.R.R. Tolkien on Olog-hai
    4. Re:Oh....good.. by digitaldc · · Score: 1

      So that special little something that you need once a year, but when you need it, you need it RIGHT NOW is tied to the foot of a pigeon fluttering around the warehouse somewhere.

      But look at it this way, at least the pigeon won't put you on eternal hold when you need rapid tech support in your time of crisis.

      --
      He who knows best knows how little he knows. - Thomas Jefferson
    5. Re:Oh....good.. by lucabrasi999 · · Score: 1
      So that special little something that you need once a year, but when you need it, you need it RIGHT NOW is tied to the foot of a pigeon fluttering around the warehouse somewhere. Frequency of use does NOT denote importance.

      It sounds like you don't pay the IT department for your storage. In my experience, once a department is charged for storage, they suddenly start requesting cheaper storage.

    6. Re:Oh....good.. by Kadin2048 · · Score: 3, Insightful

      Frequency of use doesn't denote importance, but it might denote how quickly you need to be able to recall it. Similarly, importance doesn't imply that quick recall is necessary. If you don't use something frequently, it might be okay to store it somewhere that takes a while to recall from, even if it is "important," as long as you know where it is so that you can get it back.

      As an example, financial records for past years might be very important, but you don't need to be able to access them in a tenth of a second. As long as you can get to them if you really want to (sacrificing a few seconds), then it's all right.

      The way I see this translating to reality is that you'd keep all your old documents in slow-speed storage, but then keep an index in high-speed storage, so that you could easily search (both by name and by content) and decide when to pull stuff out of your archives.
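
      Roughly what I have in mind, as a toy Python sketch. The two dictionaries only stand in for slow and fast storage, and the class and method names are invented for the example.

```python
import re
from collections import defaultdict


class Archive:
    """Documents live in a 'slow' store; a small word index lives in a 'fast' one."""

    def __init__(self):
        self.slow_store = {}                # name -> full text (stand-in for slow disks)
        self.fast_index = defaultdict(set)  # word -> names (stand-in for fast storage)
        self.slow_reads = 0                 # counts trips to slow storage

    def add(self, name, text):
        self.slow_store[name] = text
        # Index words from both the name and the content.
        for word in re.findall(r"\w+", (name + " " + text).lower()):
            self.fast_index[word].add(name)

    def search(self, word):
        # Answered from the index alone; slow storage is not touched.
        return sorted(self.fast_index.get(word.lower(), set()))

    def fetch(self, name):
        self.slow_reads += 1
        return self.slow_store[name]


archive = Archive()
archive.add("taxes-2003.txt", "Federal return and receipts")
archive.add("taxes-2004.txt", "Federal return, amended")
hits = archive.search("amended")  # index lookup only
text = archive.fetch(hits[0])     # one deliberate trip to slow storage
```

      Searching costs nothing on the slow tier; you only pay the slow read once you have decided which document you actually want.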

      This is no different than what people have been doing for centuries with paper. Just because the card catalog is located in the center of the library doesn't mean its contents are inherently more valuable than the actual books (which might be in the basement, back shelves, wherever); it just means that the catalog gets accessed much more often.

      Actually, in the physical world, people often exchange speed of recall for certainty of recall. You put important documents in a safe-deposit box, rather than your kitchen counter, because even though it'll take you longer to get them out of the box, they're guaranteed to be there when you need them. Likewise, a system which traded off speed for redundancy would probably be appropriate for "important" but infrequently-accessed electronic documents.

      --
      "Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."
    7. Re:Oh....good.. by Red+Flayer · · Score: 1

      Good call. Departmental chargebacks, along with bonuses partially tied to budget variance, lead to more cost-effective methods. I've noticed more companies charging IT costs to individual departments, instead of lumping it all under admin costs, and not just for companies in the IT sphere of business.

      When expensive storage == no new Blackberries this year, sales departments take notice :)

      --
      "Trolls they were, but filled with the evil will of their master: a fell race..." -- J.R.R. Tolkien on Olog-hai
    8. Re:Oh....good.. by lucabrasi999 · · Score: 1

      I am at a contract where we are moving to a chargeback system. I can't wait until it is implemented. It will be so much fun to watch the change in attitude.

    9. Re:Oh....good.. by Red+Flayer · · Score: 1

      Hopefully your contract will be over when the first review period ends...

      VP of Marketing: "What do you mean the chargeback for that tech is more than I make per hour?"

      CIO/CFO: "Their time is more valuable to us than yours."

      VP of Marketing: "What am I, a schmuck?!"

      CIO/CFO: "Yes."

      In my experience, that's the downside of chargebacks -- all of a sudden, everyone has an idea of what "that guy in the server room" makes... and is VERY unhappy about it.

      --
      "Trolls they were, but filled with the evil will of their master: a fell race..." -- J.R.R. Tolkien on Olog-hai
    10. Re:Oh....good.. by pawn63295 · · Score: 0

      Yeah, I mean, look at Microsoft source code. I bet that gets used all the time... doesn't mean it's important.

    11. Re:Oh....good.. by mpcooke3 · · Score: 1

      Frequency of use does NOT denote importance.

      It doesn't *always* denote importance. However, if a tiered storage system improves performance a large enough percentage of the time, then I'd live with a drop in performance on the odd occasion. It's similar to using spare memory for IO/file caching.

    12. Re:Oh....good.. by Medievalist · · Score: 1
      That's what metatagging is for. Tag files that are not to be moved to slow storage no matter how infrequently they are accessed. RTFA.
      And so much for the automated part of automatic hierarchical storage management.
  7. New form of something old by davidwr · · Score: 1

    Tiered storage has been around for ages. In the old days it was disk with tape as a backing store.

    I do like the idea of this product. Similar performance gains can be had by having the OS manage the data. It's a different-yet-similar concept, but some desktop OSes do this already with code libraries, putting them all in a single directory with little or no fragmentation within the file to allow for faster loading. Other OSes play similar tricks with system library metadata.

    --
    This would have been FIRST POST but I decided to actually write something.

    --
    Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
  8. Low power Optimization by kyc · · Score: 1

    This scheme reminded me of low-power optimization at the circuit level. The critical paths in the circuit are governed by low-threshold transistors (ensuring high performance, i.e. speed). The non-critical paths are governed by high-threshold transistors (ensuring low leakage in stand-by mode, with no particular degradation of speed since they sit on non-critical paths, that is, the idle paths). It is nice to see the core of this idea at macro scale.

    --
    There's plenty of room at the bottom! Richard P. Feynmann
  9. Doesn't OS X do this? by Anonymous Coward · · Score: 0

    Is automated tiered storage headed to desktops?

    I thought OS X did something like this-- ie, moving most-often accessed files into optimal places on the hard drive.

  10. It sounds good in theory... by PixelPirate · · Score: 1

    For a large-scale organization on the order of hundreds of employees, but I doubt very much that it would be viable on the desktop (watch, as I say it, HP and Dell are rubbing their hands...). This is for a number of reasons, mostly revolving around performance.

    For example, take an MP3 collection. I go to open up my old Soviet music collection (which I have), but I haven't listened to it in months, possibly even years. This would put it on the low-end of the priority and I would have to wait for the data to be retrieved, all the while watching paint dry. Similarly, if I have a game that I haven't played in a long while, but that was installed on my computer, you would see HUGE performance delays as each file has to be retrieved.

    There is also the question of quality. In large-scale organizations, where you might have your volatile backups on this medium, in case you need to restore from it, you really do need something of high quality, not something that is "cheap". Likewise, when the PHB is opening a finance sheet only to see it has been corrupted due to the "cheap" media failing, there will be hell to pay. I will say, however, that this technology does have some very interesting applications outside of your general company server.

    Anyway, I for one welcome our two-tiered storage overlords...

    In Soviet Russia, two-tiered storage retrieves you!

    I'm a third-tier storage you insensitive clod!

    CowboyNeal!



    -PixelPirate

    1. Re:It sounds good in theory... by cyngus · · Score: 1

      Yes, but you're missing something here. Would you rather wait a relatively long period of time for infrequently used files, or wait a relatively short period of time every time you use a file that you use frequently? The theory is that the (relatively) long pause to get little-used files is shorter than the aggregate delays of loading frequently-used files. Also, I think the amount of time you'd have to wait for even your least frequently used files would be relatively low. In the worst case they would be stored somewhere across the Internet and retrieved over a broadband link, which is getting broader all the time. In the case of streaming-type files (movies and music), your only real delay is the latency, since you can start using them even while they continue to download.
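
      To put numbers on the aggregate-delay argument, here is a back-of-the-envelope calculation in Python. The access counts and latencies are invented for the example, not measured on any real drive.

```python
def total_wait(accesses, latency_s):
    """Total seconds spent waiting, given access counts and per-access latency per class."""
    return sum(accesses[k] * latency_s[k] for k in accesses)


# Invented workload: accesses per month.
accesses = {"hot": 1000, "cold": 5}

# Everything on one mid-speed drive vs. hot data on fast storage, cold data on slow.
single = {"hot": 0.010, "cold": 0.010}
tiered = {"hot": 0.005, "cold": 0.500}

print(total_wait(accesses, single))
print(total_wait(accesses, tiered))
```

      With these made-up figures the tiered layout waits about 7.5 seconds a month against about 10 for the single drive, even though each cold access is fifty times slower. Change the ratio of hot to cold accesses and the answer can flip, which is the whole debate in this thread.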

    2. Re:It sounds good in theory... by Belial6 · · Score: 1

      Add to that the fact that the word "cheap" is being used in different ways here. An extremely reliable 10k rpm drive is going to be noticeably more expensive than an extremely reliable 5400 rpm drive. When people are saying "cheap" they mean a lot less expensive, not poorly made.

      I would like to know what kind of paint you're using that dries in the time it takes to load an mp3 off of a slow 5400 rpm drive. ;)

  11. No but it is correlated by davidwr · · Score: 3, Insightful

    Apply "frequency of use = urgency" to BIGNUM pieces of data and you will have a very useful albeit sub-optimal algorithm.

    Yes, there are exceptional cases, like the President's access to the Nuclear Briefcase. It hasn't been used for real in a long time if ever but when he needs it it had better be close at hand. However, these special cases can be treated as the special cases they are.

    --
    Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
    1. Re:No but it is correlated by cperciva · · Score: 4, Insightful

      Yes, there are exceptional cases, like the President's access to the Nuclear Briefcase. It hasn't been used for real in a long time if ever but when he needs it it had better be close at hand.

      Oddly enough, I think most people in the world would prefer that it wasn't close at hand when Bush decides he wants it.

      A better example is fire extinguishers -- most of them will literally never be used, but there's a very good reason to ensure that they are readily available.

    2. Re:No but it is correlated by CosmeticLobotamy · · Score: 3, Funny

      A better example is fire extinguishers -- most of them will literally never be used

      If that's the case for you, then I feel sorry for you. You've apparently never known the snowy, probably-toxic joy of Fire Extinguisher Expiration Day. It's the happiest day of the decade.

    3. Re:No but it is correlated by Anonymous Coward · · Score: 0

      I too love this day. However, I prefer the CO2 units to the jack of all trades, master of none ABC jobs. Non-toxic, non-corrosive and frosty. Tons of fun. For actual fires I like a mix of class BC CO2 and class A water extinguishers. You have to know what you are doing, but they do the best job with the minimum collateral damage.

  12. IDE Neutrality? by sseaman · · Score: 5, Funny

    From its beginnings, the Hard Drive has leveled the playing field for all files. Everyday files can have their content read by thousands, even millions of processes.

    The Coalition of Unused Files believes that the desktop is a crucial engine for personal and economic growth. They are working together to urge System Admins to preserve IDE Neutrality, the First Amendment for the Desktop Hard Drive that ensures that the Desktop remains open to innovation and progress.

  13. This is "new"? by Medievalist · · Score: 3, Insightful


    IBM mainframes that literally pumped water were doing this decades ago.

    What, you say water cooling is coming back too?

  14. It already is by malraid · · Score: 3, Insightful

    That's why you have HDDs with cache. That's the whole concept of "virtual memory". The next step might be hybrid HDDs (solid state / mag platters). But I don't think it will go much farther than that. Multiple RAIDs are overkill for the average desktop.

    --
    please excuse my apathy
  15. Just read TFA: by Ant+P. · · Score: 4, Insightful

    $50k for a 6TB fileserver? What's that extra $40000 paying for that a normal fileserver loaded with RAM can't do just as fast?

    1. Re:Just read TFA: by Anonymous Coward · · Score: 3, Interesting

      Apples and pomegranates you compare;
      Channels of Fiber come not cheap.
      Terabytes 6 with connection of light for less than $50k you will not find.
      Terabytes 6 with connections of wire you may.
      SATA drives, untested are delivered.
      SATA drives with fewer bearings.
      SATA drives with short life.
      Enterprise storage is not easy.

    2. Re:Just read TFA: by roj3 · · Score: 1

      RTFM

    3. Re:Just read TFA: by ch-chuck · · Score: 1

      What's that extra $40000 paying for

      A man in a suit with a laptop and a Powerpoint presentation to demonstrate how it'll lower your TCO, increase your ROI, and boost your career.

      --
      try { do() || do_not(); } catch (JediException err) { yoda(err); }
    4. Re:Just read TFA: by Anonymous Coward · · Score: 2, Interesting

      It's not a server, it's a SAN. You connect a server via HBA to the SAN unit. The cost difference is in the performance of the drives you're getting (8 15k 146GB fiber channels and 8 500GB 10k fiber channels); these aren't the same Maxtor 250 GB SATA drives you picked up at Best Buy last week. (Then there's the enclosures, controllers, IO cards, etc....)

  16. Just like my kitchen by Red+Flayer · · Score: 5, Funny

    Cheetos go in the easy-to-reach cabinet next to the fridge.

    Beer goes in the front on the top shelf of the fridge, milk (eventually cheese, typically) goes on the bottom shelf in the back.

    This is automated, since I simply shove things onto the shelves when I get home from the supermarket. Anything I consume and replace ends up at the front. Anything I buy because I 'should' be eating it (like fiber biscuits, or whatever) ends up pushed to the back.

    It's automated via metatag, too. Anything tagged 'ice cream' goes in the door of the freezer, anything tagged 'vegetable' gets relegated somewhere in the back, where it quickly develops an inch of ice crystals, to slowly dry out to a freezer-burnt state of suspended animation until I buy a new fridge unit.

    This costs no more than regular kitchen storage space, but if you'd like a custom design for you and your loved ones, my consulting fee is $75/hr, or a bag of chips and a six-pack.

    --
    "Trolls they were, but filled with the evil will of their master: a fell race..." -- J.R.R. Tolkien on Olog-hai
    1. Re:Just like my kitchen by mkw87 · · Score: 1
      my consulting fee is $75/hr, or a bag of chips and a six-pack

      Wow, that must be some sixpack!

      --
      Arguing with an engineer is like wrestling a pig in mud. Soon, you realize the pig is dirty, and he likes it.
  17. Yes, Kinda... by ThinkFr33ly · · Score: 4, Informative

    Automatic tiered storage is definitely coming, but probably not in the form of multiple disks that run at different speeds or RAID levels.

    Microsoft announced a while back that Windows Vista would support three technologies designed to improve disk speed called SuperFetch, ReadyBoost, and ReadyDrive. SuperFetch is simply a way of preloading applications and data when the OS anticipates that you'll be loading those soon.

    ReadyBoost and ReadyDrive both utilize persistent memory caches to speed up access to the disk.

    ReadyBoost treats normal USB keys and flash disks like temporary caching locations for data from the disk.

    ReadyDrive is essentially the term Microsoft uses to describe their support for hybrid hard drives, which are disks that have a built-in flash memory module that's used as a persistent cache.

    Not only do hybrid disks dramatically increase performance, but they also result in huge power savings for mobile devices like laptops and media players.

    1. Re:Yes, Kinda... by Anonymous Coward · · Score: 0

      One issue with Hybrid Hard Drives is the writability of Flash. Flash memory wears out significantly more quickly under writes than magnetic media does. There are many wear-leveling algorithms out there to spread the writes around the flash memory. I wouldn't want to buy a 300GB HD with 10GB of flash cache at ten times today's rate only to find its lifespan to be significantly shorter than a standard mag-head drive. But that's just me.
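
      For anyone wondering what spreading writes around means, here is the simplest possible version in Python: always write to the block that has been erased the fewest times. Real flash controllers are far more involved, and the class below is purely illustrative.

```python
class WearLeveledFlash:
    """Toy flash device that always writes to the least-worn block."""

    def __init__(self, n_blocks):
        self.erase_counts = [0] * n_blocks

    def write(self):
        # Pick the block with the fewest erases so far (lowest index wins ties).
        block = min(range(len(self.erase_counts)),
                    key=self.erase_counts.__getitem__)
        self.erase_counts[block] += 1
        return block


flash = WearLeveledFlash(4)
for _ in range(10):
    flash.write()
print(flash.erase_counts)
```

      No block ends up more than one erase ahead of any other, so the device wears evenly instead of burning out one hot spot.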

  18. I could see a use. by Kadin2048 · · Score: 2, Interesting

    I could see a use for something like this. Personally, I've stopped throwing stuff away. With the exception of temporary and cache files, storage is cheap enough that I just don't delete anything on the off chance that I might want it again. Every email, every instant message, every dictated note (I use a little Olympus digital recorder), every digital photo, it's all saved. By the time I fill up my main hard drive with stuff, I can just buy another one that's probably between two and five times the size, dump everything onto it, and keep the old one as a historical backup. (I keep online backups as well, but I won't bore you with it here.)

    I don't think I'm that atypical in this regard. GMail brought the idea of saving all your email, forever, to the masses; Flickr gives you an unlimited amount of photo storage; and technologies like Apple's Spotlight make it relatively easy to search through gigabytes of saved information and pull up related items. What we haven't seen yet is a lot of popular interest in redundant backup systems: that'll come later, once people start realizing how much of their lives they've stored away on the crummy OEM drive in their Dell. (Probably after a lot of them fail and we hear some real horror stories.)

    It's not hard to imagine a near future where people just get used to not throwing anything away. In that situation, tiering storage -- allocating the fastest media to the most frequently accessed information -- could have big performance gains. And assuming that you have a relatively static amount of frequently-accessed information, and basically only add information to the "infrequently accessed" category, a tiered system means that you only really have to add storage to the bottom tier. It's a pyramid where the base gets larger and larger, but the upper part remains basically the same size.

    So for example, as you save more and more emails (infrequently accessed information), they automatically get saved onto inexpensive, slower drives, which are then mirrored to each other for redundancy. A single, fast drive could hold the system -- maybe solid state storage? -- and more frequently-accessed data. A smart system would know what information needs to be moved up to faster storage to be very useful (uncompressed digital video, for example, wouldn't be much fun to work with off of a slow drive), and what can be left there as it's accessed (MP3s and compressed video could be played directly from slower media).
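
    One way a "smart system" might make that call, sketched in Python: compare the data rate a workload needs with what its current tier can sustain. The throughput and bitrate figures are rough placeholders I picked for the example, not measurements.

```python
# Assumed sustained read rates in MB/s; placeholders, not measurements.
TIER_RATE = {"fast": 80.0, "slow": 20.0}


def needs_promotion(required_mb_s, current_tier="slow"):
    """True if the data cannot be used in place at its current tier's rate."""
    return required_mb_s > TIER_RATE[current_tier]


# Rough, illustrative data rates in MB/s.
workloads = {
    "mp3 playback": 0.02,            # about 160 kbit/s
    "compressed video": 1.0,
    "uncompressed video edit": 30.0,
}

plan = {name: ("promote" if needs_promotion(rate) else "leave in place")
        for name, rate in workloads.items()}
print(plan)
```

    Only the uncompressed video gets moved up; the MP3s and compressed video play fine straight off the slow tier.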

    I think it's an interesting technology with a lot of possible applications, but as with a lot of other things, it'll be the home user who arrives last to the party, because their storage is the least centralized. Unless there's a move away from storage on individual desktop PCs and towards storage on per-home servers, it'll be a while before most people require or see the benefit in such a thing.

    --
    "Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."
  19. Internet detectives will love this... by janet-on · · Score: 0

    I've worked for several years both creating programs inside the database and on a server layer outside it (and also just about every other layer).

    I have to agree with grassbeetle above.

    Software architecture-wise:
    - You can't make a scalable architecture if you put everything in one single place (in this case the database).
    - You will be hard-pressed to create a failure tolerant architecture if you stuff everything in a single point of failure.
    - Databases are NOT application servers. They are designed with data storage and retrieval in mind, not reliable execution of complex business logic. Amongst other things databases do not make available in an easy and/or reliable way some of the standard application server functionality.
    - All external components of the application (for example UIs) have to connect to the database. You're now stuck using the connection protocols from the chosen database. This might cause all sorts of problems with security, firewalls, use of asynchronous messaging, availability of adaptors in the platform you are deploying your applications to, etc...
    - Splitting your application across several servers or in a multi-tiered geographical distribution is much harder.
    - All coders have to have a good knowledge on how to work with the specific database you are using.
    - Programming inside databases is not standardized. Different databases, and indeed different versions of the same database, sometimes have different versions of the same language or different libraries available. The languages/libraries have not been so thoroughly used/tested/examined by a big user community (while, for example, standard C/Java/etc. libraries have been thoroughly debugged in billions of man-hours of use). This means more library bugs and a lack of third-party tools for software design and development inside the database.
    - Facilities such as version control, source control, etc are either not available or difficult to use in a reliable manner.
    - Availability of compatible 3rd party libraries or application modules is very, very restricted by comparison to NOT having your server side logic all inside the database.
    - Forget about moving databases in the future. Also, simply migrating to a newer version of the database can be a nightmare.

    Software design-wise, the design of the software will be strongly constrained by the internal structure of the database:
    - Information flows will mostly have to be database-like information flows
    - A true object-oriented structure is pretty much impossible. At most you can do weakly connected islands with an object-oriented structure. If the database language you have to use is procedural, forget about OO design.
    - Server-side initiated connections to outside entities, thread control, distributed transactions and other more advanced functionalities are pretty much impossible.
    - Usage/integration with 3rd party libraries or application modules is very hard or even impossible.

    Software programming-wise, and from my experience (mostly Oracle):
    - The language sucks.
    - The application libraries (not the DBA ones) suck big time.

    Simply put, a software architect that puts all server-side logic inside the database is with this single choice removing almost all his other architecture options and creating/fortifying vendor lock-in of the application to the database itself and 3rd party tools and also of the development team itself by means of the knowledge experience they have/will gain with said database and said 3rd party tools.

    Such a person should IMHO either be demoted to a place where he/she can't cause any damage or fired outright.

  20. How is this new? by Eric+S.+Smith · · Score: 1

    This is hardly a new concept — mainframes have been migrating untouched datasets to tape for years. If this really is a new idea in the SAN market, SANs must suck worse than I'd previously supposed.

    And “Is automated tiered storage headed to desktops?” Well, no, unless there's something cheaper than hard disks, which there currently really isn't.

  21. Sounds reasonable by Orange+Crush · · Score: 1

    One application of something similar is definitely coming to desktops (and laptops in particular) in hybrid hard drive arrangements--caching commonly used files to flash memory to be able to spin down the platters and conserve power or for performance gains. (Although I remain wary of Vista using USB thumb drives as caches . . . finite read/write cycles and all.)

  22. Coming full circle -- good idea! by ThinkingInBinary · · Score: 1

    This is interesting, because when you read about old operating systems that ran on computers with several types of memory--fast magnetic core memory for the active programs, slower rotating-drum memory for less active data, large and slow hard drives, and automatic tape drives--they did exactly this. It makes sense, given that we have L1 cache, L2 cache, and system RAM, each slower and larger than the last, that we would extend this to hard drives, having a small, fast drive for often-used data and larger, slower hard drives for archived data. This would be the sort of thing I would expect Hans Reiser to want to add to reiser4 (or maybe reiser5)--the ability to span a filesystem across multiple block devices to optimize performance.

    1. Re:Coming full circle -- good idea! by ratboy666 · · Score: 3, Informative

      And my favorite commands on the ol' HP-2000 mini:

      SANCTIFY and DESECRATE

      "Sanctify file" moved the file to drum (basically, one-drive RAID 0 for all you young-uns). Desecrate moved it to the regular hard disk.

      YMMV

      Ratboy

      --
      Just another "Cubible(sic) Joe" 2 17 3061
  23. Hierarchical Email storage, client driven by HKcastaway · · Score: 1

    I clearly see a benefit of using the client machine (PC) as part of the storage hierarchy, the data being moved belongs to a specific user. You can apply usage patterns, policies based on server storage available etc. Email could be moved from the client to the server transparently over IMAP even without modifying the protocol. For most cases this makes it irrelevant whether you are given 100MB or 2.7GB of email storage by your email (online spyware) provider. Here are my 2 cents. http://blogs.hk.com/index.php?/archives/56-The-need-for-Hierarchical-Email-storage..html

  24. Liars, damn liars, and statistics by Medievalist · · Score: 4, Informative
    Decades ago, we used to laugh at the mainframers and their automated hierarchical storage systems because they'd make exactly these kinds of statements.

    Frequency of use DOES denote importance, at the very least STATISTICALLY.
    No. Absent other data, it only denotes frequency of use, period. Playboy.com gets more hits than the general ledger webapp if you unblock your company firewall, but the general ledger is more important to the company.

    Just because you want "that special little something" once a year; does not mean you can degrade the speed of information which is instantly needed.
    There is actually very little correlation between what the average user wants and what s/he needs, as is empirically obvious. If the image from the "fly-fishing.com" website that they've set to come up as their background image every morning fails to load, they can still work, but if the once-a-year corporate audit checklist gets put on slow, old storage and then gets lost in a hardware failure, the company stock price may flutter and certainly heads will roll in the corporate IS department.

    This is an obvious fact
    I don't think that word means what you think it means.
    1. Re:Liars, damn liars, and statistics by Krazy+Nemesis · · Score: 1
      I don't think that word means what you think it means.
      Inconceivable!
    2. Re:Liars, damn liars, and statistics by Anonymous Coward · · Score: 0
      but if the once-a-year corporate audit checklist gets put on slow, old storage and then gets lost in a hardware failure, the company stock price may flutter and certainly heads will roll in the corporate IS department.


      I think you're getting a bit confused. You cannot tie hardware failures into this analysis. Your argument hinges on one of two flawed assumptions: either (1) that because data is placed on a slower drive it will be more likely to be lost/destroyed[?!] than data on a faster drive, or (2) that it is, for some reason, harder to locate a file, or takes an astronomically long time to retrieve it, on a slower RPM drive than on a faster RPM drive. This is just not the case.

      First of all, BOTH your high-speed drives and your low-speed drives should be backed up. BOTH of them equally. You will not have any more likelihood of losing data on a slow-drive than a fast drive. Just because one drive contains less-used files does not mean that it doesn't get backed up. Therefore you CAN place important files on the slower drive.

      Second of all, we are talking hard disks here, not tape. Tape is used for backups, not working drives. It doesn't take 30 minutes to scroll through a 7200RPM hard disk and locate a desired file like it does with tape.

      Finally, given the points above (the data may be SLOWER to reach, but it's still ACCESSIBLE), the relative "importance" of the data doesn't matter any more. The fact that some data needs to be accessed more than other data DOES matter. In fact, it's the only thing that matters -- more frequently needed data needs to be able to be retrieved faster. Period.
  25. I'd love to see it... by drinkypoo · · Score: 2, Insightful

    ...but we can't seem to even get a fucking trashcan right.

    I should never have to empty my recycle bin manually, except where I want to perform a security erase - which should be a function delivered with my operating system. This is the height of stupidity.

    It's not even a hard problem! There are functions which programs use to check for free space. Lie to them. Don't count files in the recycle bin against the available free space. If you're about to run out of space, delete the least recently used file. Perhaps you might also base things on total number of accesses, or other criteria, but I believe (perhaps naively) that making the trash can an automatic FIFO from which files are automatically deleted when disk space is low would be about a hundred times better than what we have now.
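
    Here's the idea as a toy model in Python. The `Disk` class is invented for illustration; a real implementation would sit inside the filesystem's free-space accounting rather than in a dictionary.

```python
from collections import OrderedDict


class Disk:
    """Toy disk whose trash is reclaimed automatically, oldest deletion first."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.files = {}             # name -> size of live files
        self.trash = OrderedDict()  # name -> size, in deletion order

    def free_space(self):
        # The "lie": trashed files do not count against free space.
        return self.capacity - sum(self.files.values())

    def delete(self, name):
        # Deleting only moves the file to the trash.
        self.trash[name] = self.files.pop(name)

    def write(self, name, size):
        if size > self.free_space():
            raise OSError("disk full")
        # Purge the oldest trashed files until the new file physically fits.
        while size > self.free_space() - sum(self.trash.values()):
            self.trash.popitem(last=False)
        self.files[name] = size


d = Disk(100)
d.write("a", 40)
d.write("b", 40)
d.delete("a")
d.delete("b")
d.write("c", 50)  # silently purges "a", the oldest item in the trash
```

    After the last write, "b" is still recoverable from the trash and nobody ever had to click Empty.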

    Also, I want this functionality on all operating systems. Unless I explicitly request deletion, no file should ever be unlinked, deleted, or whatever you call it when I delete it, whether through the command line or the GUI.

    This is not hard and it would make everyone a lot happier.

    --
    "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    1. Re:I'd love to see it... by mrsbrisby · · Score: 3, Interesting

      Also, I want this functionality on all operating systems. Unless I explicitly request deletion, no file should ever be unlinked, deleted, or whatever you call it when I delete it, whether through the command line or the GUI.

      The problem with this is that it causes a significant reduction in performance.

      Ideally, the operating system chose the best possible spot for that file when it got written. Once that file is deleted, that spot will once again be the fastest best possible spot- for at least something. If the operating system skips that spot for a new file, then this new file isn't going to be accessed quite as quickly.

      Truly automatic tiered storage solves this problem by splitting the directory services from the storage system- that is, the file's _name_ is no longer tied to the volume that the file happens to live on (and no, this isn't the same thing as symlinks or shortcuts). This allows the decision as to what the best spot for a file is to be deferred until later- and even spanned across multiple volumes!
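
      A toy illustration of splitting names from storage, in Python. The `Catalog` class and its methods are invented for the example, and the hard part (making the move atomic on real disks) is exactly what this sketch glosses over.

```python
class Catalog:
    """Maps file names to (volume, blob id); callers never see the volume."""

    def __init__(self, volumes):
        self.volumes = {v: {} for v in volumes}  # volume -> {blob_id: bytes}
        self.names = {}                          # name -> (volume, blob_id)
        self.next_id = 0

    def write(self, name, data, volume):
        blob_id = self.next_id
        self.next_id += 1
        self.volumes[volume][blob_id] = data
        self.names[name] = (volume, blob_id)

    def read(self, name):
        volume, blob_id = self.names[name]
        return self.volumes[volume][blob_id]

    def migrate(self, name, target):
        volume, blob_id = self.names[name]
        if target == volume:
            return
        self.volumes[target][blob_id] = self.volumes[volume][blob_id]  # copy first
        self.names[name] = (target, blob_id)                           # then repoint
        del self.volumes[volume][blob_id]                              # then free


cat = Catalog(["fast", "slow"])
cat.write("/home/me/report.txt", b"Q3 numbers", "fast")
cat.migrate("/home/me/report.txt", "slow")
print(cat.read("/home/me/report.txt"))
```

      The path never changes while the data moves between volumes, which is the property that lets placement be decided (and revised) later.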

      Unfortunately, such a beast is very difficult to build. If we relax our requirements (say, performance isn't very important, or we can stop using our computer for a few hours each evening), then it's probably possible. What we need is a new kind of filesystem that either supports atomic moves between disks or splits the names from the storage.

      A few research projects have been focused on these kinds of changes- but they all tend to break UNIX semantics (Amoeba immediately springs to mind)- and those UNIX semantics are, in-fact, the most widely used and recognized semantics for filesystems anywhere (Even Windows uses them!)-- people who develop a filesystem incapable of supporting them, really need to have a real good reason for breaking everyone's hard work.

      While they often do, it hasn't yet been seen as good enough for general purpose stuff.

    2. Re:I'd love to see it... by drinkypoo · · Score: 1
      Ideally, the operating system chose the best possible spot for that file when it got written. Once that file is deleted, that spot will once again be the fastest best possible spot- for at least something. If the operating system skips that spot for a new file, then this new file isn't going to be accessed quite as quickly.

      Filesystems may be automatically and intelligently defragmented (while live, if the filesystem is decent) when disk I/O is at a minimum.

      Currently, some operating systems (e.g. Windows XP) optimize file location for minimizing boot time. I don't see any reason the same concept couldn't be extended. It might not be a bad idea even in the absence of these other concepts.

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    3. Re:I'd love to see it... by mrsbrisby · · Score: 2, Informative

      Filesystems may be automatically and intelligently defragmented (while live, if the filesystem is decent) when disk I/O is at a minimum.

      But my filesystem is never idle, or even nearly so. Nonetheless, fragmentation isn't exactly a bad thing, and doesn't necessarily have to cause problems (such as lost performance) by itself.

      Worse still: How does the defragmenter know to avoid using this block? Or how does it know that it's a good candidate to be moved to the other end of the disk?

      We could make a record of every block that we access and sort it- as later information for the defragmenter, but would it really help? Wouldn't the cost of updating that information be too high?

      These real questions have been examined and explored, and the short answer of it is simply "we don't know."

      Access patterns are so complicated it's not even funny.

      That's where tiered storage comes in- instead of trying to make the perfect defragmentation utility, we simply "copy" all of our storage to the lesser speed disks and update all the pointers.

      We'd still never know when it was safe to remove the data from the first disk, but at least we'd never have to know- if we needed space, we could simply reclaim any storage from the fast disk that had already been mirrored. We might guess wrong, but on next access, we could reinforce that it guessed wrong by copying it back.

      As a result, we don't have to do extra writes in the middle, and we don't need a fancy defragmenter...
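
      A rough sketch of that copy-down, reclaim, copy-back cycle, assuming a toy two-tier store held in dictionaries. `TieredStore` and everything in it are made-up names for illustration, and "oldest mirrored entry" is an arbitrary choice of victim, not a claim about how any real product guesses.

```python
class TieredStore:
    """Toy two-tier store: a small fast tier backed by a large slow tier."""

    def __init__(self, fast_slots):
        if fast_slots < 1:
            raise ValueError("fast tier needs at least one slot")
        self.fast_slots = fast_slots
        self.fast = {}         # key -> data on the fast tier (dict order = age)
        self.slow = {}         # key -> data on the slow tier
        self.mirrored = set()  # keys whose fast copy also exists, unchanged, on slow

    def mirror(self):
        # Background step: copy everything on the fast tier down to the slow tier.
        for key, data in self.fast.items():
            self.slow[key] = data
            self.mirrored.add(key)

    def _make_room(self):
        while len(self.fast) >= self.fast_slots:
            # Reclaim the oldest fast-tier entry that is already mirrored.
            victim = next((k for k in self.fast if k in self.mirrored), None)
            if victim is None:
                self.mirror()  # nothing is safe to drop yet, so mirror first
                continue
            del self.fast[victim]
            self.mirrored.discard(victim)

    def write(self, key, data):
        if key not in self.fast:
            self._make_room()
        self.fast[key] = data
        self.mirrored.discard(key)  # any slow copy is now stale

    def read(self, key):
        if key in self.fast:
            return self.fast[key]
        data = self.slow[key]       # miss: the earlier guess was wrong
        self._make_room()
        self.fast[key] = data       # copy it back to the fast tier
        self.mirrored.add(key)      # fast and slow copies are identical
        return data
```

      Note that nothing here ever has to decide when data is "safe" to remove from the fast tier in advance; it only drops fast copies that already exist on the slow tier, and only when space is needed.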

      Currently, some operating systems (e.g. Windows XP) optimize file location for minimizing boot time. I don't see any reason the same concept couldn't be extended. It might not be a bad idea even in the absence of these other concepts.

      NTFS does this by having a (small) table near the beginning of the disk. The ``files'' in it are packed in such a way that the entire table is read into memory- as the boot procedure "knows" that they will all be needed shortly.

      Unfortunately, updating this table is expensive, and it isn't very large, so it cannot keep very many files.

  26. Where did the article say Desktops? by Phishcast · · Score: 1
    I read the article and I don't see anything desktop specific here. It sounds like you have a single storage array on the back end to which your (file/database/whatever) servers are attached. The storage array has both high performance Fibre Channel drives and less expensive drives. It keeps track of which blocks are accessed most frequently and migrates them to the appropriate disk tier.

    Sure, your desktop connects over the network to a SAN attached server in some fashion, but I don't see anywhere in the article that says this product:

    A. runs a desktop agent of some sort that classifies your data based on access patterns
    B. is meant to be directly attached to desktop machines

    Where did desktops come from in the article summary? This isn't for your workstation folks.

  27. Hot File Adaptive Clustering? by Kadin2048 · · Score: 3, Informative
    I have heard mention of this as well, but I'd never seen any details. I tried to dig up some information; here's what I found.

    Apple's "About disk optimization with Mac OS X" (basically telling you that you don't need to defrag), says "Mac OS X 10.2 and later includes delayed allocation for Mac OS X Extended-formatted volumes. This allows a number of small allocations to be combined into a single large allocation in one area of the disk." ... "Mac OS X 10.3 Panther can also automatically defragment such slow-growing files [that data is continually appended to]. This process is sometimes known as "Hot-File-Adaptive-Clustering.""

    There's also a reference to a "hot band," a region of the drive where data is written that's used during startup, in order to increase performance and I assume lessen boot times.

    There's also reference to some automatic defragging in this macosxhints article on HFAC:
    There are 2 separate file optimizations going on here.

    The first is automatic file defragmentation. When a file is opened, if it is highly fragmented (8+ fragments) and under 20MB in size, it is defragmented. This works by just moving the file to a new, arbitrary, location. This only happens on Journaled HFS+ volumes.

    The second is the "Adaptive Hot File Clustering". Over a period of days, the OS keeps track of files that are read frequently - these are files under 10MB, and which are never written to. At the end of each tracking cycle, the "hottest" files (the files that have been read the most times) are moved to a "hotband" on the disk - this is a part of the disk which is particularly fast given the physical disk characteristics (currently sized at 5MB per GB). "Cold" files are evicted to make room. As a side effect of being moved into the hotband, files are defragmented. Currently, AHFC only works on the boot volume, and only for Journaled HFS+ volumes over 10GB.
    So that seems to be the deal; if anyone else has more information, I'd be interested to hear about it.
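
    To make the two rules in that quote concrete, here is a small sketch that uses only the thresholds quoted above (8+ fragments, under 20MB, under 10MB, 5MB of hotband per GB). It is not Apple's code: the function names are invented, and it assumes the read counts it is given already exclude files that get written to.

```python
from collections import Counter

MB = 1024 * 1024
HOTBAND_BYTES_PER_GB = 5 * MB   # "5MB per GB" from the quoted description
MAX_HOT_FILE_BYTES = 10 * MB    # only files under 10MB are candidates
MAX_DEFRAG_BYTES = 20 * MB      # on-open defrag applies to files under 20MB
MIN_FRAGMENTS = 8               # "highly fragmented (8+ fragments)"


def should_defragment(fragments, size):
    """On-open rule: highly fragmented and small enough to relocate."""
    return fragments >= MIN_FRAGMENTS and size < MAX_DEFRAG_BYTES


def hotband_capacity(volume_gb):
    return volume_gb * HOTBAND_BYTES_PER_GB


def pick_hot_files(read_counts, sizes, capacity):
    """Return the names that belong in the hotband after one tracking cycle.

    read_counts: Counter of name -> reads during the cycle (read-only files).
    sizes: dict of name -> size in bytes.
    """
    chosen, used = [], 0
    for name, _count in read_counts.most_common():  # hottest first
        size = sizes[name]
        if size >= MAX_HOT_FILE_BYTES:
            continue                                # too big to qualify
        if used + size <= capacity:
            chosen.append(name)
            used += size
    return chosen


counts = Counter({"kernel": 50, "libc": 40, "movie": 30, "notes": 5})
sizes = {"kernel": 8 * MB, "libc": 4 * MB, "movie": 700 * MB, "notes": 1 * MB}
print(pick_hot_files(counts, sizes, hotband_capacity(2)))
```

    Anything currently in the hotband that is not in the returned list would be one of the "cold" files to evict.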

    There's also a MacSlash article on HFAC and a discussion on Ars that includes a post of the source code.
    --
    "Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."
  28. This is simple by mkw87 · · Score: 1

    This is so simple, you have your good failsafe raid1 setup with 10,000 rpm hard drives for the IMPORTANT data and the rest of your drives are just 7200 rpm drives to store what is not important. It's really easy to determine which drive the data goes to too:

    if filename=pr0n
        store_on_good_drive
    else
        store_on_slow_drive

    --
    Arguing with an engineer is like wrestling a pig in mud. Soon, you realize the pig is dirty, and he likes it.
  29. its for ricers by thc4k · · Score: 1

    So, if I watch a movie once, it's rarely used, put on the slowest disk and stutters when played? ;p Well, anyways, where on a desktop system would you need really expensive high speed data storage? Normal disks these days are extremely large and fast and they never come under any stress anywhere near a high performance webserver or anything alike. While backup systems will get more important while disk space gets cheaper and cheaper, I don't see any need for more performance. People will just put up a raid with some 500GB drives pretty soon ...

  30. A more useful endeavor by insanarchist · · Score: 0

    Wouldn't it make more sense to work on making more intelligent/efficient automatic backups? I mean, there are (generally) two types of computer users: nerds(gamers/IT/programmers/etc.) and non-nerds(grandma/sister/college students/most home-office people). Nerds, for the most part, know exactly how to make their machines run as fast as possible when it comes to what to store where, and that's because they care about performance, be it top speed, reliability, or both. Non-nerds want the damned thing to work, and get really pissed if it doesn't. 99% of said non-nerds don't make any kind of backup whatsoever, so why not take those extra drives you're proposing and stick a copy of their work on there? Automatically? It can't be that hard to automatically copy word documents...

  31. Illegible Blobs by Doc+Ruby · · Score: 1

    I really wish that my local host's storage were used only as scratch space for encrypting all my data for network storage, and a local cache. Why should I lug "my" PC around when there are PCs everywhere? Maybe if my PC were really better than the others, but for most of my data access, any Web terminal will do. Combine that with a biometric/password protected mobile "phone" containing my keyring and bookmarks, and I'm literally "good to go".

    --
    make install -not war

  32. For the desktop or the small business? by Anonymous Coward · · Score: 0
    I've tried to build things like this. You know, with the cost of drives and the comparative performance of drives vs. DVD and tape and such, I ended up just buying firewire drives and I run rsync once a day and then I have a set of them that I keep in a safe and iterate through.


    Tape is flat out, I could see doing it periodically and putting tapes off site somewhere but it's incredibly slow compared to drives. It's nice in theory and I could see it having a market for archival type stuff, but consumer grade stuff and small business stuff just sucks.


    I can't help but think good solid backup technology will be a killer app on the desktop and small business in the future. The whole backup software biz is kind of built around the idea of tapes and such, if you have any tape rotation policy, pulling incrementals together can be a pain in the ass and most tape systems are so bloody slow that you can't afford to do a full backup very often.


    Even with the set of firewire drives, some machines are network attached and it's slower. Having some intelligence that does the mirroring would be sweet, also for stuff that isn't touched very often, compressing it would be sweet. It seems like it's time for backup technology to be retooled and brought up to par with the technology. It seems like there are a lot of things you could do to optimize this kind of setup. Compress data that isn't changed very often, find duplicate data between machines (like the MP3 library is copied on to 4 different machines at my house) stuff like that. I also really like having a non-proprietary format for my data.
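
    For the duplicate-finding part, a minimal sketch of one common approach: group files by a hash of their contents. The `find_duplicates` function and the "machine:path" keys are hypothetical, and file contents are passed in as bytes to keep the example self-contained; a real tool would read and hash files from disk in chunks.

```python
import hashlib
from collections import defaultdict


def find_duplicates(files):
    """Group file names by the SHA-256 digest of their contents.

    files: dict mapping a "machine:path" string to that file's bytes.
    Returns a list of groups (sorted lists) that share identical contents.
    """
    by_digest = defaultdict(list)
    for path, data in files.items():
        by_digest[hashlib.sha256(data).hexdigest()].append(path)
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]


library = {
    "desktop:/music/song.mp3": b"same bytes",
    "laptop:/mp3/song.mp3": b"same bytes",
    "laptop:/docs/notes.txt": b"something else",
}
print(find_duplicates(library))
```

    A backup that knows about these groups only needs to store one copy per group plus the list of names pointing at it.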

  33. I already do this ... by kozubik · · Score: 1

    I already do this at my home.

    Big files that I don't mind losing (ripped dvds and cds) are on a local, cheap raid-5 array.

    Everything else resides on my PC.

    Every night, my PC runs an automatic rsync job that syncs it all up to my rsync.net filesystem.

    I guess, theoretically, I could take it a step further, and add a layer of geographic (and even political) redundancy by making my account sync to California and Colorado, and not just the primary CA site.

    rsync.net just announced sites in Switzerland and India ... hmmm...another tier :)

  34. Veritas Storage Foundation by nanimo · · Score: 1

    Quality of Storage Service (QoSS) has been a feature in Veritas Storage Foundation for two years or so.

  35. Raid 0 array... by Anonymous Coward · · Score: 1, Funny

    My single disk raid 0 setup works great...

  36. this is new? by Anonymous Coward · · Score: 0

    I've been doing this for the past decade, at least. Two drives, First one partitioned to hold OS and system files on one partition, swap file on another partition. Other drive partitioned into frequently used apps/data and [essentially] archival. Initially, it was a by-product of the limit imposed by WinNT, on drive size -- but it's grown into a habitual application of tiered storage to all of my lab machines, and - even - my home boxes. If it wasn't for the !@&^$#*&$ system registry b.s., this would be an even better setup (give me back my .ini files, damn it! - I want portability!).

  37. How do they handle free space? Among other things by sirwired · · Score: 1

    I've read about this company before. However, I'm not sold on it, and at last check (a couple of months ago), their website was remarkably bereft of useful technical detail.

    My biggest question is how they handle free space tracking? Unless this box has "hooks" into the filesystem, it is not going to have the faintest clue when data has been deleted.

    Also, can you say "Holy Fragmentation Batman!"? Again, pretty intense "hooks" into the filesystem are going to be required in order to keep files even remotely together. A tape backup of a large file on this mess is going to take all freakin' week.

    I'll take good-'ol-fashioned file or volume-based HSM, which has been around for a great many years, over this block-based stuff any day of the week. I might change my mind if they published some nice juicy technical papers on how they handled free-space and fragmentation issues.

    SirWired

  38. Not understanding statistics? POOR GUY... by kyc · · Score: 0

    Evaluating sentences one by one and changing their meanings is not fair-play. I see your so-called insightful comment(!)... 1st of all, you should understand this > PRIORITIES. In everything in science, daily life, health, insurance, things to do OR equivalently data storage has priorities. You can look it up if you do not grasp the meaning. ---Absent other data, it only denotes frequency of use, period---- Frequency of use HAS nothing to do with period. Period is inverse of frequency. Frequency stands for the NEED of that information here. This means NECESSITY. You are right that playboy.com gets a lot of hits. Then IT SHOULD HAVE PRIORITY THAT MUST BE FAST. Even to your use. It is PRIORITY. IT is NEEDED. Can you understand that? Let me deliver the coup-de-grace : ---There is actually very little correlation between what the average user wants and what s/he needs, as is empirically obvious.--- IF the user does not know what he needs; or he cannot correlate his needs with his PRIORITIES then it is an ineffective user and it is REALLY his problem. Oh; then we should accommodate DATA STORAGE SPEED vs. PERFORMANCE according to the grandnannies who can't separate their needs from their wishes. Then they complain just like the comment posted by the guy who does not, unfortunately, understand statistics. QED.

    --
    There's plenty of room at the bottom! Richard P. Feynman
  39. 1996, Netware 4.1 & HP hardware by maggard · · Score: 1

    I built something like this 10 years ago. A big corporation's in-house marketing & PR department, lotsa project files full of artwork and such for campaigns, big files used daily for months then ignored for years. It was MacOS 9 & Windows 95 clients, Netware 4.1 on a HP server with RAID 5 and 2 DLTs w/ loaders.

    One DLT was for backups using ARCServe (before they got bought by CA). It was simply a matter of shipping cartridges in and out of the storage vault & off-site as required, replacing individual tapes when they got too old.

    The other DLT was 2nd tier storage. As files aged on the RAID 5 array they'd first be compressed by Netware for space savings, then after they were inert some period of time they'd be migrated to tape. If read they'd be automatically pulled back from tape, decompressed, and returned to active use on the RAID array until they aged back to 2nd tier storage again.

    The whole architecture was nothing more than a few settings under Netware, it was invisible to clients, and seemed to manage itself quite well. I did a lot of tests; pulling tapes, inserting damaged tapes, pulling drives, setting yesterday age-dates and then pulling back files, etc. - it ran flawlessly. I recall being particularly impressed that backups & indexing could be exempted from counting as 'reads' and not pulling every file back to active use.
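
    The aging policy described above boils down to a small state machine, sketched here under stated assumptions: the thresholds, the `FileRecord` type and the function names are all invented for illustration and are not Netware settings or APIs.

```python
from dataclasses import dataclass

COMPRESS_AFTER_DAYS = 30    # hypothetical thresholds, not Netware defaults
MIGRATE_AFTER_DAYS = 180


@dataclass
class FileRecord:
    name: str
    last_read_day: int
    state: str = "active"   # "active" -> "compressed" -> "on_tape"


def age(record, today):
    """Nightly pass: compress idle files, then migrate long-idle ones to tape."""
    idle = today - record.last_read_day
    if idle >= MIGRATE_AFTER_DAYS:
        record.state = "on_tape"
    elif idle >= COMPRESS_AFTER_DAYS and record.state == "active":
        record.state = "compressed"


def read(record, today, counts_as_read=True):
    """A normal read recalls the file; backup/indexing reads can be exempted."""
    if counts_as_read:
        record.state = "active"       # pulled back, decompressed, in use again
        record.last_read_day = today
    return record.state


f = FileRecord("campaign_logo.psd", last_read_day=0)
age(f, 40)                            # idle a month: compressed in place
age(f, 200)                           # idle half a year: moved to tape
read(f, 201, counts_as_read=False)    # backup pass leaves it on tape
read(f, 202)                          # a user opens it: back to active
```

    The `counts_as_read` flag is the important bit: without it, every full backup would drag the entire archive back onto the RAID array.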

    Sadly the department was outsourced a few weeks after the new architecture went live, so I never really got to see it all in ongoing use. I took advantage of the change to make my own departure to a saner environment (IT's employee-retention average was literally weeks & after 2 years I'd had enough.)

    However the tiered storage was a thing of beauty, and dead easy to administer after it was set up. I've always suspected that was one of Netware's problems: It didn't need much baby-sitting so it kinda fell off most folks' radar, it just did a few things but it did them really really well, maybe too well.

    --
    I don't read ACs: If a post isn't worth so much as a nom de plume to its author then I won't bother either.
  40. It's not, really by swb · · Score: 1

    But it would be nice to see the technology adapted to consumer price points, but it probably won't be as long as huge ATA disks are $200.

  41. Gee, a New Idea, only 43 years old! by CAOgdin · · Score: 2, Informative

    It was about 1962, when IBM was touting something they called "Percolate & Drip" storage. The idea was that things that were used often "percolated" up to the fastest storage medium, while data that was only infrequently used would "drip" down to the most capacious media. Why do children get to claim everything they imagine is somehow NEW? Mature adults try to stand on the shoulders of giants.

  42. Been there, done that. by Eunuchswear · · Score: 1

    That's how things used to work on ICL George 3/4 circa 1977.

    The joys of waiting for an operator to load a tape so you could edit a file, hoping he wouldn't CANTDO.

    (Little used files got shoved off to mag tape. Still showed up in the filestore. When you accessed them a message was sent to the operator: "PLEASE LOAD VOLUME ASBHJ123 FOR :HUGHES.SOMEFILE(1/FORT)", if the lazy bugger didn't want to load the tape, or if he couldn't find it he'd type "CANTDO LOAD VOLUME" and you'd get a horrid error).

    --
    Watch this Heartland Institute video