Automated Tiered Storage Coming to Desktops?

Posted by ryuzaki0 on Monday June 26, 2006 @05:54AM from the not-the-droid-jpgs-you're-looking-for dept.

roj3 writes "Tiered storage has been the scourge of administrators because the vendors tell us to hold meetings with all departments and then classify data into storage tiers based on its type or relative importance. eWeek has a story about a new approach to tiered storage: sorting it all by usage patterns. Regularly used data goes on high-performance storage, idle data goes on slower/cheaper storage. Volumes and files can even span several types of drives or RAID levels. Is automated tiered storage headed to desktops?"

Re:Oh....good.. by Red+Flayer · 2006-06-26 06:15 · Score: 3, Informative

That's what metatagging is for. Tag files that are not to be moved to slow storage no matter how infrequently they are accessed. RTFA.

--
"Trolls they were, but filled with the evil will of their master: a fell race..." -- J.R.R. Tolkien on Olog-hai
Yes, Kinda... by ThinkFr33ly · 2006-06-26 06:16 · Score: 4, Informative

Automatic tiered storage is definitely coming, but probably not in the form of multiple disks that run at different speeds or RAID levels.

Microsoft announced a while back that Windows Vista would support three technologies designed to improve disk speed called SuperFetch, ReadyBoost, and ReadyDrive. SuperFetch is simply a way of preloading applications and data when the OS anticipates that you'll be loading those soon.

ReadyBoost and ReadyDrive both utilize persistent memory caches to speed up access to the disk.

ReadyBoost treats normal USB keys and flash disks like temporary caching locations for data from the disk.

ReadyDrive is essentially the term Microsoft uses to describe its support for hybrid hard drives, which are disks that have a built-in flash memory module that's used as a persistent cache.

Not only do hybrid disks dramatically increase performance, but they also result in huge power savings for mobile devices like laptops and media players.
Re:Not so new... by truthsearch · 2006-06-26 06:17 · Score: 2, Informative

Ten years ago you had something automated that determined where the files should go and moved them appropriately? It analyzed usage patterns? I'd really like to know what older systems had such features as I've never seen them.

--
Developers: We can use your help.
Liars, damn liars, and statistics by Medievalist · 2006-06-26 06:31 · Score: 4, Informative

Decades ago, we used to laugh at the mainframers and their automated hierarchical storage systems because they'd make exactly these kinds of statements.

Frequency of use DOES denote importance, at the very least STATISTICALLY.
No. Absent other data, it only denotes frequency of use, period. Playboy.com gets more hits than the general ledger webapp if you unblock your company firewall, but the general ledger is more important to the company.

Just because you want "that special little something" once a year does not mean you can degrade the speed of information which is instantly needed.
There is actually very little correlation between what the average user wants and what s/he needs, as is empirically obvious. If the image from the "fly-fishing.com" website that they've set to come up as their background image every morning fails to load, they can still work, but if the once-a-year corporate audit checklist gets put on slow, old storage and then gets lost in a hardware failure, the company stock price may flutter and certainly heads will roll in the corporate IS department.

This is an obvious fact
I don't think that word means what you think it means.
Hot File Adaptive Clustering? by Kadin2048 · 2006-06-26 06:47 · Score: 3, Informative

I have heard mention of this as well, but I'd never seen any details. I tried to dig up some information; here's what I found.

Apple's "About disk optimization with Mac OS X" (basically telling you that you don't need to defrag), says "Mac OS X 10.2 and later includes delayed allocation for Mac OS X Extended-formatted volumes. This allows a number of small allocations to be combined into a single large allocation in one area of the disk." ... "Mac OS X 10.3 Panther can also automatically defragment such slow-growing files [that data is continually appended to]. This process is sometimes known as "Hot-File-Adaptive-Clustering.""

There's also a reference to a "hot band," a region of the drive where data is written that's used during startup, in order to increase performance and I assume lessen boot times.

There's also reference to some automatic defragging in this macosxhints article on HFAC:
There are 2 separate file optimizations going on here.

The first is automatic file defragmentation. When a file is opened, if it is highly fragmented (8+ fragments) and under 20MB in size, it is defragmented. This works by just moving the file to a new, arbitrary, location. This only happens on Journaled HFS+ volumes.

The second is the "Adaptive Hot File Clustering". Over a period of days, the OS keeps track of files that are read frequently - these are files under 10MB, and which are never written to. At the end of each tracking cycle, the "hottest" files (the files that have been read the most times) are moved to a "hotband" on the disk - this is a part of the disk which is particularly fast given the physical disk characteristics (currently sized at 5MB per GB). "Cold" files are evicted to make room. As a side effect of being moved into the hotband, files are defragmented. Currently, AHFC only works on the boot volume, and only for Journaled HFS+ volumes over 10GB.
So that seems to be the deal; if anyone else has more information, I'd be interested to hear about it.

There's also a MacSlash article on HFAC and a discussion on Ars that includes a post of the source code.
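
For illustration, the hot-file selection described in the macosxhints quote can be sketched roughly as follows. This is a toy model, not Apple's code: the names `FileStat`, `hotband_capacity`, and `pick_hot_files`, and the greedy fill order, are assumptions made for the example. Only the thresholds (files under 10MB, never written to, hotband of 5MB per GB, volumes over 10GB) come from the quote.

```python
from dataclasses import dataclass

MB = 1024 * 1024
GB = 1024 * MB


@dataclass
class FileStat:
    path: str
    size: int    # bytes
    reads: int   # reads seen during the tracking cycle
    writes: int  # writes seen during the tracking cycle


def hotband_capacity(volume_size: int) -> int:
    """Hotband size: 5MB per GB of volume, or 0 for volumes of 10GB or less."""
    if volume_size <= 10 * GB:
        return 0
    return (volume_size // GB) * 5 * MB


def pick_hot_files(stats, volume_size):
    """Return the paths to place in the hotband, hottest first."""
    budget = hotband_capacity(volume_size)
    # Only small, read-only files that were actually read are candidates.
    candidates = [
        f for f in stats
        if f.size < 10 * MB and f.writes == 0 and f.reads > 0
    ]
    candidates.sort(key=lambda f: f.reads, reverse=True)
    chosen = []
    for f in candidates:
        # Greedy fill (an assumption): skip anything that no longer fits.
        if f.size <= budget:
            chosen.append(f.path)
            budget -= f.size
    return chosen
```

Anything not returned by `pick_hot_files` at the end of a cycle would be a "cold" file in the quote's terms and a candidate for eviction from the hotband.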

--
"Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."
Re:Not so new... by dpilot · 2006-06-26 06:53 · Score: 4, Informative

It was called HSM (Hierarchical Storage Management). It ran on IBM's MVS on mainframes, and it moved your less-used data to cheaper storage, in several stages. IIRC, the first stage was just compression on a different disk, the second stage was tapes in a jukebox-type thing, and the third stage was tapes that an operator fetched and loaded. Somewhere way back there, data never used for 5 years fell off the end of the belt, but you got warned first.

The day after vacation, when you kept getting the message, "DFHSM is recalling dataset xyz for user jkl" as it pulled all of your storage back online was a pain, and we all thought it would be neat to get rid of, as we migrated to workstations. But in retrospect, HSM was great, never having to worry about your data quantity. That's compared with having to root through $HOME every few months to take care of quota problems.

--
The living have better things to do than to continue hating the dead.
Re:Coming full circle -- good idea! by ratboy666 · 2006-06-26 06:58 · Score: 3, Informative

And my favorite commands on the ol' HP-2000 mini:

SANCTIFY and DESECRATE

"Sanctify file" moved the file to drum (basically, one-drive RAID 0 for all you young-uns). Desecrate moved it to the regular hard disk.

YMMV

Ratboy

--
Just another "Cubible(sic) Joe" 2 17 3061
Re:Certainly could be done in a desktop by COMON$ · 2006-06-26 06:59 · Score: 3, Informative

Because two 10K Raptors in RAID 0 aren't worth it for the speed increase. Last time I checked you might get a 20% increase, along with reduced data integrity. I did some research into this a while ago; check out this article, it's very informative:
http://www.anandtech.com/printarticle.aspx?i=2101

--
CS: It is all sink or swim...oh and did I mention there are sharks in that water?
Re:Not so new... by Doctor+Memory · 2006-06-26 07:05 · Score: 3, Informative

you had something automated that determined where the files should go and moved them appropriately? It analyzed usage patterns?

Oh yeah. BITD, there was the archiver, a job that ran every night and moved files that hadn't been accessed in the last N time periods to tape. It left the VTOC entry (kind of like an inode), just marked it "archived" and the label of the tape. Then, the next time that file was accessed, a hook in the open() call would send a message to the console operator telling them to mount tape such-and-such. When the tape was mounted, the archiver would automatically copy the file back into place, the open() call would complete normally, and life was good. Basically transparent to the user (they'd look at their directory and all their files would be there), except for the fact that the file open would take two-three minutes. Then again, since they were paying for disk storage by the block-day, they were generally pretty happy to only pay for a fifty-cent tape mount every quarter instead of keeping that 1200-block file on-line for three months when they weren't using it.
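
A rough sketch of that archive-and-recall flow, for anyone who never saw one. Everything here is hypothetical and simplified: the `Archiver` class, its method names, and the in-memory "tapes" are stand-ins for illustration, and the operator is reduced to a log of mount requests.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    data: Optional[bytes]            # None once the file has gone to tape
    last_access: int                 # day number of the last open
    tape_label: Optional[str] = None


class Archiver:
    def __init__(self):
        self.catalog = {}       # name -> Entry; stays listed even when archived
        self.tapes = {}         # tape label -> {name: data}
        self.operator_log = []  # mount requests sent to the console

    def create(self, name, data, today):
        self.catalog[name] = Entry(data=data, last_access=today)

    def nightly_run(self, today, max_idle_days, tape_label):
        """Move files idle for more than max_idle_days to the given tape."""
        for name, entry in self.catalog.items():
            idle = today - entry.last_access
            if entry.data is not None and idle > max_idle_days:
                self.tapes.setdefault(tape_label, {})[name] = entry.data
                entry.data = None              # free the disk blocks
                entry.tape_label = tape_label  # mark the entry "archived"

    def open(self, name, today):
        """Open a file, recalling it from tape first if it was archived."""
        entry = self.catalog[name]
        if entry.data is None:
            self.operator_log.append(f"MOUNT {entry.tape_label}")
            entry.data = self.tapes[entry.tape_label].pop(name)
            entry.tape_label = None
        entry.last_access = today
        return entry.data
```

The point of the design is that the catalog entry never goes away, so a directory listing looks the same before and after the nightly run; only the first `open` after archiving pays for the tape mount.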

--
Just junk food for thought...
Re:Great Idea by mollog · 2006-06-26 07:09 · Score: 4, Informative

Hewlett-Packard Company developed a product that did this automagically. It was an external RAID system that connected via one or two SCSI buses to a host. All incoming data was stored in RAID 0/1, striped and mirrored (aka RAID 10). As the storage filled up, unused data was automagically migrated to more space-efficient RAID 5. Data that had been accessed recently remained in RAID 0/1. You could add disk drives and it would automagically include the drives (but you would have to use LVM or other utilities in the OS to increase its file system). You could mix two drive sizes, say, 18GB and 36GB, without trouble. If a drive failed, the array would rebuild redundancy. If another drive failed, ditto. It was fast, it was fully redundant.

But it was a lot smarter than the admins who had to use it so it wasn't very popular.
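
To put numbers on "more space-efficient": a minimal sketch of the usable-capacity arithmetic for the two layouts, assuming equal-sized drives. The function names are made up for this example, and the mixed-drive-size handling the array supported is not modeled.

```python
def usable_raid10(drives: int, drive_gb: int) -> int:
    """Striped mirrors: every block is stored twice, so half the raw space is usable."""
    if drives < 2 or drives % 2:
        raise ValueError("RAID 0/1 needs an even number of drives, at least 2")
    return drives * drive_gb // 2


def usable_raid5(drives: int, drive_gb: int) -> int:
    """Single parity: one drive's worth of space goes to parity."""
    if drives < 3:
        raise ValueError("RAID 5 needs at least 3 drives")
    return (drives - 1) * drive_gb
```

With six 36GB drives, for example, `usable_raid10(6, 36)` gives 108GB while `usable_raid5(6, 36)` gives 180GB, which is why cold data is the natural thing to push down to RAID 5.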

--
Best regards.
Re:Not so new... by Anonymous Coward · 2006-06-26 07:19 · Score: 1, Informative

I'd really like to know what older systems had such features as I've never seen them.

Novell.
Re:Not so new... by drinkypoo · 2006-06-26 07:44 · Score: 2, Informative

That's not the same thing. At all. Read the article again.

You know, I did read it, and what they're talking about is that data that is less used/less critical gets moved to slower/less reliable storage automatically.

And when you have only two kinds of storage, a DASD bank and mag tape, and your system automatically writes least used data to tape and tells you to file it, and asks you for tapes when it needs them - well, I'd say the two are highly analogous. The fact that the slower storage is offline is nothing more than a detail. It's still storage, it's still available - it just has a wicked long seek time.

I realize that offline storage is different - but only in some details. The majority is the same concept.

So it's not the same thing, but the "at all" is bullshit. They're quite similar.

--
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
Re:I'd love to see it... by mrsbrisby · 2006-06-26 09:41 · Score: 2, Informative

Filesystems may be automatically and intelligently defragmented (while live, if the filesystem is decent) when disk I/O is at a minimum.

But my filesystem is never idle, or even nearly so. Nonetheless, fragmentation isn't exactly a bad thing, and doesn't necessarily have to cause problems (such as lost performance) by itself.

Worse still: How does the defragmenter know to avoid using this block? Or how does it know that it's a good candidate to be moved to the other end of the disk?

We could make a record of every block that we access and sort it- as later information for the defragmenter, but would it really help? Wouldn't the cost of updating that information be too high?

These real questions have been examined and explored, and the short answer of it is simply "we don't know."

Access patterns are so complicated it's not even funny.

That's where tiered storage comes in- instead of trying to make the perfect defragmentation utility, we simply "copy" all of our storage to the lesser speed disks and update all the pointers.

We'd still never know when it was safe to remove the data from the first disk, but at least we'd never have to know- if we needed space, we could simply reclaim any storage from the fast disk that had already been mirrored. We might guess wrong, but on next access, we could reinforce that it guessed wrong by copying it back.

As a result, we don't have to do extra writes in the middle, and we don't need a fancy defragmenter...
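
Here is a small sketch of that idea, under stated assumptions: every write is mirrored to the slow tier, so anything on the fast tier can be dropped whenever space is needed, and a wrong guess is corrected by copying the data back on the next read. The `TwoTierStore` name, the count-based capacity, and the least-recently-used eviction order are all choices made for the example, not anything taken from the article.

```python
from collections import OrderedDict


class TwoTierStore:
    def __init__(self, fast_capacity):
        self.fast_capacity = fast_capacity  # number of items the fast tier holds
        self.fast = OrderedDict()           # key -> data, least recently used first
        self.slow = {}                      # key -> data, the complete copy

    def write(self, key, data):
        # Mirror to the slow tier first, so the fast copy is always reclaimable.
        self.slow[key] = data
        self._place_fast(key, data)

    def read(self, key):
        if key in self.fast:
            self.fast.move_to_end(key)      # note the recent access
            return self.fast[key]
        # We guessed wrong when we reclaimed this one; copy it back.
        data = self.slow[key]
        self._place_fast(key, data)
        return data

    def _place_fast(self, key, data):
        self.fast[key] = data
        self.fast.move_to_end(key)
        while len(self.fast) > self.fast_capacity:
            # Reclaiming is safe: the slow tier still has the data.
            self.fast.popitem(last=False)
```

Note that eviction never writes anything, which is the "no extra writes in the middle" property; the cost is that the slow tier has to be big enough to hold everything.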

Currently, some operating systems (e.g. Windows XP) optimize file location for minimizing boot time. I don't see any reason the same concept couldn't be extended. It might not be a bad idea even in the absence of these other concepts.

NTFS does this by having a (small) table near the beginning of the disk. The "files" in it are packed in such a way that the entire table is read into memory, as the boot procedure "knows" that they will all be needed shortly.

Unfortunately, updating this table is expensive, and it isn't very large, so it cannot keep very many files.
Gee, a New Idea, only 43 years old! by CAOgdin · 2006-06-26 13:47 · Score: 2, Informative

It was about 1962, when IBM was touting something they called "Percolate & Drip" storage. The idea was that things that were used often "percolated" up to the fastest storage medium, while data that was only infrequently used would "drip" down to the most capacious media. Why do children get to claim everything they imagine is somehow NEW? Mature adults try to stand on the shoulders of giants.
Re:Not so new... by Anonymous Coward · 2006-06-26 18:24 · Score: 2, Informative

Guys, guys, guys, you talk about HSM like it is old and gone. On the contrary, it is the future! We use it every day where I work, using a program called SAMFS. We have a tape robot and a large disk cache. Data that is used often stays on the cache. Less used data goes to tapes. SAMFS sorts out when/how to do that. The system is great not only because the software figures all of this out for you, but also because it works as your backup software. We switched to this system about 4 years ago when we realized that doing a nightly backup of our TBs of storage was taking hours and was no longer a nightly backup but a morning backup as well. Plus, adding additional disk drives for storage was going to be a very costly thing.

While it is true that getting less used data OFF of the tape takes some time, it is not that bad if it is data you don't use very often. Depending on the file, it seems to take about 8 minutes or so for us. I think that's because the tape has to find the file, but once it has, it can just copy it off.

I highly recommend SAMFS. Unfortunately, it only runs on Solaris. :( We examined several Linux solutions at the time, but they were too unstable. Perhaps now there are better ones out there?

Check it out. Even if you don't use it, it's cool stuff.

John