Is the Time Finally Right For Hybrid Hard Drives?
a_hanso writes "Hard drives that combine a traditional spinning platter for mass storage and solid state flash memory for frequently accessed data have always been an interesting concept. They may be slower than SSDs, but not by much, and they are a lot cheaper gigabyte-for-gigabyte. CNET's Harry McCracken speculates on how soon such drives may become mainstream: 'So why would the new Momentus be more of a mainstream hit than its predecessor? Seagate says that it's 70 percent faster than its earlier hybrid drive and three times quicker than a garden-variety, non-hybrid disk. Its benchmarks for cold boots and application launches show the new drive to be just a few seconds slower than a SSD. Or, in some cases, a few seconds faster. In the end, hybrid drives are compromises, neither as cheap as ordinary drives — you can get a conventional 750GB Momentus for about $150 — nor as fast and energy-efficient as SSDs.'"
If there is to be a time for hybrid drives, the window on it is fast closing. As SSDs get cheaper and cheaper, more and more people will opt to just go that route. Most people don't really need massive HDDs, so if smaller SSDs get cheap enough, that's the way they'll go. SSDs don't have to be as cheap as HDDs, just cheap enough to be affordable at the size people need (probably 200-300GB for most people).
For me personally, the time already came and went. I was very enthusiastic about the concept of hybrid drives, particularly since I have vast storage needs (I do audio production). However, no hybrid drive for desktops was forthcoming. Then there was a sale on SSDs, 256GB drives for $200. I picked up two of them. $1/GB was the magic price at which I'd be willing to buy. Now I have 512GB of SSD storage for OS, apps, and primary data. That is then backed by 3TB of HDD storage for media, samples, and so on.
A hybrid drive has no place in that setup. I'd certainly not replace my SSDs; they are far faster than any hybrid drive (even though they are fairly slow on the SSD scale). Likewise, I have no real reason to upgrade my HDDs; they serve the non-speed-intensive stuff.
While I'm willing to spend more than most, it is still a sign of things to come. As those prices drop, more and more people will say "screw it" and go all SSD.
There are only two things drive cache can help with significantly. When rebooting, where memory is empty, you can get memory primed with the most common parts of the OS faster if most of that data can be read from the SSD. Optimizers that reorder the boot files will get you much of the same benefit if they can be used.
Disk cache used for writes is extremely helpful, because it allows write combining and elevator sorting to improve random write workloads, making them closer to sequential. However, you have to be careful, because things sitting in those caches can be lost if the power fails. That can be a corruption issue on things that expect writes to really be on disk, such as databases. Putting some flash to cache those writes, with a supercapacitor to ensure all pending writes complete on shutdown, is a reasonable replacement for the classic approach: using a larger battery-backed power source to retain the cache across power loss or similar temporary failures. The risk with the old way is that the server will be off-line long enough for the battery to discharge. Hybrid drives should be able to flush to SSD just with their capacitor buffer, so you're consistent with the filesystem state, only a moment after the server powers down.
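To make write combining and elevator sorting concrete, here is a minimal Python sketch. It is a toy model, not how any particular drive firmware works: the function names, the (block, data) tuples, and the use of block-number distance as a stand-in for head travel are all assumptions made for illustration.

```python
def combine_and_sort(pending_writes):
    """Toy write cache: pending_writes is a list of (block, data) in arrival order."""
    latest = {}
    for block, data in pending_writes:
        # Write combining: a later write to the same block replaces the earlier one,
        # so only the newest data for each block ever reaches the platter.
        latest[block] = data
    # Elevator sorting: flush in ascending block order so the head sweeps once.
    return sorted(latest.items())


def head_travel(blocks, start=0):
    """Sum of block-number distances the head moves (a crude proxy for seek cost)."""
    pos = start
    total = 0
    for block in blocks:
        total += abs(block - pos)
        pos = block
    return total


if __name__ == "__main__":
    writes = [(900, "a"), (10, "b"), (500, "c"), (10, "d"), (901, "e"), (11, "f")]
    flushed = combine_and_sort(writes)
    print("arrival order travel:", head_travel([b for b, _ in writes]))
    print("flushed order travel:", head_travel([b for b, _ in flushed]))
```

With this made-up workload, six writes collapse to five and the head travel drops from 4551 to 901 block units. It also shows the risk described above: everything in `latest` is exactly what gets lost if power fails before the flush.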
As for why read caching doesn't normally help, the operating system's filesystem cache is giant compared to any size the drive cache might be. When OS memory is measured in gigabytes and drive caches in megabytes, you'll almost always be in a double-buffer situation: whatever is in the drive's cache will also still be in the OS's cache, and will therefore never be requested from the drive. The only way you're likely to get any real benefit from the drive cache is if the drive does read-ahead. Then it might return only the blocks requested to the OS, while caching the ones it happened to pass over anyway. If you then ask for those next, you get them at cache speeds. On Linux at least, this is also a futile effort; the OS read-ahead is smarter than any of the drive logic, and it may very well ask for things in that order in the first place.
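The double-buffer effect is easy to see in a small simulation. This is a sketch under loud assumptions: both caches are plain LRU, the drive does no read-ahead, reads are uniformly random, and the cache sizes are invented numbers. `LRUCache` and `simulate` are names made up for this example.

```python
from collections import OrderedDict
import random


class LRUCache:
    """Minimal LRU cache of block numbers (assumed policy, for illustration only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()

    def lookup(self, block):
        """Return True on a hit. On a miss, insert the block and evict the oldest if full."""
        if block in self.blocks:
            self.blocks.move_to_end(block)
            return True
        self.blocks[block] = True
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)
        return False


def simulate(num_reads, num_blocks, os_capacity, drive_capacity, seed=0):
    """Count where each read is satisfied: OS cache, drive cache, or platter."""
    rng = random.Random(seed)
    os_cache = LRUCache(os_capacity)
    drive_cache = LRUCache(drive_capacity)
    counts = {"os_hits": 0, "drive_hits": 0, "platter_reads": 0}
    for _ in range(num_reads):
        block = rng.randrange(num_blocks)
        if os_cache.lookup(block):
            counts["os_hits"] += 1
        elif drive_cache.lookup(block):
            # Only reached when the OS cache missed, so the drive sees far fewer reads.
            counts["drive_hits"] += 1
        else:
            counts["platter_reads"] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sizes: OS cache holds 1000 blocks, drive cache only 8.
    print(simulate(num_reads=20000, num_blocks=4000, os_capacity=1000, drive_capacity=8))
```

In this model the drive cache is only consulted for blocks the OS has already evicted, and by then the much smaller drive cache has almost always evicted them too, so its hit count stays tiny next to the OS cache's.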
One relevant number for improving read speeds is command queue depth. You can get better throughput by ordering reads better, so they seek around the mechanical drive less. There's a latency issue here though--requests at the opposite edge can starve if the queue gets too big--so excessive tuning in that direction isn't useful either.
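The throughput-versus-starvation trade-off can be sketched the same way. The model below is an assumption, not a description of real command queuing: the drive holds at most `depth` requests and always services the queued one nearest the head (shortest seek first). `service` is a hypothetical function, and "wait" is counted as how many other requests were serviced while a request sat in the queue.

```python
def service(requests, depth, start=0):
    """Serve track numbers (in arrival order) with a queue of at most `depth` entries.

    Returns (total head travel, worst wait measured in requests serviced meanwhile).
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    queue = []  # entries are (track, number of requests served when it was queued)
    pos = start
    travel = 0
    served = 0
    worst_wait = 0
    next_request = 0
    while next_request < len(requests) or queue:
        # Refill the queue up to the configured depth.
        while next_request < len(requests) and len(queue) < depth:
            queue.append((requests[next_request], served))
            next_request += 1
        # Pick the queued request closest to the current head position.
        nearest = min(range(len(queue)), key=lambda i: abs(queue[i][0] - pos))
        track, queued_at = queue.pop(nearest)
        travel += abs(track - pos)
        pos = track
        worst_wait = max(worst_wait, served - queued_at)
        served += 1
    return travel, worst_wait


if __name__ == "__main__":
    reads = [50, 10, 60, 20]
    print("depth 1:", service(reads, depth=1))
    print("depth 4:", service(reads, depth=4))
```

For this four-request example, depth 1 gives (180, 0) and depth 4 gives (60, 3): a third of the head travel, but the request for track 60 waits behind three others. Scale the queue up and a request at the far edge can sit there much longer, which is the starvation problem described above.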