Sun Adding Flash Storage to Most of Its Servers
BobB-nw writes "Sun will release a 32GB flash storage drive this year and make flash storage an option for nearly every server the vendor produces, Sun officials are announcing Wednesday. Like EMC, Sun is predicting big things for flash. While flash storage is far more expensive than disk on a per-gigabyte basis, Sun argues that flash is cheaper for high-performance applications that rely on fast I/O Operations Per Second speeds."
High frequency, low volume operations - metadata journalling, certain database transactions - will go to flash, and low frequency, high volume operations - file transfers, bulk data moves - will go to regular hard drives. SSDs aren't yet all that much faster for bulk data moving, so it makes the most economic sense to put them where they're most needed: Where the IOPs are.
Back in the day, a single high-performance SCSI drive would sometimes play the same role for a big, cheap, slow array. Then, as now, you'd pay the premium price for the smallest amount of high-IOPs storage that you could get away with.
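To put rough numbers on "the smallest amount of high-IOPs storage that you could get away with", here is a minimal sizing sketch. Every figure in it (capacity, IOPS, and price per device) is a made-up assumption for illustration, not a Sun or EMC spec, and the `Device`, `devices_needed`, and `cheapest` names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    capacity_gb: float
    iops: float
    price: float


def devices_needed(dev, need_gb, need_iops):
    """Units required to satisfy both the capacity and the IOPS target."""
    by_capacity = math.ceil(need_gb / dev.capacity_gb)
    by_iops = math.ceil(need_iops / dev.iops)
    return max(by_capacity, by_iops, 1)


def cost(dev, need_gb, need_iops):
    """Total purchase price for enough units of one device type."""
    return devices_needed(dev, need_gb, need_iops) * dev.price


# Hypothetical figures, chosen only to show the shape of the trade-off.
FLASH = Device("flash", capacity_gb=32, iops=5000, price=1000.0)
DISK = Device("disk", capacity_gb=300, iops=200, price=300.0)


def cheapest(need_gb, need_iops):
    """Pick whichever device type meets the workload at the lowest cost."""
    return min((FLASH, DISK), key=lambda d: cost(d, need_gb, need_iops))


if __name__ == "__main__":
    # Small, hot workload (e.g. a metadata journal): IOPS-bound.
    print(cheapest(need_gb=20, need_iops=10_000).name)
    # Large, cold workload (e.g. bulk file storage): capacity-bound.
    print(cheapest(need_gb=3000, need_iops=200).name)
```

With these assumed numbers the journal-style workload is cheaper on flash, because you would need dozens of spindles just to reach the IOPS target, while the bulk workload is far cheaper on disk.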
People (read: vendors) now frequently refer to flash storage as superior when IOPs are the main issue.
From what I've been able to discern this is actually true only in read-mostly applications and applications where writes are already in neat multiples of the flash erase block size.
If you're doing random small writes your performance is likely to be miserable, because you'll need to erase blocks of flash much larger than the data actually being changed, then rewrite the block with the changed data.
Some apps, like databases, might not care about this if you're able to get their page size to match or exceed that of the underlying storage medium. Whether or not this is possible depends on the database.
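A back-of-the-envelope sketch of the erase-block problem described above, assuming the naive read-erase-rewrite model with no controller remapping (real controllers may do better). The 128 KiB erase block size and the function names are assumptions for illustration.

```python
def blocks_touched(offset, length, erase_block):
    """Number of erase blocks overlapped by a write of `length` bytes."""
    first = offset // erase_block
    last = (offset + length - 1) // erase_block
    return last - first + 1


def write_amplification(offset, length, erase_block):
    """Bytes physically rewritten divided by bytes the application changed.

    Assumes every touched erase block is erased and rewritten in full.
    """
    rewritten = blocks_touched(offset, length, erase_block) * erase_block
    return rewritten / length


if __name__ == "__main__":
    ERASE_BLOCK = 128 * 1024  # assumed erase block size

    # Small random write: one 4 KiB page inside a 128 KiB erase block.
    print(write_amplification(4096, 4096, ERASE_BLOCK))

    # Database page sized and aligned to the erase block.
    print(write_amplification(0, ERASE_BLOCK, ERASE_BLOCK))

    # Same page size but misaligned: it straddles two erase blocks.
    print(write_amplification(4096, ERASE_BLOCK, ERASE_BLOCK))
```

Under this model the 4 KiB write rewrites 32 times the data it changes, the aligned full-block write has no overhead, and the misaligned one doubles the work, which is why matching the database page size alone is not enough if the pages are not also aligned.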
For some other uses a log-oriented file system might help, but those have their own issues.
In general, though, flash storage currently only seems to be exciting for random read-mostly applications, which get a dramatic performance boost so long as the blocks being read are small enough and scattered enough. For larger contiguous reads hard disks still leave flash in the dust because of their vastly superior raw throughput.
Vendors, however, make a much larger margin on flash disk sales.
This article (PDF) may be of interest: Understanding Flash SSD performance (Google text version).
It sounds like the SSDs are internal drives for the server. A database would never be stored on an internal hard drive. Almost any commercial database is connected to a disk farm through SAN fabric.
SSDs really shine for OLTP databases. Lots of random IO occurs on these databases (as opposed to data warehouses that use lots of sequential IO).
Normal hard drives are horrible for random IO because of mechanical limitations. Think about trying to switch tracks on a record player thousands of times per second; this is what's happening inside a hard drive (under a random IO load). It's amazing mechanical HDDs work as well as they do.
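The mechanical limit can be estimated with simple arithmetic: each random request costs roughly one average seek plus, on average, half a platter revolution. The sketch below uses that rule of thumb; the seek times are assumed, typical-looking values rather than figures for any particular drive, and transfer time and queueing are ignored.

```python
def rotational_latency_ms(rpm):
    """Average wait for the target sector: half of one revolution."""
    return 0.5 * 60_000.0 / rpm


def random_iops(avg_seek_ms, rpm):
    """Rough upper bound on random IOPS for a single spindle."""
    service_time_ms = avg_seek_ms + rotational_latency_ms(rpm)
    return 1000.0 / service_time_ms


if __name__ == "__main__":
    # Assumed average seek times, for illustration only.
    print(f"7200 rpm:  {random_iops(8.5, 7200):.0f} IOPS")
    print(f"15000 rpm: {random_iops(3.5, 15000):.0f} IOPS")
```

Even the fast spindle lands at a couple of hundred random operations per second, which is why OLTP setups end up buying spindles for IOPS rather than for capacity.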
I was thinking about this at Fry's the other day when trying to decide whether I could trust the replacement Seagate laptop drive similar to the one that crashed on me Sunday, and I concluded that the place I most want to see flash deployed is in laptops. Eventually, HDDs should be replaced with SSDs for obvious reliability reasons, particularly in laptops. However, in the short term, even just a few gigs of flash could dramatically improve hard drive reliability and battery life for a fairly negligible increase in the per-unit cost of the machines.
Basically, my idea is a lot like the Robson cache idea, but with a less absurd caching policy. Instead of uselessly making tasks like booting faster (I basically only boot after an OS update, and a stale boot cache won't help that any), the cache policy should be to try to make the hard drive spin less frequently and to provide protection of the most important data from drive failures. This means three things:
That last part is the best part. As data gets written to the hard drive, if the disk is not already spinning, the data would be written to the flash. On shutdown, the drive would spin up and the cache would be flushed to disk to ensure that if you yank the drive out and put it into another machine, you don't get stale data. The cache would also be flushed whenever the disk has to spin up for some other activity (e.g. reading a block that isn't in the cache). The cache should also probably be flushed periodically (say once an hour) to minimize data loss in the event of a motherboard failure. If the computer crashes, the data would be flushed on the next boot. (Of course this means that unless the computer had boot-firmware-level support for reading data through such a cache, the OS would presumably need to flush the cache and disable write caching while updating or reinstalling the OS to avoid the risk of an unbootable system and/or data loss.)
As a result of such a design, the hard drive would rarely spin up except for reads, and any data frequently read would presumably come out of the in-kernel disk cache, so basically the hard drive should stay spun down until the user explicitly opened a file or launched a new application. This would eliminate the nearly constant spin-ups of the system drive resulting from relatively unimportant activity like registry/preference file writes, log data writes, etc. By being non-volatile, it would do so in a safe way.
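A toy model of the policy described above, just to make the spin-up behaviour concrete. It is a sketch under simplifying assumptions: the class and method names are hypothetical, blocks are plain dictionary entries, and the in-kernel read cache and the hourly flush timer are not modelled (a timer or shutdown hook would simply call `flush()`).

```python
class CachedDrive:
    """Hard drive fronted by a small non-volatile flash write cache."""

    def __init__(self):
        self.disk = {}         # block number -> data on the platters
        self.flash = {}        # dirty blocks parked in flash
        self.spinning = False
        self.spin_ups = 0      # how often the platters had to start

    def _spin_up(self):
        if not self.spinning:
            self.spinning = True
            self.spin_ups += 1

    def spin_down(self):
        self.spinning = False

    def flush(self):
        """Write dirty blocks to disk (shutdown, timer, or next boot)."""
        if self.flash:
            self._spin_up()
            self.disk.update(self.flash)
            self.flash.clear()

    def write(self, block, data):
        # Only go to the platters if they are already turning.
        if self.spinning:
            self.disk[block] = data
        else:
            self.flash[block] = data

    def read(self, block):
        # Dirty data in flash is the newest copy, so serve it from there.
        if block in self.flash:
            return self.flash[block]
        # Cache miss: the disk must spin up, so flush while it is up.
        self._spin_up()
        self.flush()
        return self.disk.get(block)


if __name__ == "__main__":
    drive = CachedDrive()
    for i in range(100):           # e.g. repeated log/preference writes
        drive.write(0, f"log entry {i}")
    print(drive.spin_ups)          # the writes alone never start the disk
    drive.read(42)                 # a miss forces a spin-up and a flush
    print(drive.spin_ups, drive.disk[0])
```

In this model a hundred small writes cost zero spin-ups, and the first uncached read both starts the disk and drains the flash, which is the behaviour the proposal is after.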
This is similar to what some vendors already do, I know, but integrating it with the OS's buffer cache to make the caching more intelligent and giving the user the ability to request backups of certain data seem like useful enhancements.
Thoughts? Besides wondering what kind of person thinks through this while staring at a wall of hard drives at Fry's? :-)
Re: "Adding a flash storage option" is pretty much an engineering nonevent, and a very minor logistical task.
You have no idea what you are talking about. Sun customers demand that the product Sun sells them have known reliability properties and that Sun guarantees their products properly interact with each other. It takes a significant amount of resources to do this validation. At the same time SSDs and HDDs react very differently to load and can have all sorts of side effects if the OS/application is not prepared to deal with them.