Garbage Collection Algorithms Coming For SSDs
MojoKid writes "A common concern with the current crop of Solid State Drives is the performance penalty associated with block-rewriting. Flash memory is made up of cells organized into pages, usually 4KB each, which are in turn arranged in blocks of 512KB. When a page is unused, data can be written to it relatively quickly. But if a block already contains some data, even if it fills only a single page, changing that data in place means the entire block must be re-written. This means that whatever data is already present in the block must be read, then it must be combined or replaced, and the entire block is then erased and re-written. This process takes much longer than simply writing data straight to an empty block. This isn't a concern on fresh, new SSDs, but over time, as files are written, moved, deleted, or replaced, many blocks are left holding what is essentially orphaned or garbage data, and the drive's long-term performance degrades because of it. To mitigate this problem, virtually all SSD manufacturers have incorporated, or soon will incorporate, garbage collection schemes into their SSD firmware which actively seek out and remove the garbage data. OCZ, in combination with Indilinx, is poised to release new firmware for its entire line-up of Vertex Series SSDs that performs active garbage collection while the drives are idle, in order to restore performance to like-new condition, even on a severely 'dirtied' drive."
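To make the read-modify-write penalty concrete, here is a minimal Python sketch of the naive update the summary describes, assuming the quoted 4KB page / 512KB block geometry (128 pages per block). The `update_cost` function and its operation counts are a hypothetical cost model, not measurements of any real drive.

```python
PAGE_SIZE = 4 * 1024                        # 4KB pages, as quoted in the summary
BLOCK_SIZE = 512 * 1024                     # 512KB blocks
PAGES_PER_BLOCK = BLOCK_SIZE // PAGE_SIZE   # 128 pages per block


def update_cost(block_has_data: bool) -> dict:
    """Hypothetical cost model for changing one page of data.

    An empty page can simply be programmed. Changing data in place inside a
    block that already holds data means reading the whole block, erasing it,
    and programming every page again.
    """
    if not block_has_data:
        return {"reads": 0, "erases": 0, "programs": 1}
    return {"reads": PAGES_PER_BLOCK, "erases": 1, "programs": PAGES_PER_BLOCK}


fresh = update_cost(block_has_data=False)
dirty = update_cost(block_has_data=True)
print(PAGES_PER_BLOCK, fresh, dirty)
```

Under this model one small update to a dirty block costs 128 page programs plus an erase instead of a single program; avoiding that worst case is the job of the garbage collection this story is about.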
A weakness was found in first-generation drives; the second-generation drives fixed it.
Film at 11.
"Garbage collection" has already quite different usage in CS. And while what has to be done to those SSDs isn't technically the same as defragmentation on HDDs, it is still "performing drive maintenance to combat performance-degrading results of prolonged usage, deletion of files".
I think it ends up being like NCQ. The drive's processor can be much more specialized and can do the processing much more efficiently. Not to mention, it might require standards to be changed, since some buses (like USB, IIRC) don't provide commands to zero out a sector at a low level. On an SSD, just writing a sector with zeros doesn't work the same as blanking the memory: the zeros simply get written into a still-blank page and the sector is remapped there, so a blank page is used up rather than freed. The problem only comes when you run out of blank sectors.
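The point about writing zeros can be illustrated with a minimal sketch, assuming a simple log-structured flash translation layer (FTL) that always programs the next blank page and remaps the logical sector there. `TinyFTL` and its methods are hypothetical and do not model any particular drive's firmware.

```python
class TinyFTL:
    """Hypothetical toy FTL: every write, even of zeros, lands on a fresh page."""

    def __init__(self, total_pages: int):
        self.blank_pages = list(range(total_pages))  # erased, never-written pages
        self.contents = {}    # physical page -> data
        self.mapping = {}     # logical sector -> physical page
        self.invalid = set()  # pages holding stale data; not blank until erased

    def write(self, sector: int, data: bytes) -> int:
        if not self.blank_pages:
            raise RuntimeError("no blank pages left: a block must be erased first")
        page = self.blank_pages.pop(0)
        self.contents[page] = data
        old = self.mapping.get(sector)
        if old is not None:
            self.invalid.add(old)  # the old copy becomes garbage, not blank space
        self.mapping[sector] = page
        return page


ftl = TinyFTL(total_pages=4)
ftl.write(0, b"secret")
ftl.write(0, bytes(4096))  # "zeroing" sector 0
```

After the "zeroing" write, two pages are in use instead of one, and the old bytes still sit on page 0 until its block is erased, which is also why the forensics question that follows is a fair one.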
So what does this do when forensics are being done on one of these drives? Is the firmware just doing a better job of marking a dirty block available, or do the dirty blocks have to be zeroed at some point? Even if the blocks are just marked, will they output zeros if 'dd'ed by an OS?
I don't want my porn garbage collected, thank you very much. Who died and made you king of deciding what's garbage?
Wouldn't the drive benefit from a real understanding of the filesystem for this sort of thing? If it knew a sector was unallocated on a filesystem level, it would know that sectors were empty/unneeded, even if they had been written to nicely. Or should computers now have a way of tagging a sector as "empty" on the drive?
Either way, it looks like an OS interaction would be very helpful here.
Or are modern systems already doing this, and I'm just behind the times?
The garbage collector restores the drive's performance. Nothing comes free, so the question is: at what cost?
Why? It's low level, but it doesn't affect the filesystem above it.
On the list of reasons why it SHOULD be done by the OS, not the firmware:
*The OS has a better idea of when the drive is idle
*The OS can create idleness by holding unimportant writes for a while (ext4 style) and using this time to do GC
*The OS can decide to save power by not doing this while on battery power
On the list AGAINST, I only have:
*jtownatpunk.net thinks it should be platform-independent and thinks this can't be achieved without doing it in firmware
Put the essence of the driver in the public domain and code a version for Windows/Mac if required; that way all OSes will use the same logic even if they have completely different drivers.
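The OS-side scheduling argued for in that list could look roughly like this minimal sketch. Everything here is hypothetical: the idle threshold, the `WriteBuffer` class, and the `run_gc` callback are illustrative assumptions, not an existing kernel or driver API.

```python
from dataclasses import dataclass, field
from typing import Callable, List

IDLE_THRESHOLD_S = 5.0  # assumed: seconds without I/O before the disk counts as idle


@dataclass
class WriteBuffer:
    """Holds back unimportant writes (ext4-style delayed writeback) and schedules GC."""
    flush: Callable[[bytes], None]   # hands data to the device
    run_gc: Callable[[], None]       # hypothetical hook that asks for block consolidation
    on_battery: Callable[[], bool]
    pending: List[bytes] = field(default_factory=list)
    gc_runs: int = 0

    def submit(self, data: bytes, urgent: bool = False) -> None:
        # Urgent writes go straight through; the rest wait, creating idle periods.
        if urgent:
            self.flush(data)
        else:
            self.pending.append(data)

    def tick(self, idle_seconds: float) -> None:
        if idle_seconds < IDLE_THRESHOLD_S:
            return
        # The OS knows the device is idle; spend the time on GC unless on battery.
        if not self.on_battery():
            self.run_gc()
            self.gc_runs += 1
        for data in self.pending:
            self.flush(data)
        self.pending.clear()


written = []
buf = WriteBuffer(flush=written.append, run_gc=lambda: None, on_battery=lambda: False)
buf.submit(b"log line")                 # unimportant: held back
buf.submit(b"fsync data", urgent=True)  # important: written immediately
buf.tick(idle_seconds=10.0)             # idle: GC runs, then held data is flushed
```

The design choice mirrors the three bullet points: the OS decides when the device is idle, manufactures idle time by deferring writes, and skips GC on battery.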
How does the firmware know which sectors are empty if it doesn't understand this stuff?
I am curious how it works if it doesn't need knowledge of the filesystem. FAT, NTFS, UFS, EXT2/3/4, ZFS, etc. are all very different.
The filesystem tells the SSD "LBAs x to y are now not in use" using the ATA TRIM command.
http://www.theregister.co.uk/2009/05/06/win_7_ssd/
Over-provisioned SSDs have ready-deleted blocks, which are used to store bursts of incoming writes and so avoid the need for erase cycles. Another tactic is to wait until files are to be deleted before committing the random writes to the SSD. This can be accomplished with a Trim operation. There is a Trim aspect of the ATA protocol's Data Set Management command, and SSDs can tell Windows 7 that they support this Trim attribute. In that case the NTFS file system will tell the ATA driver to erase pages (blocks) when a file using them is deleted.
The SSD controller can then accumulate blocks of deleted SSD cells ready to be used for writes. Hopefully this erase on file delete will ensure a large enough supply of erase blocks to let random writes take place without a preliminary erase cycle.
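What TRIM buys the controller can be shown with a minimal sketch using a toy mapping table in place of real firmware: once the filesystem reports a range of LBAs as unused, the drive can drop those mappings and treat the pages behind them as garbage that GC never has to copy. `MappedDrive.trim()` is a hypothetical method standing in for the TRIM attribute of the ATA Data Set Management command; it does not issue any real command.

```python
class MappedDrive:
    """Hypothetical toy drive with an LBA -> physical page mapping table."""

    def __init__(self):
        self.mapping = {}     # LBA -> physical page
        self.invalid = set()  # pages GC may reclaim without copying
        self.next_page = 0

    def write(self, lba: int) -> None:
        if lba in self.mapping:
            self.invalid.add(self.mapping[lba])  # overwritten copy is now stale
        self.mapping[lba] = self.next_page
        self.next_page += 1

    def trim(self, first_lba: int, last_lba: int) -> None:
        """The filesystem says LBAs first_lba..last_lba (inclusive) hold no data."""
        for lba in range(first_lba, last_lba + 1):
            page = self.mapping.pop(lba, None)
            if page is not None:
                self.invalid.add(page)

    def valid_pages(self) -> int:
        return len(self.mapping)


drive = MappedDrive()
for lba in range(10):
    drive.write(lba)   # a file occupying LBAs 0..9
before = drive.valid_pages()
drive.trim(0, 9)       # the file is deleted and the filesystem sends TRIM
```

Without the `trim` call the drive would still count all ten pages as valid and would dutifully copy them every time garbage collection touched their blocks.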
Actually, I used to work on an embedded system that used M-Systems' TrueFFS. There, the flash translation layer actually understood FAT well enough to work out when a cluster was freed. That is, it knew where the FAT was, and whenever the FAT was written it would check for clusters being marked free, at which point it would mark them as garbage internally.
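The FAT-snooping trick can be sketched as a diff of the table before and after a write, assuming a FAT-style table where an entry of 0 marks a free cluster (as in FAT12/16/32). How TrueFFS actually implemented this isn't described in the comment; `freed_clusters` and the list-based tables are hypothetical illustrations.

```python
from typing import List, Set

FREE = 0  # in FAT12/16/32 an entry value of 0 marks a free cluster


def freed_clusters(old_fat: List[int], new_fat: List[int]) -> Set[int]:
    """Clusters that were allocated in old_fat but are marked free in new_fat."""
    return {
        cluster
        for cluster, (before, after) in enumerate(zip(old_fat, new_fat))
        if before != FREE and after == FREE
    }


# Hypothetical FAT16 table before and after deleting a file whose chain is 2 -> 3 -> 4.
# Entries 0 and 1 are reserved; 0xFFFF is an end-of-chain marker.
old = [0xFFF8, 0xFFFF, 3, 4, 0xFFFF, 0, 0]
new = [0xFFF8, 0xFFFF, 0, 0, 0, 0, 0]
garbage = freed_clusters(old, new)
```

On detecting a write to the FAT region, the flash translation layer would run a comparison like this and mark the flash pages backing the freed clusters as garbage, getting much the same information TRIM provides without any help from the host.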
No, the actual situation is that a block consists of some number of pages (on the flash currently used in SSDs it tends to be 128). The pages can be written individually, but only sequentially (so, write page 1, then page 2, then page 3), and the pages cannot be erased individually; you need to erase the whole block.
The consequence of this is that when the FS says "Write this data to LBA 1000", the SSD cannot overwrite the existing page where that LBA is stored without erasing its block, so instead it finds somewhere else to store it, and in its internal tables it marks the old page as invalid. Later, when the GC is sweeping blocks for consolidation, the number of valid pages is one of the criteria it uses to figure out what to do. If a block has very few valid pages and has been completely filled, then those pages will probably be copied to another block that is mostly valid, and the block the data was originally in will be erased.
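The mechanics in those two paragraphs (pages programmed only in order, erase only per block, out-of-place updates, and a GC that reclaims blocks with few valid pages) can be sketched as a toy flash translation layer. This is a minimal illustration under assumed parameters: 4 pages per block and greedy victim selection by fewest valid pages only, whereas the comment notes real GC weighs several criteria. None of these names come from real firmware.

```python
import random
from typing import Dict, List, Optional, Tuple

PAGES_PER_BLOCK = 4  # assumed tiny geometry; real flash tends to use 128


class Block:
    def __init__(self) -> None:
        self.lbas: List[Optional[int]] = []  # LBA held by each programmed page; None = invalid
        self.erase_count = 0

    def full(self) -> bool:
        return len(self.lbas) == PAGES_PER_BLOCK

    def program(self, lba: int) -> int:
        """Pages are written strictly in order: the next page is always len(self.lbas)."""
        if self.full():
            raise RuntimeError("block is full; it must be erased before reuse")
        self.lbas.append(lba)
        return len(self.lbas) - 1

    def erase(self) -> None:
        """Erasure only happens for the whole block."""
        self.lbas = []
        self.erase_count += 1

    def valid(self) -> int:
        return sum(lba is not None for lba in self.lbas)


class ToyFTL:
    def __init__(self, num_blocks: int) -> None:
        self.blocks = [Block() for _ in range(num_blocks)]
        self.map: Dict[int, Tuple[int, int]] = {}  # LBA -> (block, page)
        self.active = 0        # block currently receiving writes
        self.host_writes = 0
        self.flash_writes = 0  # host writes plus GC copies

    def write(self, lba: int) -> None:
        self.host_writes += 1
        if self.blocks[self.active].full():
            self.active = self._open_block()
        page = self.blocks[self.active].program(lba)
        self.flash_writes += 1
        old = self.map.get(lba)
        if old is not None:  # out-of-place update: the old copy becomes invalid
            self.blocks[old[0]].lbas[old[1]] = None
        self.map[lba] = (self.active, page)

    def _open_block(self) -> int:
        for i, blk in enumerate(self.blocks):
            if not blk.full():
                return i
        return self._collect()

    def _collect(self) -> int:
        """Greedy GC: every block is full here, so reclaim the one with fewest valid pages."""
        victim = min(range(len(self.blocks)), key=lambda i: self.blocks[i].valid())
        blk = self.blocks[victim]
        if blk.valid() == PAGES_PER_BLOCK:
            raise RuntimeError("every page is valid: the drive is genuinely full")
        survivors = [lba for lba in blk.lbas if lba is not None]
        blk.erase()
        # Simplification: survivors are buffered in RAM and written back into the
        # erased victim instead of being copied to a separate block.
        for lba in survivors:
            self.map[lba] = (victim, blk.program(lba))
            self.flash_writes += 1
        return victim

    def write_amplification(self) -> float:
        return self.flash_writes / self.host_writes


rng = random.Random(0)
ftl = ToyFTL(num_blocks=3)       # 12 physical pages
for _ in range(200):
    ftl.write(rng.randrange(6))  # 6 logical pages: half full, random rewrites
print(f"write amplification: {ftl.write_amplification():.2f}")
```

`write_amplification()` reports flash page programs per host write; anything above 1.0 is the copying GC does, which is one concrete answer to the "at what cost?" question earlier in the thread.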
You need to read up much, much more on the state of SSDs before making such sweeping, and incorrect, generalizations.
There are algorithms in existence, such as clever "garbage collection" (which is a bad name for this process when applied to SSDs - it's only a bit like "garbage collection" as it is traditionally known as a memory management technique in languages like Java) combined with wear levelling algorithms, and having extra capacity not reported to the OS to use as a cache of "always ready to write to" blocks, that can keep SSD performance excellent in 90% of use cases, and very good in most of the remaining 10%. Point being that for the majority of use cases, SSD performance is excellent almost all of the time.
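Wear levelling and the hidden spare capacity can be illustrated with a minimal sketch of dynamic wear levelling: whenever the controller needs an erased block, it takes the one erased the fewest times, from a pool that includes blocks never reported to the OS. The pool sizes and the `FreeBlockPool` class are hypothetical; real controllers also relocate cold data so rarely rewritten blocks share the wear (static wear levelling), which this sketch omits.

```python
import heapq
from typing import List, Tuple


class FreeBlockPool:
    """Erased blocks available for writing, least-worn first."""

    def __init__(self, user_blocks: int, spare_blocks: int):
        # Spare blocks are hidden from the OS but still absorb writes and wear.
        self._heap: List[Tuple[int, int]] = [
            (0, block_id) for block_id in range(user_blocks + spare_blocks)
        ]
        heapq.heapify(self._heap)

    def allocate(self) -> Tuple[int, int]:
        """Hand out (erase_count, block_id) for the erased block with the lowest wear."""
        return heapq.heappop(self._heap)

    def release(self, erase_count: int, block_id: int) -> None:
        """Return a block after GC has erased it, bumping its wear counter."""
        heapq.heappush(self._heap, (erase_count + 1, block_id))

    def erase_counts(self) -> List[int]:
        return sorted(count for count, _ in self._heap)


pool = FreeBlockPool(user_blocks=6, spare_blocks=2)  # hypothetical sizes
for _ in range(80):
    count, block = pool.allocate()  # fill the block with data...
    pool.release(count, block)      # ...later GC erases it and returns it
```

Because the least-worn block is always chosen, 80 erase cycles spread evenly over all eight blocks, including the two the OS never sees.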
Intel seems to have done the best job of implementing these smart algorithms in its drive controller, and its SSDs perform at or near the top of benchmarks when compared against all other SSDs. They have been shown to retain extremely good performance as the drive is used (although not "fresh from the factory" performance: there is some noticeable slowdown as the drive is used, but it's like going from 100% of incredibly awesome performance to 85% of incredibly awesome performance; it's still awesome, just not quite as awesome as brand new). Except for some initial teething pains caused by flaws in their algorithms that were corrected by a firmware update, everything I have read about them (and I have done *a lot* of research on SSDs) indicates that they will always be faster than any hard drive in almost every benchmark, regardless of how much the drive is used. And they have good wear levelling, so they should last longer than the typical hard drive as well (not forever, of course, but hard drives don't last forever either).
Indilinx controllers (which are used in newer drives from OCZ, Patriot, etc.) seem to be second best, about 75% as good as the Intel controllers.
Samsung controllers are in third place, either ahead, behind, or equal to Indilinx depending on the benchmark and usage pattern, but overall, and especially in the places where it counts the most (random write performance), a bit behind Indilinx.
There are other controllers (Mtron, Silicon Motion, etc.) that aren't benchmarked as often, so it's not clear to me where they sit in the standings.
Finally, there's JMicron in a very, very distant last place. JMicron's controllers were so bad that they single-handedly gave the entire early-generation SSD market a collective black eye. The one piece of advice that can be unequivocally stated for SSDs is: don't buy a drive based on a JMicron controller unless you have specific usage patterns (like rarely doing writes, or only doing sequential writes) that you can guarantee for the lifetime of the drive.
I've read many, many articles about SSDs in the past few months because I am really interested in them. Early on in the process I bought an Mtron MOBI 32 GB SLC drive (I went with SLC because, although it's more than 2x as expensive as MLC, I was concerned about the performance and reliability of MLC). In the intervening time, many new controllers, and drives based on them, have come out that have proven that very high-performance drives can be made using cheaper MLC flash as long as the algorithms used by the drive controller are sophisticated enough.
Bottom line: I would not hesitate for one second to buy an Intel SSD drive. The performance is phenomenal, and there is nothing to suggest that the estimated drive lifetime that Intel has specified is inaccurate. I would also happily buy Indilinx-based drives (OCZ Vertex or Patriot Torx), although I don't feel quite as confident in those products as I do in the Intel ones; in any case they all meet or exceed my expectations for hard drives. I've already decided that I'm never buying a spinning platter hard drive again. Ever. I have the good fortune of not being a movie/music/software pirate so I rarely use more than a couple dozen gigs on any of my systems anyway, so the smal
I have been working closely with OCZ on this new firmware and wanted to clear things up a bit. This new firmware *does not*, *in any way at all*, remove or eliminate orphaned data, deleted files, or anything of the like. It does not reach into the partition $bitmap and figure out what clusters are unused (like newer Samsung firmwares). It does not even use Windows 7 TRIM to purge unused LBA remap table entries upon file deletions.
What it *does* do is re-arrange in-place data that was previously write-combined (i.e. by earlier small random writes taking place). If data was written to every LBA of the drive, then all files were subsequently deleted, all data would remain associated with those LBAs. This actually puts OCZ above most of the pack, because their algorithm restores performance without needing to reclaim unused flash blocks, and does so completely independent of the data / partition type used. This is particularly useful for those concerned with data recovery of deleted files, since the data is never purged or TRIMmed.
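The re-arrangement of write-combined data can be illustrated with a hypothetical sketch. OCZ and Indilinx haven't published the algorithm, so this only assumes the general idea: small random writes leave consecutive LBAs scattered across physical pages, and an idle-time pass rewrites them so they sit in sequential pages again, without dropping any LBA (nothing is trimmed or purged). Mapping LBAs to linear physical page numbers, and where the data lands, are assumptions of the sketch.

```python
from typing import Dict


def breaks(mapping: Dict[int, int]) -> int:
    """Count consecutive LBAs (n, n+1) that are not in consecutive physical pages."""
    return sum(
        1 for lba, phys in mapping.items()
        if lba + 1 in mapping and mapping[lba + 1] != phys + 1
    )


def rearrange(mapping: Dict[int, int], first_free: int) -> Dict[int, int]:
    """Hypothetical idle-time pass: copy every mapped LBA, in LBA order, to fresh
    sequential pages starting at first_free. No LBA is dropped, so deleted but
    untrimmed data stays where a recovery tool can still find it."""
    return {lba: first_free + i for i, lba in enumerate(sorted(mapping))}


# After small random writes, LBAs 0..7 are scattered over physical pages.
scattered = {0: 5, 1: 12, 2: 3, 3: 9, 4: 0, 5: 14, 6: 7, 7: 2}
tidy = rearrange(scattered, first_free=16)
```

Every LBA keeps its data after the pass; only its physical location changes, which matches the point that performance is restored without reclaiming or purging anything.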
Slashdot-specific Translation: This firmware will enable an OCZ Vertex to maintain full speed (~160 MB/sec) sequential writes and good IOPS performance when used under Mac and Linux.
Hardware-nut Translation: This firmware will enable OCZ Vertex to maintain full performance when used in RAID configurations.
I'll have my full evaluation of this firmware up at PC Perspective later today. Once available, it will appear at this link:
http://www.pcper.com/article.php?aid=760
Regards,
Allyn Malventano
Storage Editor, PC Perspective