The Curious Case of SSD Performance In OS X
mr_sifter writes "As we've seen from previous coverage, TRIM support is vital to help SSDs maintain performance over extended periods of time — while Microsoft and the SSD manufacturers have publicized its inclusion in Windows 7, Apple has been silent on whether OS X will support it. bit-tech decided to see how SSD performance in OS X is affected by extended use — and the results, at least with the Macbook Air, are startling. The drive doesn't seem to suffer very much at all, even after huge amounts of data have been written to it."
You can't really draw any conclusions from these results. In one particular Mac, Apple ships a particular Samsung SSD that doesn't degrade, probably because its "clean" performance is already terrible (you might think of it as "pre-degraded"). In other Macs Apple ships Toshiba SSDs that may have completely different behavior. If you put a good SSD (e.g. Intel) in your Mac the behavior will be completely different.
Maybe they should take the drive over to a Windows XP and Windows 7 box and see if it's the drive hardware being resilient or the OS. The G1 intel drives don't drop a ton after use, and they don't support TRIM. It looks like the Intel G1 flash is artificially capped by the controller. It could be similar here.
Apple's description of the zeroing format method we used fits the description of what we wanted in terms of resetting the SSD to a clean state
Zeroing is not the same operation as TRIM. TRIM marks a block as unused, and if you read it you'll either get random data, or zeros (probably the later). Zeroing marks it as in-use, and if you read it you'll get zeros. The SSD's wear management algorithm will move the latter around as though it were real data, whereas it knows the former is "empty" so it won't bother (so the SSD will be faster). In other words, they don't seem to be using a "clean state" at all, which would explain why there's no difference.
Secondly, the SSD in the Macbook Air really isn't very fast at all
Which strengthens the hypothesis that they were comparing one "full" state with another. Pop out the drive, TRIM the whole disk in another OS, and run the benchmark again. It'll probably be a lot faster. It wouldn't surprise me if installing the Mac OS at the factory caused every block on the SSD to be used at least once (e.g. a whole disk image was written), which would mean you'd already be at the worse possible performance degradation.
FTA: "We simulated this by copying 100GB of mixed files (OS files, multiple game installs, MP3s and larger video files ) to the SSD, deleting them, and then repeating the process ten times,"
Surely you should be deleting half the files - every other file - then rewriting them. If you copy a bunch of files then delete them all you're leaving the drive in pretty much the same state as it was at the start, the only difference between passes wil be due to the wear levelling algorithms inside the drive. Overall performance at the end will mostly be a result of the initial condition of the drive, not what happened during the test.
No sig today...
I have to wonder why they didn't boot into Windows on the same PC and repeat the tests. That would have identified if it was a hardware issue, or a software (Filesystem) issue that caused the irregularity.
The problem is that a Flash based SSD needs to have a pool of unused blocks to work around the block-erase stupidity. However, trim only "solves" the problem when there is a good deal of free space on the drive anyway; when the drive nears full, it is useless. At the current pricing, people don't buy SSDs to keep them empty, and one would not expect an SSD to perform badly when full, as with rotating rust.
The solution is to provision enough extra blocks on the drive beyond the advertised capacity. While the vendors refuse to do this, you may do so yourself. Simply create an empty partition the drive--just make sure it isn't ever written or zeroed.
Anyway, the vendors' motivation to cut corners here should be perfectly clear.
I have the feeling that if you match the block size of files written to the block size of the SSD (optionally padding with zeros to fill it up) you could get quite a performance boost - and only when the drive starts to fill up start using the unused, zeroed out fragments from the blocks.
With SSDs it's erase block size that matters most. When a page of memory on a SSD device is being overwritten, the page has to be erased first, and then its contents rewritten.
Of course if your OS implements SCSI PUNCH or ATA TRIM, immediately when a block is freed, your SSD drive can have the memory erased ahead of time, and ready to use with minimal write latency.
Most SSDs use an erase block size of 128K. So you would want your entire filesystem's allocation blocks 128K aligned.. in NTFS world, this would mean you need to align the disk partition, at least prior to Vista/Windows 2008.
I believe MacOS automatically aligns partitions. This issue alone could explain why things might get so bad with Windows, let alone the ridiculously small 4K block size, or fragmentation issues that plague NTFS filesystems.
And the ideal allocation cluster size is 128K (by default NTFS cluster size is 4K, which is not suitable. for best performance change in such a scenario it to 64K)
You can think of the performance issue that can occur as something like RAID stripe crossing.
If you have two 4K blocks side by side, and you need to overwrite just those two blocks.. if they are on different 128K erase blocks, your SSD will be having to write 256K of memory to change 8K of data.
If those 2 NTFS blocks are on the same SSD block, your ssd flushes and writes 128K of data, maybe.
This is before we start thinking about wear-levelling algorithms,
or the fact an efficient SSD will likely keep a pool of non-erased blocks, to satisfy write requests against.
So as long as there is memory left that was scheduled for erasure and flushed due to TRIM or flushed due to being overwritten, there would be no reason to require an extra in-line ERASE (increasing write latency).
So the pool of non-erased pages are important, and the critical elements effecting write performance are (1) time to update header blocks to re-map disk blocks to unused flash pages, and (2) time to copy the erase block with changes to the newly mapped location.
All SSDs have a bit more storage than their rating. Partitioning a little less space on a vendor-fresh drive can double or triple the extra storage available to the SSD's internal wear leveling algorithms. For all intents and purposes this gives you the equivalent of TRIM without having to rely on the OS and filesystem supporting it. In fact, it could conceivably give you better performance than TRIM because you don't really know how efficient the TRIM implementation is in either the OS or the SSD. And because TRIM is a serialized command and cannot be run concurrently with read or write IOs. There are a lot of moving parts when it comes to using TRIM properly. Systems are probably better off not using TRIM at all, frankly.
In case people haven't figured it out, this is one reason why Intel chose multiples of 40G for their low-end SSDs. Their 40G SSD competes against 32G SSDs from other vendors. Their 80G SSD competes against 64G SSDs from other vendors. You can choose nominal performance by utilizing 100% of the advertised space or you can choose to improve upon the already excellent Intel wear leveling algorithms simply by partitioning it for (e.g.) 32G instead of 40G.
We're already seeing well over 200TB in endurance from Intel's 40G drives partitioned for 32G. Intel lists the endurance for their 40G drives at 35TB. I'm afraid I don't have comparitive numbers for when all 40G is used but I am already very impressed when 32G is partitioned for use out of the 40G available.
Unfortunately it is nearly impossible to stress test a SSD and get results that are even remotely related to the real world, since saturated write bandwidth eventually causes erase stalls when the firmware can no longer catch up. In real-world operation write bandwidth is not pegged 100% of the time and the drive can pre-erase space. Testing this stuff takes months and months.
Also, please nobody try to compare USB sticks against real (SATA) SSDs. SSDs have real wear leveling algorithms and enough ram cache to do fairly efficient write combining. USB sticks have minimal wear leveling and basically no ram cache to deal with write combining.
-Matt
I think that the biggest mistake in the test was that they wrote zeros to the drive first, which means that the blocks got allocated (dirty) and had to be read/rewritten with new data when the next phase of the test started. So, basically both tests are the same and it's no wonder they got about the same test results.