Linux Not Quite Ready For New 4K-Sector Drives
Theovon writes "We've seen a few stories recently about the new Western Digital Green drives. According to WD, their new 4096-byte sector drives are problematic for Windows XP users but not for Linux or most other OSes. Linux users should not be complacent about this, because not all Linux tools, fdisk among them, have caught up. The result is a reduction in write throughput by a factor of 3.3 across the board (a 230% overhead) when 4096-byte clusters are misaligned to 4096-byte physical sectors by one or more 512-byte logical sectors. The author does some benchmarks to demonstrate this. Also, from the comments on the article, it appears that even parted is not ready, since by default it aligns to 'cylinder' boundaries, which are not physical cylinder boundaries and fall on multiples of 63 sectors."
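Here is why a single 512-byte offset matters. A partition that starts at the traditional sector 63 places every 4 KiB cluster across two physical sectors, so each cluster write forces the drive to read-modify-write both of them. The minimal Python sketch below works through that arithmetic. It assumes 4 KiB clusters and a drive that exposes 512-byte logical sectors on top of 4096-byte physical ones. The helper names are illustrative only.

```python
LOGICAL = 512                 # bytes per logical sector the drive advertises
PHYSICAL = 4096               # bytes per physical sector on the new 4K drives
RATIO = PHYSICAL // LOGICAL   # 8 logical sectors per physical sector


def is_aligned(start_sector: int) -> bool:
    """True if a partition starting at this 512-byte LBA begins on a 4K boundary."""
    return start_sector % RATIO == 0


def physical_sectors_touched(start_sector: int, cluster_index: int,
                             cluster_bytes: int = 4096) -> int:
    """How many 4K physical sectors one filesystem cluster write covers."""
    first_byte = start_sector * LOGICAL + cluster_index * cluster_bytes
    last_byte = first_byte + cluster_bytes - 1
    return last_byte // PHYSICAL - first_byte // PHYSICAL + 1


if __name__ == "__main__":
    # Sector 63 is the classic DOS-style start; 64 and 2048 are multiples of 8.
    for start in (63, 64, 2048):
        print(start, is_aligned(start), physical_sectors_touched(start, 0))
```

With a start of 63, every cluster spans two physical sectors. With a start at any multiple of 8, each cluster maps onto exactly one.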
/dev/sdd:
Model=WDC WD15EARS-00Z5B1, FwRev=80.00A80, SerialNo=
Config={ HardSect NotMFM HdSw>15uSec SpinMotCtl Fixed DTR>5Mbs FmtGapReq }
RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=50
BuffType=unknown, BuffSize=unknown, MaxMultSect=16, MultSect=16
It looks to me like this should *really* be fixed by WD with a firmware update.
Solution: instead of running plain fdisk, run it as fdisk -H 224 -S 56, as per Theodore Ts'o's blog.
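Why 224 and 56? This assumes fdisk's DOS-compatible behavior: the first partition starts one track in (at sector S), and later partitions are rounded to cylinder boundaries of H × S sectors. Under that assumption, the fake geometry works because both 56 and 224 × 56 = 12544 are multiples of 8 sectors (4 KiB). The short Python sketch below checks that arithmetic. `geometry_report` is a hypothetical helper and is not part of fdisk.

```python
SECTOR = 512
PHYS = 4096
STEP = PHYS // SECTOR  # 8 logical sectors per 4K physical sector


def geometry_report(heads: int, sectors_per_track: int) -> dict:
    """Offsets that DOS-style CHS rounding would produce for a given fake geometry."""
    first_start = sectors_per_track        # first partition begins after track 0
    cylinder = heads * sectors_per_track   # later partitions round to whole cylinders
    return {
        "first_start": first_start,
        "cylinder_sectors": cylinder,
        "first_aligned": first_start % STEP == 0,
        "cylinders_aligned": cylinder % STEP == 0,
    }


if __name__ == "__main__":
    print(geometry_report(255, 63))  # the usual default geometry
    print(geometry_report(224, 56))  # the geometry suggested above
```

The default 255/63 geometry gives a first start of 63 and cylinders of 16065 sectors, and neither is divisible by 8. The 224/56 geometry gives 56 and 12544, and both are. As a side effect, 12544 sectors is also exactly 49 × 256 sectors, a multiple of 128 KiB.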
Actually, this problem is potentially much worse on SSDs. Erase blocks are huge, and read-modify-write really sucks on flash.
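Here is a rough idea of how much worse it gets. Picture a naive controller with no remapping, one that must rewrite every erase block a write touches. The sketch below computes the resulting write amplification. The 512 KiB erase block size is an assumption chosen for illustration, and as the replies below note, real controllers try hard to avoid this worst case.

```python
KIB = 1024


def naive_rmw_bytes(offset: int, length: int, erase_block: int) -> int:
    """Bytes a naive controller rewrites: every erase block the write overlaps."""
    first = offset // erase_block
    last = (offset + length - 1) // erase_block
    return (last - first + 1) * erase_block


def write_amplification(offset: int, length: int, erase_block: int) -> float:
    """Ratio of flash bytes rewritten to bytes the host actually asked to write."""
    return naive_rmw_bytes(offset, length, erase_block) / length


if __name__ == "__main__":
    eb = 512 * KIB  # assumed erase block size
    print(write_amplification(0, 4 * KIB, eb))             # 4 KiB inside one block
    print(write_amplification(eb - 2 * KIB, 4 * KIB, eb))  # 4 KiB straddling two blocks
```

A 4 KiB write inside one 512 KiB block rewrites 128 times the data. A misaligned write that straddles a block boundary doubles that again.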
Couldn't this be addressed (at least in part) by a battery-backed write cache like the ones better RAID controllers use? Set it up like SAN snapshots (so it just stores the diff between what's in the actual flash storage and what's been changed so far), and then write the changed blocks when it's most advantageous (e.g. when there's an entire block's worth of data, since the flash storage would have to erase all of it anyway).
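The idea above can be sketched as a write-back cache that tracks dirty pages per erase block. It flushes a block once every page in it is dirty, or flushes the fullest block when the cache runs short of space. This minimal Python sketch makes several assumptions. The "diff" is tracked at page granularity, `flushed` is a stand-in for real flash programming, and the class and its parameters are hypothetical.

```python
from collections import defaultdict


class CoalescingWriteCache:
    """Hypothetical battery-backed cache that batches page writes per erase block."""

    def __init__(self, pages_per_block: int, capacity_pages: int):
        self.pages_per_block = pages_per_block
        self.capacity_pages = capacity_pages
        self.dirty = defaultdict(dict)  # erase block -> {page index: data}
        self.flushed = []               # (block, pages written) records, stand-in for flash

    def _dirty_count(self) -> int:
        return sum(len(pages) for pages in self.dirty.values())

    def write(self, lpn: int, data: bytes) -> None:
        block, page = divmod(lpn, self.pages_per_block)
        # Rewriting a page that is already cached costs no flash write at all.
        self.dirty[block][page] = data
        if len(self.dirty[block]) == self.pages_per_block:
            # A whole block's worth of changes: one erase covers all of it.
            self._flush(block)
        elif self._dirty_count() > self.capacity_pages:
            # Cache pressure: flush the block with the most dirty pages.
            fullest = max(self.dirty, key=lambda b: len(self.dirty[b]))
            self._flush(fullest)

    def _flush(self, block: int) -> None:
        pages = self.dirty.pop(block)
        self.flushed.append((block, sorted(pages)))
```

For example, with `pages_per_block=4`, writing logical pages 0 through 3 triggers exactly one flush of block 0. Rewriting page 5 repeatedly only updates the cached copy.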
Maybe combine that with something like a disk defrag, except instead of storing frequently-sequentially-read data in physical sequence, store frequently-written data (regardless of whether it's read sequentially) in physical sequence.
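This suggestion resembles what is usually called hot/cold data separation. Pages that are rewritten often are packed into the same erase blocks, so they tend to go stale together, and those blocks become cheap to erase. The sketch below is a simplified illustration, not any vendor's algorithm. It ranks logical pages by how often they appear in a write log and groups them into erase-block-sized runs. `group_by_heat` is a hypothetical name.

```python
from collections import Counter
from typing import List


def group_by_heat(write_log: List[int], pages_per_block: int) -> List[List[int]]:
    """Pack logical pages into erase blocks, most frequently written first."""
    counts = Counter(write_log)
    # Hottest pages first; ties broken by page number so the layout is deterministic.
    ordered = sorted(counts, key=lambda lpn: (-counts[lpn], lpn))
    return [ordered[i:i + pages_per_block]
            for i in range(0, len(ordered), pages_per_block)]


if __name__ == "__main__":
    print(group_by_heat([7, 7, 7, 3, 3, 9, 1], pages_per_block=2))
```

Here pages 7 and 3 share a block because both are rewritten often. The rarely written pages 1 and 9 end up together in another block.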
That's exactly what most SSD controllers do!
Some now come with 32 to 64MB of cache, and some of the new SandForce controller based SSDs also come with a little ultracapacitor that acts like a mini UPS. The cache is used as scratch space for reordering writes and defragging blocks.
There was a firmware patch recently for the OCZ Vertex series of SSDs that enabled background defrag. If you let the drive sit there for a few minutes, it would start getting faster until it returned to 'as new' speeds.
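The firmware details of that OCZ patch aren't public here. The sketch below shows a generic greedy garbage-collection pass, one common way a controller can reclaim space in the background. It picks the erase block with the fewest still-valid pages, copies those pages into an already-erased spare block, and erases the victim. The data layout and the function name are assumptions made for illustration.

```python
from typing import Dict, List, Optional


def greedy_gc(blocks: Dict[int, List[Optional[int]]], spare: int) -> int:
    """Run one background garbage-collection pass.

    blocks maps erase-block id -> page slots. Each slot holds a logical page
    number (still valid) or None (stale or free). `spare` must be an erased block.
    Returns the id of the block that was erased, which becomes the new spare.
    """
    candidates = [b for b in blocks if b != spare]
    # Greedy choice: the fewest valid pages means the fewest copies before erasing.
    victim = min(candidates, key=lambda b: sum(p is not None for p in blocks[b]))
    live = [p for p in blocks[victim] if p is not None]
    size = len(blocks[victim])
    # Relocate the surviving pages, packed at the front of the spare block.
    blocks[spare] = live + [None] * (size - len(live))
    # Erase the victim so it can absorb future writes.
    blocks[victim] = [None] * size
    return victim


if __name__ == "__main__":
    layout = {0: [1, None, None, 2], 1: [3, 4, 5, None], 2: [None] * 4}
    print(greedy_gc(layout, spare=2), layout)
```

If the controller repeats this while the drive is idle, it builds up a pool of erased blocks. New writes can then go straight to erased blocks instead of triggering read-modify-write, which fits the "faster after sitting idle" behavior described above.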