Linux Not Quite Ready For New 4K-Sector Drives
Theovon writes "We've seen a few stories recently about the new Western Digital Green drives. According to WD, their new 4096-byte sector drives are problematic for Windows XP users but not Linux or most other OSes. Linux users should not be complacent about this, because not all the Linux tools like fdisk have caught up. The result is a reduction in write throughput by a factor of 3.3 across the board (a 230% overhead) when 4096-byte clusters are misaligned to 4096-byte physical sectors by one or more 512-byte logical sectors. The author does some benchmarks to demonstrate this. Also, from the comments on the article, it appears that even parted is not ready, since by default it aligns to 'cylinder' boundaries, which are not physical cylinder boundaries and are multiples of 63."
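For anyone who wants to check the arithmetic behind the misalignment, here is a minimal sketch. It assumes 512-byte logical sectors and 4096-byte physical sectors as described in the summary; the function names are made up for illustration and do not come from fdisk or parted.

```python
LOGICAL_SECTOR = 512     # bytes per logical sector the drive presents to the OS
PHYSICAL_SECTOR = 4096   # bytes per physical sector on the new drives


def is_aligned(start_lba, logical=LOGICAL_SECTOR, physical=PHYSICAL_SECTOR):
    """True if a partition starting at start_lba begins on a physical sector boundary."""
    return (start_lba * logical) % physical == 0


def next_aligned_lba(start_lba, logical=LOGICAL_SECTOR, physical=PHYSICAL_SECTOR):
    """Round start_lba up to the next logical sector that begins a physical sector."""
    ratio = physical // logical          # 8 logical sectors per physical sector
    return -(-start_lba // ratio) * ratio  # ceiling division, then scale back up


if __name__ == "__main__":
    # Sector 63 is the traditional first-partition start from 'cylinder' alignment.
    for lba in (63, 64, 2048):
        print(lba, is_aligned(lba), next_aligned_lba(lba))
```

A start at sector 63 lands 63 * 512 = 32256 bytes into the disk, which is not a multiple of 4096, so every 4096-byte cluster in that partition straddles two physical sectors.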
I know that Fedora seems to have addressed this with parted 2.1.1 and util-linux-ng 2.1. Both are scheduled for Fedora 13, but can be pulled into Fedora 12 by those getting the hardware early.
Can You Say Linux? I Knew That You Could.
There is an excellent thread talking about how recent (2.6.31+) Linux kernels try to report the underlying hard drive architecture (found via the OSNews comments). Alas, it looks like some of these drives do not report this data correctly, so automatic adjustment at partitioning time does not take place. It looks like in the future, rather than relying only on the capabilities the drive reports, fdisk (and hopefully gparted) will default to aligning partitions on 1MiB boundaries when the topology can't be determined (unless your media is small).
Additionally, I gather that recent Fedoras will try to adjust things like LVM to match larger sectors. Hopefully whatever is laying out LVM will be fixed too.
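As a rough illustration of that fallback idea, the sketch below reads the block sizes the kernel exposes under /sys/block/<device>/queue/ and falls back to 1MiB when they are missing or look untrustworthy. The policy in choose_alignment is my own simplification for illustration, not the logic fdisk or parted actually ship, and the function names are hypothetical.

```python
from pathlib import Path

ONE_MIB = 1024 * 1024


def read_int(path):
    """Read a single integer from a sysfs-style file, or None if unavailable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def choose_alignment(device, sysfs_root="/sys/block"):
    """Pick a partition alignment in bytes for the given device (assumed policy).

    If the kernel reports a physical block size larger than the logical one,
    trust it. Otherwise (no data, or a drive claiming 512/512) fall back to 1MiB,
    which is a multiple of both 512 and 4096.
    """
    queue = Path(sysfs_root) / device / "queue"
    logical = read_int(queue / "logical_block_size")
    physical = read_int(queue / "physical_block_size")
    if logical is None or physical is None or physical <= logical:
        return ONE_MIB
    return physical


if __name__ == "__main__":
    print(choose_alignment("sda"))
```

The 1MiB fallback costs at most a megabyte of unused space at the start of the disk, which is presumably why small media are the exception.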
Coincidentally, it looks like Oracle have a very committed dev trying to make this stuff work by default...
About the microcode part. The drive pretends to be a 512-byte drive, but internally it uses 4K sectors and claims to 'translate transparently'. I can understand that in a random-access scenario it has to read-modify-write 2 sectors each time and performance suffers (2 additional reads and one additional write). But in a sequential access scenario, the penalty should be once per sequence/file, not once per sector. Here the microcode fails completely to make the best out of the suboptimal situation.
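To put numbers on that argument, here is a toy model of how many physical sector reads and writes a single write request needs if the drive only does read-modify-write on the partially covered sectors at each end. It is a sketch of the reasoning above, not a description of what the WD firmware does; physical_ops is a name I made up.

```python
def physical_ops(offset_bytes, length_bytes, physical=4096):
    """Return (reads, writes) in physical sectors for one write request.

    Every physical sector touched must be written. Only a sector that is
    partially covered (at the head or tail of the request) must be read first.
    """
    if length_bytes <= 0:
        raise ValueError("length_bytes must be positive")
    end = offset_bytes + length_bytes
    first = offset_bytes // physical
    last = (end - 1) // physical

    partial = set()
    if offset_bytes % physical:
        partial.add(first)   # request starts mid-sector
    if end % physical:
        partial.add(last)    # request ends mid-sector
    return len(partial), last - first + 1


if __name__ == "__main__":
    clusters = 256
    # One misaligned 4096-byte cluster, shifted by a single 512-byte sector.
    reads, writes = physical_ops(512, 4096)
    print("per cluster:", reads, writes)
    print("handled one by one:", reads * clusters, writes * clusters)
    # The same 256 clusters treated as one sequential 1MiB write.
    print("coalesced:", physical_ops(512, clusters * 4096))
```

In this model, 256 misaligned clusters handled one at a time cost 512 reads and 512 writes, while the same data handled as one sequential run costs 2 reads and 257 writes.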
We're adjusting our disklabel64 utility and kernel support to set the partition base offset such that it is physically aligned instead of slice-aligned, and we are using 32K alignment. That should fix the problem without having to mess around with fdisk.
The DragonFly 64-bit disklabel structure uses 64-bit byte offsets instead of sector addressing to specify everything. It ensures things are at least sector aligned but we wanted to make disk images more portable across devices with potentially different sector sizes. The HAMMER fs uses byte-granular addressing for the same reason, 16K aligned.
-Matt
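A small sketch of what byte-offset alignment like this looks like in practice, assuming the 32K figure from the comment above. This is illustrative Python, not DragonFly's disklabel64 code, and align_up is a hypothetical helper.

```python
ALIGNMENT = 32 * 1024  # 32K alignment, as mentioned in the comment above


def align_up(byte_offset, alignment=ALIGNMENT):
    """Round a byte offset up to the next multiple of alignment."""
    return -(-byte_offset // alignment) * alignment


if __name__ == "__main__":
    slice_start = 63 * 512           # traditional slice start, in bytes
    base = align_up(slice_start)     # partition base pushed up to a 32K boundary
    for sector_size in (512, 4096):
        print(sector_size, base % sector_size == 0)
```

Because 32768 is a multiple of both 512 and 4096, a base offset chosen this way is sector aligned on either kind of device, which fits the stated goal of keeping disk images portable across sector sizes.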