Linux Not Quite Ready For New 4K-Sector Drives
Theovon writes "We've seen a few stories recently about the new Western Digital Green drives. According to WD, their new 4096-byte sector drives are problematic for Windows XP users but not Linux or most other OSes. Linux users should not be complacent about this, because not all the Linux tools like fdisk have caught up. The result is a reduction in write throughput by a factor of 3.3 across the board (a 230% overhead) when 4096-byte clusters are misaligned to 4096-byte physical sectors by one or more 512-byte logical sectors. The author does some benchmarks to demonstrate this. Also, from the comments on the article, it appears that even parted is not ready, since by default it aligns to 'cylinder' boundaries, which are not physical cylinder boundaries and are multiples of 63."
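To make the misalignment in the summary concrete, here is a minimal sketch of the arithmetic. It assumes a drive that reports 512-byte logical sectors over 4096-byte physical sectors and a filesystem that uses 4096-byte clusters; the function names are made up for illustration and are not part of any tool.

```python
LOGICAL = 512    # bytes per logical sector, as reported by the drive (assumed)
PHYSICAL = 4096  # bytes per physical sector inside the drive (assumed)


def is_aligned(start_lba: int) -> bool:
    """True if a partition starting at this logical sector begins on a
    physical-sector boundary."""
    return (start_lba * LOGICAL) % PHYSICAL == 0


def physical_sectors_touched(start_lba: int, cluster_index: int,
                             cluster_bytes: int = 4096) -> int:
    """Number of physical sectors covered by one filesystem cluster."""
    first_byte = start_lba * LOGICAL + cluster_index * cluster_bytes
    last_byte = first_byte + cluster_bytes - 1
    return last_byte // PHYSICAL - first_byte // PHYSICAL + 1


if __name__ == "__main__":
    for start in (63, 64, 2048):
        print(start, is_aligned(start), physical_sectors_touched(start, 0))
```

A partition starting at logical sector 63 makes every 4096-byte cluster straddle two physical sectors, while one starting at 64 or 2048 maps each cluster onto exactly one.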
Dear Slashdot,
I've been around for a while. Enough to understand, nay, love the fact that you are Linux supporters and all that. But I remain an ardent supporter of truth and of speaking in ways that are concise and lead the reader in the direction of truth. Nothing in this news story is inaccurate, but to make a point of saying that Windows XP is incompatible, with no mention of Vista and 7 being perfectly compatible, should be an embarrassment to journalistic integrity.
Windows XP may not work with the new WD Green drives, but Vista and later have been perfectly comfortable with 4096-byte sectors. A lay reader may read this story and not "read between the lines" as I have learned to do here. Their takeaway may be that Microsoft operating systems are broken in some way (which they are in a lot of ways), but not in this one!
/dev/sdd:
Model=WDC WD15EARS-00Z5B1, FwRev=80.00A80, SerialNo=
Config={ HardSect NotMFM HdSw>15uSec SpinMotCtl Fixed DTR>5Mbs FmtGapReq }
RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=50
BuffType=unknown, BuffSize=unknown, MaxMultSect=16, MultSect=16
It looks to me that this should *really* be fixed by WD with a firmware update.
Solution: instead of plain fdisk, invoke it as fdisk -H 224 -S 56, as per Theodore Ts'o's blog.
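A rough sketch of why that fake geometry helps, assuming the traditional DOS-compatible layout in which the first partition starts one track in and later partitions start on cylinder boundaries. The function name and the returned keys are invented for this illustration.

```python
def geometry_alignment(heads: int, sectors_per_track: int,
                       logical_per_physical: int = 8) -> dict:
    """Check whether a CHS geometry puts partition starts on 4096-byte
    boundaries (8 logical 512-byte sectors per physical sector, assumed)."""
    sectors_per_cylinder = heads * sectors_per_track
    return {
        "sectors_per_cylinder": sectors_per_cylinder,
        # Assumption: the first partition starts at the beginning of track 1.
        "first_partition_start": sectors_per_track,
        "first_partition_aligned":
            sectors_per_track % logical_per_physical == 0,
        # Later partitions start on cylinder boundaries.
        "cylinders_aligned":
            sectors_per_cylinder % logical_per_physical == 0,
    }


if __name__ == "__main__":
    print(geometry_alignment(255, 63))   # the usual default geometry
    print(geometry_alignment(224, 56))   # the geometry suggested above
```

With 255 heads and 63 sectors per track, a cylinder is 16065 sectors and the first partition starts at sector 63, neither of which is a multiple of 8. With 224 heads and 56 sectors per track, a cylinder is 12544 sectors and the first partition starts at sector 56, both multiples of 8.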
About the microcode part. The drive pretends to be a 512-byte drive, but internally it uses 4K sectors and claims to 'translate transparently'. I can understand that in a random-access scenario it has to read-modify-write 2 sectors each time and performance suffers (2 additional reads and one additional write). But in a sequential-access scenario, the penalty should be paid once per sequence/file, not once per sector. Here the microcode fails completely to make the best of a suboptimal situation.
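The once-per-sequence argument can be illustrated with a toy count of physical-sector operations. This is only a model under stated assumptions (512-byte logical sectors, 4096-byte physical sectors, a read needed for every partially overwritten physical sector); it says nothing about what WD's firmware actually does, and the function names are hypothetical.

```python
LOGICAL = 512    # bytes per logical sector (assumed)
PHYSICAL = 4096  # bytes per physical sector (assumed)
CLUSTER = 4096   # bytes per filesystem cluster (assumed)


def request_cost(start_byte: int, length: int) -> tuple:
    """Return (reads, writes) in physical sectors for one write request.
    A read is charged for each physical sector only partly overwritten."""
    end_byte = start_byte + length
    first = start_byte // PHYSICAL
    last = (end_byte - 1) // PHYSICAL
    writes = last - first + 1
    head_partial = start_byte % PHYSICAL != 0
    tail_partial = end_byte % PHYSICAL != 0
    if first == last:
        reads = 1 if (head_partial or tail_partial) else 0
    else:
        reads = int(head_partial) + int(tail_partial)
    return reads, writes


def sequential_cost(n_clusters: int, start_lba: int, coalesce: bool) -> tuple:
    """Cost of writing n_clusters back to back from start_lba, handled
    either cluster by cluster or as one merged request."""
    start = start_lba * LOGICAL
    if coalesce:
        return request_cost(start, n_clusters * CLUSTER)
    reads = writes = 0
    for i in range(n_clusters):
        r, w = request_cost(start + i * CLUSTER, CLUSTER)
        reads += r
        writes += w
    return reads, writes


if __name__ == "__main__":
    for label, lba, merge in (("misaligned, per cluster", 63, False),
                              ("misaligned, merged", 63, True),
                              ("aligned, per cluster", 64, False)):
        print(label, sequential_cost(1000, lba, merge))
```

In this model, a misaligned run handled cluster by cluster pays two reads and two writes per cluster, while the same run handled as one merged request pays only two reads in total, one at each ragged end.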
terminals have nothing to do with the command line!
I think the OP is complaining about the fact that things like baud, stop bits and whatnot are deeply embedded in the Linux kernel. These concepts are not necessary to have a command line; cf. Plan 9.
That's true, but it's also true that having hardware lie to the OS isn't a great situation to be in. At the very least there should be some way of forcing it to be honest for the benefit of OSes that can handle the reality. A lot of the gunk and instability in computing comes from hardware that does things that are more appropriately done by software and vice versa.
Forcing users to optimize isn't inherently wrong; it's just that they shouldn't need to do it for things which are somewhat standard as a workaround for weird hardware designs. And yes, I realize that the 4096-byte sectors aren't being implemented arbitrarily.
On the contrary, this has (almost) nothing to do with Windows - it has everything to do with old OSes. The IDEMA didn't approve the 4K sector standard until 2006; it was only in the late 90's that the first meaningful research was begun by IBM on whether 512B sectors would be an issue.
As it turns out, yes, 512B sectors would be an issue, and drive manufacturers would be best served by moving to larger sectors (with some arguing over whether to go to 1K or 4K). So the IDEMA hashed this out over the first half of the decade, and finally in 2006 approved the 4K specification.
The point of all of this is that software written at the turn of the century was all done well before changing drive sector sizes was a serious discussion. WinXP was released in 2001, Mac OS X 10.0 was in 2001, and of course Linux 2.4 was also in 2001. None of those OSes knows what to do with anything other than a 512B sector; the only reason Windows factors into this equation is that WinXP just happens to be still with us (no doubt trying to eat our brains) while the other two are dead. Anything circa 2005 or later, such as WinVista, Linux 2.6, and Mac OS X 10.5, knows full well what to do with a 4K drive.
But even that is beside the point. You don't just make major jumps like this; you have to do it as a transition so that you don't break old hardware and old software alike. Even if XP/Lin2.4/MacOSX knew what to do with 4K sectors, at some point you'd run into hardware, 3rd-party devices, etc. that would not. A transition is necessary to let old hardware and software get flushed out of the ecosystem, and as such we're still years out from consumer drives offering native 4K access.
In short: drives are pretending to have 512-byte sectors because there's a lot of old stuff, including Windows XP, that can't deal with 4K sectors.
Actually this problem is potentially much worse on SSDs. Erase blocks are huge, and read-modify-write really sucks on flash.
Couldn't this be addressed (at least in part) by a battery-backed write cache like better RAID controllers use? Set it up like SAN snapshots (so it just stores the diff between what's in the actual flash storage and what's been changed so far), and then write the changed blocks when it's most advantageous (e.g. when there's an entire block's worth of data, so it would all have to be erased by the flash storage anyway).
Maybe combine that with something like a disk defrag, except instead of storing frequently-sequentially-read data in physical sequence, store frequently-written data (regardless of whether it's sequentially read or not) in physical sequence.
That's exactly what most SSD controllers do!
Some now come with 32 to 64MB of cache, and some of the new SandForce-controller-based SSDs also come with a little ultracapacitor that acts like a mini UPS. The cache is used as scratch space for reordering writes and defragging blocks.
There was a firmware patch recently for the OCZ Vertex series of SSDs that enabled background defrag. If you let the drive sit there for a few minutes, it would start getting faster until it returned to 'as new' speeds.
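The write-cache idea from a few comments up (hold changed pages in a protected cache and write an erase block only once a whole block's worth is waiting) can be sketched roughly like this. It is a toy model, not how any particular controller works; the class name, the page and block sizes, and the flush callback are all assumptions made for illustration.

```python
class CoalescingWriteCache:
    """Toy write cache that groups dirty pages by erase block and flushes
    a block as soon as every page in it has been rewritten."""

    def __init__(self, pages_per_block, flush):
        self.pages_per_block = pages_per_block
        self.flush = flush    # hypothetical callback: flush(block_no, pages)
        self.pending = {}     # block_no -> {page_offset: data}

    def write(self, page_no, data):
        block_no, offset = divmod(page_no, self.pages_per_block)
        dirty = self.pending.setdefault(block_no, {})
        dirty[offset] = data
        # A fully dirty block can be erased and rewritten without first
        # reading back any old contents.
        if len(dirty) == self.pages_per_block:
            self.flush(block_no, self.pending.pop(block_no))

    def flush_all(self):
        """Force out partial blocks, e.g. on power loss. Real flash would
        need a read-modify-write for these; that part is not modelled."""
        for block_no in sorted(self.pending):
            self.flush(block_no, self.pending[block_no])
        self.pending.clear()


if __name__ == "__main__":
    cache = CoalescingWriteCache(4, lambda b, pages: print(b, sorted(pages)))
    for page in (5, 4, 7, 6, 0):
        cache.write(page, b"x")
    cache.flush_all()
```

Pages 4 to 7 all land in erase block 1, so that block is flushed in one go as soon as the fourth page arrives, while the lone page 0 stays in the cache until flush_all is called.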