Linux Not Quite Ready For New 4K-Sector Drives
Theovon writes "We've seen a few stories recently about the new Western Digital Green drives. According to WD, their new 4096-byte sector drives are problematic for Windows XP users but not for Linux or most other OSes. Linux users should not be complacent about this, because not all Linux tools, fdisk among them, have caught up. The result is a reduction in write throughput by a factor of 3.3 across the board (a 230% overhead) when 4096-byte clusters are misaligned to 4096-byte physical sectors by one or more 512-byte logical sectors. The author runs some benchmarks to demonstrate this. Also, from the comments on the article, it appears that even parted is not ready, since by default it aligns to 'cylinder' boundaries, which are not physical cylinder boundaries and fall on multiples of 63 sectors."
damnit, obviously since this is not technically the 'first post', my web browser must be misaligned by a post
the first time i have ever actually gotten 'first post'... it is when i try to make a joke about not having gotten first post. ya see my first post was supposed to come up like second or third.. it would have been HILARIOUS . .. but oh no
in soviet russia, the fates mock you!!!!
The simple solution is to set your Sectors per Track to 32. This would make sure that everything is properly aligned (except the first partition, usually /boot, which is misaligned by one cylinder).
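A quick way to check that suggestion: with a DOS-style geometry, partitions start on a track boundary (the first one at LBA S) or on cylinder boundaries (multiples of heads × S), and a start is 4K-aligned only when its 512-byte LBA is a multiple of 8. This sketch assumes 255 heads, a common fdisk default that the comment does not specify; `candidate_starts` and `is_aligned` are hypothetical helper names.

```python
LOGICAL = 512    # bytes per logical sector reported to the OS
PHYSICAL = 4096  # bytes per physical sector inside the drive
PER_PHYS = PHYSICAL // LOGICAL  # 8 logical sectors per physical sector


def is_aligned(lba: int) -> bool:
    """True if a partition starting at this 512-byte LBA begins on a 4K boundary."""
    return lba % PER_PHYS == 0


def candidate_starts(heads: int, sectors_per_track: int, count: int = 4) -> list[int]:
    """Typical DOS partition starts: the first one track into the disk,
    the rest on cylinder boundaries (multiples of heads * sectors_per_track)."""
    cylinder = heads * sectors_per_track
    return [sectors_per_track] + [cylinder * k for k in range(1, count)]


for s in (63, 32):
    starts = candidate_starts(255, s)  # 255 heads is an assumption
    print(f"S={s}:", [(lba, is_aligned(lba)) for lba in starts])
```

With S=32 every start is a multiple of 32, and therefore of 8, so it lands on a 4K boundary whatever the head count; with the traditional S=63 none of the starts listed here do. Note that LBA 32 is itself a multiple of 8, so under this model the first partition's start is 4K-aligned even though it does not sit on a cylinder boundary.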
http://www.osnews.com/thread?409281
I am no kernel hacker, but I can almost guarantee that some kernel hacker will provide a solution to this "shortcoming" fairly soon.
That's the beauty of Open Source.
I am aware, though, that "fairly soon" means many things to many people, which means there could be a substantial delay before we get a working solution to this issue.
I am optimistic nevertheless.
Request to Western Digital: Provide all the information needed to develop a solution.
$ time cp winxp.img /mnt/sdc # ALIGNED
real 5m9.360s
user 0m0.090s
sys 0m20.420s
$ time cp winxp.img /mnt/sdd # UNALIGNED
real 13m26.943s
user 0m0.110s
sys 0m19.350s
$ time cp -r "Computer Architecture/" /mnt/sdc # ALIGNED
real 42m9.602s
user 0m0.680s
sys 1m59.070s
$ time cp -r "Computer Architecture/" /mnt/sdd # UNALIGNED
real 138m54.610s
user 0m0.660s
sys 2m15.630s
The first two runs copy a single file; the latter two copy many files in a larger directory structure.
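The slowdowns quoted in the summary and in later comments can be recomputed from these `time` figures. A minimal sketch; `to_seconds` is a hypothetical helper that only handles the `XmY.YYYs` format printed by bash's `time`.

```python
def to_seconds(t: str) -> float:
    """Convert a bash `time` value such as '13m26.943s' to seconds."""
    minutes, seconds = t.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


runs = {
    "single file": ("5m9.360s", "13m26.943s"),        # (aligned, unaligned)
    "directory tree": ("42m9.602s", "138m54.610s"),
}

for name, (aligned, unaligned) in runs.items():
    ratio = to_seconds(unaligned) / to_seconds(aligned)
    print(f"{name}: {ratio:.2f}x slower on the unaligned partition")
```

This gives roughly 2.6x for the single file and 3.3x for the directory tree; the "230% overhead" in the summary is that same 3.3x expressed as extra time (3.3 - 1 = 2.3).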
I would heartily disagree with you on the matter.
I know that Fedora seems to have addressed this with parted 2.1.1 and util-linux-ng 2.1. Both are scheduled for Fedora 13, but can be pulled into Fedora 12 by those getting the hardware early.
Can You Say Linux? I Knew That You Could.
Dear Slashdot,
I've been around for a while, long enough to understand, nay, love the fact that you are Linux supporters and all that. But I remain an ardent supporter of truth, and of speaking in ways that are concise and lead the reader in the direction of truth. Nothing in this news story is inaccurate, but to make a point of saying that Windows XP is incompatible, with no mention of Vista and 7 being perfectly compatible, should be an embarrassment of journalistic integrity.
Windows XP may not work with the new WD Green drives, but Vista and later have been perfectly comfortable with 4096-byte sectors. A lay reader may read this story and not "read between the lines" as I have learned to do here. Their takeaway may be that Microsoft operating systems are broken in some way (which they are in a lot of ways), but not this one!
fdisk doesn't need to be fixed, it needs to be deprecated. DOS partition tables are a ridiculously bad artifact of the past. We won't be using them for much longer anyway; they're limited to 2TB for 512-byte-sector drives (or 4K drives with 512-byte emulation).
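The 2TB ceiling follows from the MBR format: each partition entry stores its starting LBA and its length as 32-bit values, so only 2^32 sectors can be addressed. With 512-byte emulation that holds even on 4K drives, because LBAs still count 512-byte units. A minimal sketch; `mbr_limit` is a hypothetical helper.

```python
MAX_SECTORS = 2**32  # 32-bit LBA and sector-count fields in an MBR entry


def mbr_limit(sector_size: int) -> int:
    """Largest disk size, in bytes, an MBR partition table can address."""
    return MAX_SECTORS * sector_size


for size in (512, 4096):
    limit = mbr_limit(size)
    print(f"{size}-byte logical sectors: {limit:,} bytes ({limit // 2**40} TiB)")
```

The 4096 row only applies to a drive that exposes 4K logical sectors natively, which these 512-byte-emulating drives do not.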
I disagree with the "but nothing that is unacceptable or even a need for concern" part. If copying the same image to the same disk, where the only difference is where the partition begins (one sector apart), amounts to a 2.6x decrease in speed, and copying a large directory with many subdirectories amounts to a 3.3x decrease, that should qualify as an unacceptable slowdown that warrants concern.
While a kernel tweak may help alleviate the issue, it is primarily a problem with our current (userspace) disk partitioning and formatting utilities. I'd also disagree with you that the problem lies in the drive microcode; drives should do what they are told, not second-guess the instructions they are given. Admittedly, the microcode tweak would be minor and largely trivial, but I'd rather not fix (primarily) userspace software problems in the kernel or in the device firmware.
should be an embarrassment of journalistic integrity.
Slashvertisements, basic English grammar and spelling problems, completely wrong summaries and titles...
...and you a) think that Slashdot is "journalism" and b) think it had integrity to lose in the first place?
I like Slashdot, but gimme a break... it's a user-driven blog that directs readers to existing stories (now often lagging behind the major news wires), with good categorization and a semi-sophisticated commenting system used by a larger commenter population. Not much more, and definitely not journalism.
Please help metamoderate.
/dev/sdd:
Model=WDC WD15EARS-00Z5B1, FwRev=80.00A80, SerialNo=
Config={ HardSect NotMFM HdSw>15uSec SpinMotCtl Fixed DTR>5Mbs FmtGapReq }
RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=50
BuffType=unknown, BuffSize=unknown, MaxMultSect=16, MultSect=16
It looks to me like this should *really* be fixed by WD with a firmware update.
Solution: instead of plain fdisk, run it as fdisk -H 224 -S 56, as per Theodore Ts'o's blog.
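The arithmetic behind that geometry, as a sketch: `geometry_report` is a hypothetical helper, and it assumes the first partition starts one track in, as fdisk traditionally places it.

```python
LOGICAL = 512


def geometry_report(heads: int, sectors: int) -> dict:
    """Where partitions land for a given (fake) CHS geometry."""
    first = sectors * LOGICAL               # first partition: start of track 1
    cylinder = heads * sectors * LOGICAL    # later partitions: cylinder multiples
    return {
        "first_start_bytes": first,
        "cylinder_bytes": cylinder,
        "first_4k_aligned": first % 4096 == 0,
        "cylinder_4k_aligned": cylinder % 4096 == 0,
        "cylinder_128k_aligned": cylinder % (128 * 1024) == 0,
    }


print(geometry_report(224, 56))  # suggested geometry
print(geometry_report(255, 63))  # traditional geometry
```

224 × 56 = 12544 sectors = 6,422,528 bytes per cylinder, which is 49 × 128 KiB, and the first partition at LBA 56 starts 28,672 bytes (7 × 4 KiB) in. With 255/63, neither value is a multiple of 4096.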
The real problem is that the drive is lying about its sector size: it reports 512 bytes when it's using 4K. If it told Linux it was using 4K, everything would be fine and dandy.
Why does it lie about its sector size when it doesn't need to? Because if it didn't, the drives would not work on Windows XP at all, which would not bode well for sales.
Once drives with 4K sectors arrive, it's up to the individual maintainers of each affected tool (fdisk et al.) to update their code.
The kernel handles sector sizes, and could handle 4K sectors ages ago, but when the hardware reports something it tends to trust it, which, it is now apparent, it shouldn't. (512-byte sectors are implemented as an emulation layer of sorts on these drives, enabled by default.)
There is an excellent thread (found via the OSNews comments) about how recent (2.6.31+) Linux kernels try to report the underlying hard drive architecture. Alas, it looks like some of these drives are not reporting this data correctly, so automatic adjustment at partitioning time is not taking place. It looks like in the future, rather than relying only on reported capabilities, fdisk (and hopefully gparted) will default to aligning partitions on 1MiB boundaries when the topology can't be determined (unless your media is small).
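Kernels from 2.6.31 on export this topology in sysfs under /sys/block/<disk>/queue/ (logical_block_size, physical_block_size, optimal_io_size). The sketch below reads those files and picks a partition start. The selection policy is a simplified assumption for illustration, not the actual fdisk or parted algorithm, and `read_topology`, `choose_alignment` and `first_start_lba` are hypothetical names.

```python
from pathlib import Path

ONE_MIB = 1024 * 1024
ATTRS = ("logical_block_size", "physical_block_size", "optimal_io_size")


def read_topology(disk: str, sysfs: str = "/sys/block") -> dict:
    """Read the queue topology attributes for a disk such as 'sdb'.
    Missing files (older kernels, wrong name) are simply skipped."""
    queue = Path(sysfs) / disk / "queue"
    topo = {}
    for name in ATTRS:
        path = queue / name
        if path.is_file():
            topo[name] = int(path.read_text().strip())
    return topo


def choose_alignment(topo: dict) -> int:
    """Alignment in bytes (assumed policy): honour a large optimal I/O size,
    otherwise use 1 MiB, which is a multiple of 512 and 4096 and so stays
    safe even when the drive claims 512-byte physical sectors."""
    optimal = topo.get("optimal_io_size", 0)
    return optimal if optimal > ONE_MIB else ONE_MIB


def first_start_lba(topo: dict) -> int:
    """First partition start, in logical sectors."""
    return choose_alignment(topo) // topo.get("logical_block_size", 512)


if __name__ == "__main__":
    topo = read_topology("sdb")  # 'sdb' is just an example device name
    print(topo, first_start_lba(topo))
```

For a drive that reports 512-byte sectors and no optimal I/O size, this lands the first partition at LBA 2048, which is 4K-aligned whether or not the drive is telling the truth about its physical sectors.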
Additionally, I gather that recent Fedora releases will try to adjust things like LVM to match larger sectors too. Hopefully whatever lays out LVM will be fixed as well.
Coincidentally, it looks like Oracle have a very committed dev trying to make this stuff work by default...
About the microcode part: the drive pretends to be a 512-byte drive, but internally uses 4K sectors and claims to 'translate transparently'. I can understand that in a random-access scenario it has to read-modify-write 2 sectors each time, and performance suffers (2 additional reads and one additional write). But in a sequential access scenario, the penalty should be paid once per sequence/file, not once per sector. Here the microcode completely fails to make the best of a suboptimal situation.
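A small model of the read-modify-write cost described above. It assumes the drive must read every 4K physical sector that a single write only partly covers; `rmw_cost` is a hypothetical helper, not anything the drive firmware exposes.

```python
LOGICAL = 512
PHYSICAL = 4096


def rmw_cost(start_lba: int, nbytes: int) -> tuple[int, int]:
    """(physical reads, physical writes) for one write of nbytes at start_lba."""
    start = start_lba * LOGICAL
    end = start + nbytes
    first = start // PHYSICAL
    last = (end - 1) // PHYSICAL
    reads = 0
    for sector in range(first, last + 1):
        lo, hi = sector * PHYSICAL, (sector + 1) * PHYSICAL
        if lo < start or hi > end:  # only partly overwritten: must read it first
            reads += 1
    return reads, last - first + 1


print(rmw_cost(64, 4096))        # (0, 1): aligned 4K write
print(rmw_cost(63, 4096))        # (2, 2): misaligned 4K write
print(rmw_cost(63, 4096 * 100))  # (2, 101): one long misaligned write
```

In this model a misaligned 4K write costs two reads and one extra write, while one long sequential write pays only at its two ends, which is the commenter's point. How often real firmware gets to see such long writes depends on how the OS splits I/O, which is not modeled here.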
That's true, but it's also true that having hardware lie to the OS isn't a great situation to be in. At the very least there should be some way of forcing it to be honest for the benefit of OSes that can handle the reality. A lot of the gunk and instability in computing comes from hardware that does things that are more appropriately done by software and vice versa.
Forcing users to optimize isn't inherently wrong; it's just that they shouldn't need to do it for things that are somewhat standard, as a workaround for weird hardware designs. And yes, I realize that the 4096-byte sectors aren't being implemented arbitrarily.
Which is nice if you're wanting to ensure that you've got the lowest possible reliability and safety for your data. While you're at it, make sure you're using a striped non-redundant array of disks as well, best use at least 4 in the array, otherwise you might get some of your data back.
You've got it exactly backwards, people shouldn't be partitioning disks into one huge partition. They should be able to split things up a bit to keep rapidly changing directories from mostly static ones and to manage the risk of filesystem corruption destroying important files.
I see it rather as an indictment of closed-source OSes, if XP turns out to be incompatible with these new drives and MS never releases a patch to add support. People will need to upgrade, for no good reason, to one of MS's new operating systems. People should not have to deal with a complete upheaval of their tried and true systems due to a small hardware change such as this.
I can imagine MS is quietly chuckling with glee to itself, if this issue becomes a deal-breaker for machines still running XP.
ERROR 144 - REBOOT ?
I just got one of the 1TB WD drives with 64MB cache that are known to use 4KB sectors.
Here is how it shows up in dmesg:
[ 3.420488] sd 1:0:0:0: [sdb] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
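The dmesg numbers can be cross-checked against the hdparm output below: the drive reports 1953525168 logical 512-byte blocks.

```python
sectors = 1953525168          # from the dmesg line
size = sectors * 512          # bytes

print(size)                   # 1000204886016
print(size // 2**30, "GiB")   # 931, as dmesg reports
print(size // 2**20, "MiB")   # 953869, hdparm's "M = 1024*1024" figure
print(size // 10**6, "MB")    # 1000204, hdparm's "M = 1000*1000" figure
print(sectors % 8 == 0)       # True: a whole number of 4K physical sectors
```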
and here's what hdparm -I says:
ATA device, with non-removable media
Model Number: WDC WD10EARS-00Y5B1
Serial Number: WD-WCAV55227529
Firmware Revision: 80.00A80
Transport: Serial, SATA 1.0a, SATA II Extensions, SATA Rev 2.5, SATA Rev 2.6
Standards:
Supported: 8 7 6 5
Likely used: 8
Configuration:
Logical max current
cylinders 16383 16383
heads 16 16
sectors/track 63 63
--
CHS current addressable sectors: 16514064
LBA user addressable sectors: 268435455
LBA48 user addressable sectors: 1953525168
Logical/Physical Sector size: 512 bytes
device size with M = 1024*1024: 953869 MBytes
device size with M = 1000*1000: 1000204 MBytes (1000 GB)
cache/buffer size = unknown
Capabilities:
LBA, IORDY(can be disabled)
Queue depth: 32
Standby timer values: spec'd by Standard, with device specific minimum
R/W multiple sector transfer: Max = 16 Current = 1
Recommended acoustic management value: 128, current value: 254
DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6
Cycle time: min=120ns recommended=120ns
PIO: pio0 pio1 pio2 pio3 pio4
Cycle time: no flow control=120ns IORDY flow control=120ns
Commands/features:
Enabled Supported:
* SMART feature set
Security Mode feature set
* Power Management feature set
* Write cache
* Look-ahead
* Host Protected Area feature set
* WRITE_BUFFER command
* READ_B
My opinions are my own, and do not necessarily represent those of my employer.
We're adjusting our disklabel64 utility and kernel support to set the partition base offset such that it is physically aligned instead of slice-aligned, and we are using 32K alignment. That should fix the problem without having to mess around with fdisk.
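A sketch of what "physically aligned instead of slice-aligned" means in numbers, assuming a DOS slice that starts at the traditional LBA 63. `align_up` and `partition_base` are hypothetical names, not DragonFly's actual disklabel64 code.

```python
SECTOR = 512
ALIGN = 32 * 1024  # the 32K alignment mentioned above


def align_up(offset: int, align: int = ALIGN) -> int:
    """Round a byte offset up to the next multiple of align."""
    return (offset + align - 1) // align * align


def partition_base(slice_start_lba: int, align: int = ALIGN) -> int:
    """Byte offset, relative to the slice, at which the first partition
    should begin so that its absolute disk offset is aligned."""
    slice_start = slice_start_lba * SECTOR
    return align_up(slice_start, align) - slice_start


base = partition_base(63)
print(base, (63 * SECTOR + base) % 4096)  # 512 0
```

Because 32K is a multiple of 4K (and of HAMMER's 16K), a partition placed this way is aligned for these drives whatever the slice's own start.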
The DragonFly 64-bit disklabel structure uses 64-bit byte offsets instead of sector addressing to specify everything. It ensures things are at least sector aligned but we wanted to make disk images more portable across devices with potentially different sector sizes. The HAMMER fs uses byte-granular addressing for the same reason, 16K aligned.
-Matt
There won't be a partition table with his suggestion. The boot sector set aside by the filesystem will be the very first sector of the disk.
I won't join Slashcott. OTOH, If Beta goes live, I just won't be back until it's fixed. Sorry Dice.
The article represents one data point: one particular way to install a drive, on one (unnamed) version of Gentoo, on one particular WD drive model that had a bugzilla entry filed by the author all of 2 days ago. So this is supposed to be an indictment of all of Linux?
The author even mentions that Ubuntu has an option on parted that accomplishes the task properly. I'd be much more interested in an article about how the default installer handles this task rather than one concentrating on a single expert tool. It's still good to know that fdisk on his unnamed Gentoo distribution does the wrong thing, but this hardly means we should fire up the klaxon and declare "Linux not fully prepared for 4096-byte-sector hard drives!". It's certainly interesting, but I'll withhold judgment until we actually know more about the implications of this across the entire spectrum of Linux distributions and the various 4096-byte-sector HDs.
It seems these drives need a new "don't lie to me, I can handle it" command, so OSes that don't have a problem with 4k size sectors can get the real info.
And then when you install a different distribution, you blow away your home directory. Sorry, bad idea. /home should be in a separate partition from the rest of the stuff..
Also, since I usually have several distributions installed at the same time, I have several partitions...but that's a less common problem.
A better solution would be to have a boot partition snuggled up against the MBR that automatically adapts so that the boot + MBR is an appropriate size, say 32 MB. (My current boot directory is 14MB, so that shouldn't be a problem. These aren't, after all, small drives, so it doesn't hurt to allocate a bit of extra space. Maybe even make that 64MB.)
Perhaps one could rearrange the system tables a bit so that the MBR and the partition table were counted as part of the /boot partition. They'd need to be in a position guaranteed by the OS, but that's not a real problem.
Note that what I'm proposing is a major redesign, so there's about zero chance of it being adopted. But it's a better choice than scrapping partitions, and probably has a better chance of being adopted.
I think we've pushed this "anyone can grow up to be president" thing too far.
Splitting a disk into multiple pseudo-disks makes sense in many situations, but the clunky legacy partition tables are only good for inter-OS compatibility. Otherwise LVM beats partitions in every respect. Now if only we could get an LVM solution that works in multiple operating systems...
I'll get to why in a second, but first:
RawCHS hasn't meant anything in a decade. The largest drive you can describe with CHS is 8GB.
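Both CHS ceilings land near 8GB and can be checked directly. The ATA figures match RawCHS=16383/16/63 and "CHS current addressable sectors: 16514064" in the hdparm output earlier in the thread, and the BIOS INT 13h limit of 1024 cylinders × 255 heads × 63 sectors lands in the same range. `chs_capacity` is a hypothetical helper.

```python
def chs_capacity(cylinders: int, heads: int, sectors: int, sector_size: int = 512) -> int:
    """Bytes addressable through a given cylinder/head/sector geometry."""
    return cylinders * heads * sectors * sector_size


ata = chs_capacity(16383, 16, 63)   # ATA CHS ceiling
bios = chs_capacity(1024, 255, 63)  # BIOS INT 13h ceiling

print(ata // 512, ata)  # 16514064 8455200768
print(bios)             # 8422686720
```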
Track size hasn't meant anything in even longer than that. When drives went to zone bit recording (ZBR), the number of sectors per track became variable. This happened in about 1989.
The sector size does mean something, but it is the actual sector size, not the sector "grouping" size. If the drive reported a sector size of 4K, then it would expect that the host understand that sectors are actually 4K in size, not 512B in size. But really no major OS supports this, they all expect 512B sectors. That's why these drives internally use one sector size and show another size to the host. And there is no way in the ATA specification for devices to indicate their internal sector size when they are presenting a different external sector size.
So this won't be fixed with a firmware update unless Vista, 7 and every other major OS are fixed to actually support large sectors presented to the host. Then the drive could be firmware-updated to report the large sector size to the host, and it would then be completely unusable under any earlier OS or with any USB or FireWire adapter.
http://lkml.org/lkml/2005/8/20/95
I forgot: there is one thing RawCHS still matters for nowadays. There is no proper spec for how to tell whether an entry in an MBR (fdisk) partition table is a valid partition, so heuristics are applied to the entries to guess whether they are real or should be ignored as empty. One heuristic some software uses is to ignore all partition entries that don't begin on a cylinder boundary. To be on a cylinder boundary, the partition has to start on a sector number that is a multiple of the number of sectors per track (S in CHS). And since all drives of 8GB or more present an S of 63, that is why the first partition on an MBR disk has always started at sector 63, which makes it unaligned when the internal sector size is 4K (8 logical sectors).
Windows versions before 2000 check the CHS alignment of MBR entries and ignore any partition entry that doesn't start on a multiple of S, so practically every disk partitioned this way is misaligned. With Windows 2000 or later, you can start a partition on any boundary you want.
Western Digital has a jumper you can put on the drive that adds 1 to all access requests, making all those misaligned first partitions aligned. But it also makes any aligned partitions misaligned. So the real answer is just to lay out your disk differently. I would recommend GUID (GPT) partitioning instead of MBR anyway, because MBR doesn't work for >2TB drives. And GPT doesn't have any weird alignment requirements (or any knowledge of CHS).
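Concretely, a partition at LBA 63 starts 63 × 512 = 32,256 bytes into the disk, 3,584 bytes (seven logical sectors) past the last 4K boundary. `misalignment` is a hypothetical helper.

```python
LOGICAL, PHYSICAL = 512, 4096


def misalignment(lba: int) -> int:
    """Bytes by which a partition starting at this LBA misses a 4K boundary."""
    return (lba * LOGICAL) % PHYSICAL


for lba in (63, 64, 2048):
    print(lba, misalignment(lba))  # 63 -> 3584, 64 -> 0, 2048 -> 0
```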
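The jumper's effect, modeled the way the comment describes it (every request shifted by one sector). `with_jumper` is a hypothetical helper, and the real firmware behaviour is assumed here, not verified.

```python
def is_aligned(lba: int) -> bool:
    return lba % 8 == 0  # 8 x 512-byte logical sectors per 4K physical sector


def with_jumper(host_lba: int) -> int:
    """LBA the drive actually uses when the alignment jumper is fitted."""
    return host_lba + 1


for start in (63, 64, 2048):
    print(start, is_aligned(start), is_aligned(with_jumper(start)))
# 63 False True
# 64 True False
# 2048 True False
```

The XP-style start at 63 becomes aligned, while the 64 and 2048 starts that newer tools pick become misaligned, which is why laying the disk out differently is the cleaner fix.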