Linux Not Quite Ready For New 4K-Sector Drives
Theovon writes "We've seen a few stories recently about the new Western Digital Green drives. According to WD, their new 4096-byte sector drives are problematic for Windows XP users but not Linux or most other OSes. Linux users should not be complacent about this, because not all the Linux tools like fdisk have caught up. The result is a reduction in write throughput by a factor of 3.3 across the board (a 230% overhead) when 4096-byte clusters are misaligned to 4096-byte physical sectors by one or more 512-byte logical sectors. The author does some benchmarks to demonstrate this. Also, from the comments on the article, it appears that even parted is not ready, since by default it aligns to 'cylinder' boundaries, which are not physical cylinder boundaries and are multiples of 63."
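To make the arithmetic in the summary concrete, here is a minimal Python sketch. It is not taken from the article, and the helper names are invented for illustration. It checks whether a partition's starting LBA (counted in 512-byte logical sectors) falls on a 4096-byte physical-sector boundary, and converts the quoted 3.3x slowdown into an overhead percentage.

```python
LOGICAL = 512    # bytes per logical sector, as reported to the host
PHYSICAL = 4096  # bytes per physical sector inside the drive


def is_aligned(start_lba, logical=LOGICAL, physical=PHYSICAL):
    """True if the byte offset of start_lba is a multiple of the physical sector size."""
    return (start_lba * logical) % physical == 0


def overhead_percent(slowdown_factor):
    """Convert an 'N times slower' factor into extra time, as a percentage."""
    return (slowdown_factor - 1.0) * 100.0


if __name__ == "__main__":
    # 63 is the traditional first-partition start; 64 and 2048 are 4K-friendly starts.
    for lba in (63, 64, 2048):
        print(lba, is_aligned(lba))
    print(round(overhead_percent(3.3)))
```

A start at sector 63 fails the check because 63 * 512 = 32256 bytes, which is not a multiple of 4096; moving the start by a single sector to 64 fixes it.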
damnit, obviously since this is not technically the 'first post', my web browser must be misaligned by a post
I heard using parted and GPT labels instead of MSDOS will optimize it on 4096 byte sectors automatically. Any truth to it?
The first time I have ever actually gotten 'first post'... is when I try to make a joke about not having gotten first post. Ya see, my first post was supposed to come up like second or third... it would have been HILARIOUS... but oh no
in soviet russia, the fates mock you!!!!
The simple solution is to set your Sectors per Track to 32. This would make sure that everything is properly aligned (except the first partition, usually /boot, which is misaligned by one cylinder).
http://www.osnews.com/thread?409281
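The effect of changing the sectors-per-track value can be checked with a few lines of Python. This is only a sketch: the function name is made up, and the head count of 255 is an assumption (a common fdisk default), not something stated in the comment.

```python
SECTORS_PER_PHYSICAL = 4096 // 512  # 8 logical sectors per 4K physical sector


def geometry_alignment(heads, sectors_per_track):
    """Report whether fake track and cylinder boundaries land on 4K boundaries."""
    cylinder = heads * sectors_per_track  # logical sectors per fake cylinder
    return {
        "track_aligned": sectors_per_track % SECTORS_PER_PHYSICAL == 0,
        "cylinder_aligned": cylinder % SECTORS_PER_PHYSICAL == 0,
    }


if __name__ == "__main__":
    # 255 heads is an assumed value; 63 is the legacy default, 32 and 56 are
    # the alternatives suggested in this thread.
    for spt in (63, 32, 56):
        print(spt, geometry_alignment(255, spt))
```

Any sectors-per-track value that is a multiple of 8 puts every track and cylinder boundary on a 4096-byte boundary, which is why both 32 and 56 work and 63 does not.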
I actually have 2 of these drives in my desktop right now. There is a slight decrease in performance compared to Windows 7, but nothing that is unacceptable or even a cause for concern. If you need to worry about the performance lost with the 4k sectors, then just go solid state.
I am no kernel hacker but I can almost guarantee that some kernel hacker will provide a solution to this "shortcoming" fairly soon.
That's the beauty of Open Source.
I am aware though that "fairly soon" means many things to many people; which means that there could be a substantial delay before we get a working solution to this issue.
I am optimistic nevertheless.
Request to Western Digital: Provide all the information needed to develop a solution.
Author claims a massive performance drop if things aren't aligned right. Ubuntu already does it with parted and fdisk can do it manually. So, no big problem; fdisk ought to be fixed to have sane defaults with a 4096 byte block size, sure. That can't be all that difficult.
The author also seems to think that only a 30% increase in times for misaligned writes should be expected. I'm not sure why. In a naive implementation I'd expect a 100% increase in time (each block now needs to be written twice). Linux, obviously, doesn't use a naive implementation. It's expected that if the hardware violates the assumptions behind the techniques Linux uses to achieve high performance, that those techniques end up making things very slow instead.
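The "each block now needs to be written twice" intuition can be put into numbers with a deliberately naive model. The sketch below is an assumption-laden illustration, not a description of what the WD firmware or the Linux block layer actually does: it simply counts how many physical sectors a write touches and how many of them are only partially covered and so would have to be read first.

```python
def rmw_cost(offset_bytes, length_bytes, physical=4096):
    """Return (reads, writes) in physical sectors for one write on a naive drive.

    A physical sector that is only partially overwritten must be read first
    (read-modify-write); fully covered sectors are just written.
    """
    first = offset_bytes // physical
    last = (offset_bytes + length_bytes - 1) // physical
    writes = last - first + 1
    head_partial = offset_bytes % physical != 0
    tail_partial = (offset_bytes + length_bytes) % physical != 0
    if writes == 1:
        reads = 1 if (head_partial or tail_partial) else 0
    else:
        reads = int(head_partial) + int(tail_partial)
    return reads, writes


if __name__ == "__main__":
    print(rmw_cost(0, 4096))         # 4K write on a physical boundary
    print(rmw_cost(63 * 512, 4096))  # same write in a partition starting at LBA 63
```

In this model an aligned 4K write costs one physical write, while the same write shifted by 63 logical sectors costs two reads plus two writes. Real drives cache and reorder, so measured slowdowns will differ.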
I know that Fedora seems to have addressed this with parted 2.1.1 and util-linux-ng 2.1. Both are scheduled for Fedora 13, but can be pulled into Fedora 12 by those getting the hardware early.
Easiest fix: stop dividing your disks into partitions.
Dear Slashdot,
I've been around for a while. Enough to understand, nay, love the fact that you are Linux supporters and all that. But I remain an ardent supporter of truth and of speaking in ways which are concise and lead the reader in the direction of truth. Nothing in this news story is inaccurate, but to make it a point to say that Windows XP is incompatible with no mention of Vista and 7 being perfectly compatible should be an embarrassment of journalistic integrity.
Windows XP may not work with the new WD Green drives, but Vista and on have been perfectly comfortable with 4096-byte sectors. A lay reader may read this story and not "read between the lines" as I have learned to do here. Their takeaway may be that Microsoft operating systems are broken in some way (which they are in a lot of ways), but not this one!
Can some kind soul tell me specifically what version of what utility I need to use for me to be OK? Or what settings?
My head hurts from trying to understand cylinders and sectors and drive geometry...
thanks!
"...should be an embarrassment of journalistic integrity."
Slashvertisements, basic English grammar and spelling problems, completely wrong summaries and titles...
...and you a) think that Slashdot is "journalism" and b) think it's had integrity to lose in the first place?
I like Slashdot, but gimme a break...it's a user-driven blog which directs readers to existing stories (now often lagging behind the major news wires) with good categorization and semi-sophisticated commenting system, utilized by a larger commenter population. Not much more, and definitely not journalism.
The real problem is that it is lying about its sector size: it's reporting 512 bytes when it's using 4k. If it told Linux it was using 4k, everything would be fine and dandy.
Why does it lie about its sector size when it doesn't need to? Because if it didn't, the drives would not work on Windows XP at all, which would not bode well for sales.
Once drives with 4k sectors arrive, it's up to the individual maintainers of each affected tool (fdisk et al.) to update their code.
The kernel handles sector sizes, and could handle 4k sectors ages ago, but when the hardware reports something it tends to trust it, which it is now apparent it shouldn't. (512-byte sectors are implemented as an emulation layer of sorts on these drives, and enabled by default.)
There is an excellent thread talking about how recent (2.6.31+) Linux kernels try to report the underlying hard drive topology (found via the OSNews comments). Alas, it looks like some of these drives are not reporting this data correctly, and thus automatic adjustment (at partitioning time) is not taking place. It looks like in the future, rather than relying only on detection by reported capability, fdisk (and hopefully gparted) will default to 1MiB alignment if the topology can't be found (unless your media is small).
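For anyone who wants to see what the kernel thinks a drive's topology is, the values are exposed under sysfs on kernels that have the topology support described above. The following Python sketch reads them; the function names are invented here, and the `sysfs_root` parameter exists only so the code can be pointed at a test directory.

```python
from pathlib import Path


def read_topology(device, sysfs_root="/sys/block"):
    """Read the block-layer topology the kernel reports for e.g. 'sda'.

    Missing or unreadable attributes come back as None (older kernels,
    or a device name that does not exist).
    """
    base = Path(sysfs_root) / device

    def read_int(relpath):
        try:
            return int((base / relpath).read_text().strip())
        except (OSError, ValueError):
            return None

    return {
        "logical_block_size": read_int("queue/logical_block_size"),
        "physical_block_size": read_int("queue/physical_block_size"),
        "alignment_offset": read_int("alignment_offset"),
    }


def emulates_512(topology):
    """True if the drive admits to physical sectors larger than its logical ones."""
    logical = topology["logical_block_size"]
    physical = topology["physical_block_size"]
    return logical is not None and physical is not None and physical > logical


if __name__ == "__main__":
    topo = read_topology("sda")  # 'sda' is just an example device name
    print(topo, emulates_512(topo))
```

A drive that reports 512 for both sizes, as some of these apparently do, will make `emulates_512` return False even though it is 4K internally, which is exactly the case where a conservative default such as 1MiB alignment helps.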
Additionally, I gather that recent Fedoras will try to adjust things like LVM to match larger sectors too. Hopefully whatever is laying out LVM will be fixed as well.
Coincidentally, it looks like Oracle have a very committed dev trying to make this stuff work by default...
It appears that this does not affect the older 1TB+ Western Digital Green drives such as the WDC10EADS. Those use 333GB platters and are native 512-byte sectors. The newer (newest) Western Digital Green drives, like the WDC10EARS, use 500MB platters and have 4K sectors. One way to tell the drives apart at a quick glance is that the old Green drives had 32MB of cache and the new ones have 64MB of cache.
I see it rather as an indictment against closed-source OSes, if XP turns out to be incompatible with these new drives and MS never releases a patch to add support. People will need to upgrade for no good reason to one of MS's new operating systems. People should not have to deal with a complete upheaval of their tested and true systems due to a small hardware change such as this.
I can imagine MS is quietly chuckling with glee to itself, if this issue becomes a deal-breaker for machines still running XP.
I posted that the newer WD Green drives use 500MB platters and I meant 500GB. 500MB platters would make for a very physically-large 1TB+ hard drive!
I just got one of the 1TB, 64MB-cache WD drives that are known to be 4KB-sector based.
Here is how it shows up in dmesg:
[ 3.420488] sd 1:0:0:0: [sdb] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
and here's what hdparm -I says:
ATA device, with non-removable media
Model Number: WDC WD10EARS-00Y5B1
Serial Number: WD-WCAV55227529
Firmware Revision: 80.00A80
Transport: Serial, SATA 1.0a, SATA II Extensions, SATA Rev 2.5, SATA Rev 2.6
Standards:
Supported: 8 7 6 5
Likely used: 8
Configuration:
Logical max current
cylinders 16383 16383
heads 16 16
sectors/track 63 63
--
CHS current addressable sectors: 16514064
LBA user addressable sectors: 268435455
LBA48 user addressable sectors: 1953525168
Logical/Physical Sector size: 512 bytes
device size with M = 1024*1024: 953869 MBytes
device size with M = 1000*1000: 1000204 MBytes (1000 GB)
cache/buffer size = unknown
Capabilities:
LBA, IORDY(can be disabled)
Queue depth: 32
Standby timer values: spec'd by Standard, with device specific minimum
R/W multiple sector transfer: Max = 16 Current = 1
Recommended acoustic management value: 128, current value: 254
DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6
Cycle time: min=120ns recommended=120ns
PIO: pio0 pio1 pio2 pio3 pio4
Cycle time: no flow control=120ns IORDY flow control=120ns
Commands/features:
Enabled Supported:
* SMART feature set
Security Mode feature set
* Power Management feature set
* Write cache
* Look-ahead
* Host Protected Area feature set
* WRITE_BUFFER command
* READ_B
I have a tiny 1.8" USB hard disk with 4096-byte sectors, and the Ubuntu installer crashes when it tries to read the partitioning information. Very annoying.
We're adjusting our disklabel64 utility and kernel support to set the partition base offset such that it is physically aligned instead of slice-aligned, and we are using 32K alignment. That should fix the problem without having to mess around with fdisk.
The DragonFly 64-bit disklabel structure uses 64-bit byte offsets instead of sector addressing to specify everything. It ensures things are at least sector aligned but we wanted to make disk images more portable across devices with potentially different sector sizes. The HAMMER fs uses byte-granular addressing for the same reason, 16K aligned.
-Matt
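The idea of placing a partition base on a physical boundary by byte offset can be illustrated in a few lines of Python. This is a sketch of the general round-up technique only; it is not DragonFly's disklabel64 code, and the names are invented for the example.

```python
ALIGN = 32 * 1024  # the 32K alignment mentioned above, in bytes


def align_up(byte_offset, alignment=ALIGN):
    """Round a byte offset up to the next multiple of alignment."""
    return (byte_offset + alignment - 1) // alignment * alignment


if __name__ == "__main__":
    slice_start = 63 * 512           # a slice beginning at the legacy sector 63
    base = align_up(slice_start)     # physically aligned partition base
    print(slice_start, base, base - slice_start)
```

Because the result is expressed in bytes and is a multiple of 32K, it is also a whole number of sectors for any power-of-two sector size up to 32K, which is what makes byte offsets portable across devices with different sector sizes.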
There won't be a partition table with his suggestion. The boot sector set aside by the filesystem will be the very first sector of the disk.
The article represents one data point, for one particular way to install a drive, on one (un-named) version of Gentoo, on one particular model of a WD drive that had a bugzilla entry entered by the author all of 2 days ago. So this is supposed to be an indictment of all of Linux?
The author even mentions that Ubuntu has an option on parted that accomplishes the task properly. I'd be much more interested in an article that talks about how the default installer handles this task rather than concentrating on one particular expert tool that does so. It's still good to know that fdisk on his un-named Gentoo distribution does the wrong thing.. but this hardly means we should fire up the klaxon and declare "Linux not fully prepared for 4096 sector hard drives!". It's certainly interesting, but I'll withhold judgment until we actually know more about the implications of this across the entire spectrum of Linux distributions and the various 4096 sector HDs.
Don't partition the drive in XP - format the entire thing and don't split it apart. Get a secondary physical drive.
I wouldn't be too fond of the MS development model from what I hear from those who were on the inside:
http://www.nytimes.com/2010/02/04/opinion/04brass.html?pagewanted=all
Inside Microsoft, political infighting trumps common sense. If you really want to hold up a closed source development model as an example of "what works" take a look at Apple. They crank out far better products with a fraction of the resources.
It seems these drives need a new "don't lie to me, I can handle it" command, so OSes that don't have a problem with 4k size sectors can get the real info.
I noticed both a performance hit AND stalling for a minute at a time when there was a lot of HDD activity, so I can confirm part of your experience. After going with 56 sectors per track as well, the freezing seems to be a thing of the past. The speed is definitely greatly improved.
The affected drives are listed on Western Digital's site.
I'm happy to see that you saw the stalling disappear as well; it gives me confidence about the cause.
Time will tell if the lock-ups were caused by the 512b partitioning (I wasn't able to repartition my drives until a few days ago), but it's a bit reassuring to know that I wasn't the only one experiencing these annoying as fsck freezes.
The affected drives on my end are WDC WD10EACS-00ZJB0 and WDC WD10EADS-00L5B.
(I think that) It's probably misaligned. LVM uses a 192k sector size for its metadata. See Theodore Ts'o's post for more information.
fdisk: an elegant tool for a more civilized age... no wait. fdisk is antiquated, and we only use it because we are afraid to leave the MS-DOS partition table behind, out of the irrational fear that some other software would stop working.
There are four flavors of 4096 byte-sectored drives:
4096 physical/logical - the bookkeeping parts of the file system cause read/modify write cycles because they are nearly always less than 4096 bytes, but the performance hit is relatively small; parted is badly broken. If they're less than 2TiB, then you can use an MBR, otherwise the kernel is broken for partition sizes.
4096 physical/512 logical; LBA 0 aligned "off by one" with physical block 0 - created to deal with stupid BIOS (and Win XP, where some drivers rely on it), mostly work fine with the default tools, but still have the bookkeeping issues. Because Win Vista/7 and OS X use GPT AND don't worry about "track" boundaries, they work better than Linux.
4096 physical/512 logical; LBA 0 aligned with physical block 0 - works great with Win Vista/7 and OS X, but the Linux installers are still aligning on the bogus track boundary, and not asking the physical/logical alignment. Performance, without some very smart tweaking by the person doing the formatting REALLY stinks.
4096 physical/512 logical, but are reporting 512 physical (usually aligned 0 for 0) - again to deal with BIOS/Win XP. Basically, treat ALL drives produced starting with 2010 as having 4K sectors, aligned 0 for 0, unless they explicitly report otherwise, and use the same human-intervention-required layout as above.
Currently, the tools are the most pressing issue, since they are really broken in this respect, but there are kernel issues as well with drives larger than 2TiB and 4096-byte sectors.
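The four flavors listed above can be compared side by side with a small Python sketch. The `lba0_shift` field is this example's own way of modelling the "off by one" layout (LBA 0 sitting one logical sector past a physical boundary, so that LBA 63 lands on one); that interpretation, and all the names, are assumptions rather than anything the drives report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flavor:
    name: str
    logical: int     # bytes per sector as seen by the host
    physical: int    # real bytes per sector on the platter
    lba0_shift: int  # logical sectors between a physical boundary and LBA 0


FLAVORS = [
    Flavor("4096 physical / 4096 logical", 4096, 4096, 0),
    Flavor("4096 physical / 512 logical, off by one", 512, 4096, 1),
    Flavor("4096 physical / 512 logical, 0 for 0", 512, 4096, 0),
    Flavor("4096 physical reported as 512, 0 for 0", 512, 4096, 0),
]


def is_aligned(flavor, start_lba):
    """True if a partition starting at start_lba begins on a physical boundary."""
    byte_pos = (start_lba + flavor.lba0_shift) * flavor.logical
    return byte_pos % flavor.physical == 0


if __name__ == "__main__":
    for flavor in FLAVORS:
        print(flavor.name, is_aligned(flavor, 63), is_aligned(flavor, 2048))
```

Under these assumptions the legacy start at sector 63 is only safe on the first two flavors, and a 2048-sector start is safe everywhere except the off-by-one layout, which is why the tools need to know which layout they are dealing with.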
I had never ever heard of drive alignment until I bought an SSD.
Not to be unhelpfully pessimistic, but... didn't it even occur to people that drive alignment might be important? Is 4K drive alignment just Y2K for hard drives? Why did people only start thinking about this now?
I'll get to why in a second, but first:
CHS hasn't meant anything in a decade. The largest drive you can describe with CHS is 8GB.
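The 8GB ceiling falls straight out of the geometry limits. The short Python sketch below (function name invented for illustration) multiplies out two well-known maximum geometries: the 16383/16/63 values that hdparm prints for the WD10EARS elsewhere in this thread, and the 1024/255/63 limit commonly used for BIOS and MBR CHS fields.

```python
def chs_capacity_bytes(cylinders, heads, sectors_per_track, sector_size=512):
    """Total bytes addressable with the given CHS geometry."""
    return cylinders * heads * sectors_per_track * sector_size


if __name__ == "__main__":
    ata = chs_capacity_bytes(16383, 16, 63)   # geometry shown by hdparm -I
    bios = chs_capacity_bytes(1024, 255, 63)  # common BIOS/MBR translation limit
    print(ata // 512, ata)    # sector count, then bytes
    print(bios // 512, bios)
```

Both work out to roughly 8.4 billion bytes, and the first sector count matches the "CHS current addressable sectors" line in the hdparm output, so anything larger has to be addressed by LBA.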
Track size hasn't meant anything in even longer than that. When drives went to zone bit recording (ZBR), the number of sectors per track became variable. This happened in about 1989.
The sector size does mean something, but it is the actual sector size, not the sector "grouping" size. If the drive reported a sector size of 4K, then it would expect that the host understand that sectors are actually 4K in size, not 512B in size. But really no major OS supports this, they all expect 512B sectors. That's why these drives internally use one sector size and show another size to the host. And there is no way in the ATA specification for devices to indicate their internal sector size when they are presenting a different external sector size.
So this won't be fixed with a firmware update, unless Vista, 7 and every other major OS is fixed to actually support large sectors presented to the host. Then the drive could be firmware updated to report the large sector size to the host. And the drive would then be completely unusable under any earlier OS or with any USB or FireWire adapter.
http://lkml.org/lkml/2005/8/20/95
I forgot, there is one thing CHS still matters for nowadays. There is no proper spec for how to know if a partition in an MBR (fdisk) partition table is a valid partition. So there are heuristics that are applied to the entries to guess if they are real or to be ignored as empty. One of the heuristics that some software uses is to ignore all partition entries that don't begin on a cylinder boundary. To be on a cylinder boundary, the partition has to start on a sector number that is a multiple of the number of sectors (S in CHS). And since all drives 8GB or greater present an S of 63, that is why the first partition on an MBR disk has always started at sector 63, which makes it unaligned when the internal sector size is 4K (8 logical sectors).
Windows before 2000 checks the CHS alignment of MBR entries and ignores any partition entries that don't start on a multiple of S. So all disks out there are misaligned. With Windows 2000 or later, you can start the partition on any boundary you want.
Western Digital has a jumper you can put on the drive that adds 1 to all access requests, making all those misaligned first partitions aligned. But it'll also make any aligned partitions misaligned. So the real answer is just to lay out your disk differently. I would recommend using GUID disk partitioning instead of MBR anyway, because MBR doesn't work for >2TB drives. And GUID doesn't have any weird alignment requirements (and doesn't have any knowledge of CHS).
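The 2TB ceiling for MBR comes from its 32-bit sector fields. A rough Python sketch of the arithmetic follows; it assumes 512-byte logical sectors, and the function names are made up for the example.

```python
MBR_LBA_BITS = 32  # MBR partition entries store start and length as 32-bit sector counts


def mbr_limit_bytes(logical_sector_size=512):
    """Approximate largest size addressable through a 32-bit sector field."""
    return (2 ** MBR_LBA_BITS) * logical_sector_size


def fits_in_mbr(total_sectors):
    """True if every sector of the disk can be named by a 32-bit LBA."""
    return total_sectors <= 2 ** MBR_LBA_BITS


if __name__ == "__main__":
    print(mbr_limit_bytes())        # bytes reachable with 512-byte sectors
    print(fits_in_mbr(1953525168))  # sector count of the 1TB drive from the dmesg line
```

With 512-byte sectors the limit is 2 TiB, so the 1TB drives discussed here still fit in an MBR, but anything past 2 TiB needs GPT or larger logical sectors.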
Cylinder-head-sector addressing really needs to go anyway. MBR has been hacked so many times over the years to support this method of formatting, but in reality it's a total waste. Old operating systems that were forced to work with this method of addressing had to translate to and from the CHS format, since sector allocation has ALWAYS been linear. Even FAT-12 used linear addressing methods.
CHS is also misleading and CHS optimizations are wasteful, since all modern drives (starting with the first Conner IDE 20 meg drive in the early days) support some form of intelligent sector remapping that keeps spare sectors available for relocating data after the magnetic medium of a heavily used sector elsewhere begins to fail.
Sector remapping makes it so that CHS optimizations are entirely irrelevant since even brand new drives, straight off the production line ship with bad sectors that have been remapped elsewhere. For better drive performance, algorithms space out the spare sectors across the drive so that when accessing a spare sector, the head doesn't have to slam to the inner or outer rings of the disc. But still, CHS doesn't apply to absolute positions anymore.
This is 2010 now. I wrote file systems which functioned on 4096-byte sectors on ESDI drives back in the 80's. Made my drives much bigger doing it too. It's time that we move to larger sector sizes again. Modern ECC isn't that much better than 80's grade; however, the processing power available to us is so much greater that performing ECC on larger blocks of data is achievable. Also, using RAID-5, 5EE or RAID-6 makes it so that we can depend less on single drive redundancy. SCSI and IDE should be extended so that the controller can inform the drive of bad sectors it finds when performing RAID XORing.
Yep, right until the NEXT feature comes out that breaks the OS that says it can handle the truth. Now you need TWO "don't lie to me" commands. Repeat ad infinitum.
All that needs to happen is a very minor update to the 'fdisk' utility to make it start partitions at 1MB. Microsoft saw this coming and changed the partitioner in Vista (and newer) to align on 1MB boundaries.
There don't have to be kernel tweaks. I've been doing this myself for years when I install Linux or Windows. Windows XP's 'diskpart' aligns by 32K, whereas the built-in format in setup from the CD aligns on 31.5K, so just format the drive from BartPE's diskpart utility first. In Linux you just drop into 'expert' mode in fdisk, hit 'x' and tell your partition to start at 2048 sectors (1MB).
Why 1MB instead of 32K or 64K? Because some RAID arrays have stripe sizes up to 1MB, and it just makes sense to 'waste' one meg in order to have alignment work on bounds of 4K, 16K, 32K, 64K, 128K, 256K, and 1MB (all common page or RAID stripe sizes).
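The reasoning for a 1MB start can be checked mechanically. The Python sketch below (names invented for illustration) lists which of the page and stripe sizes mentioned above divide a given starting offset, for the legacy sector 63, the 32K start at sector 64, and the 1MB start at sector 2048.

```python
SECTOR = 512
COMMON_UNITS_KIB = [4, 16, 32, 64, 128, 256, 1024]  # sizes named in the comment


def aligned_units(start_sector, sector=SECTOR):
    """Return the unit sizes (in KiB) that the partition start is a multiple of."""
    offset = start_sector * sector
    return [k for k in COMMON_UNITS_KIB if offset % (k * 1024) == 0]


if __name__ == "__main__":
    for start in (63, 64, 2048):
        print(start, aligned_units(start))
```

Sector 63 (31.5K) matches none of the units, sector 64 (32K) matches only the sizes up to 32K, and sector 2048 (1MB) matches all of them, at the cost of one unused megabyte.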
"Sometimes, I think Trent just needs a cup of hot chocolate and a blankie." -Tori Amos on Nine Inch Nails
I just bought two of the WD Green 500GB drives to be used in a hardware RAID (Adaptec 2610SA, aka Dell CERC SATA1.5/6ch) on my Ubuntu-based server. I was going to format it in ext3. Will this problem affect me?