Linux Not Quite Ready For New 4K-Sector Drives
Theovon writes "We've seen a few stories recently about the new Western Digital Green drives. According to WD, their new 4096-byte sector drives are problematic for Windows XP users but not Linux or most other OSes. Linux users should not be complacent about this, because not all the Linux tools like fdisk have caught up. The result is a reduction in write throughput by a factor of 3.3 across the board (a 230% overhead) when 4096-byte clusters are misaligned to 4096-byte physical sectors by one or more 512-byte logical sectors. The author does some benchmarks to demonstrate this. Also, from the comments on the article, it appears that even parted is not ready, since by default it aligns to 'cylinder' boundaries, which are not physical cylinder boundaries and are multiples of 63."
damnit, obviously since this is not technically the 'first post', my web browser must be misaligned by a post
I heard using parted and GPT labels instead of MSDOS will optimize it on 4096 byte sectors automatically. Any truth to it?
the first time i have ever actually gotten 'first post'... it is when i try to make a joke about not having gotten first post. ya see my first post was supposed to come up like second or third.. it would have been HILARIOUS . .. but oh no
in soviet russia, the fates mock you!!!!
The simple solution is to set your Sectors per Track to 32. This would make sure that everything is properly aligned (except the first partition, usually /boot, which is mis-aligned by one cylinder).
http://www.osnews.com/thread?409281
I actually have two of these drives in my desktop right now. There is a slight decrease in performance compared to Windows 7, but nothing that is unacceptable or even a cause for concern. If you need to worry about the performance lost with the 4k sectors, then just go solid state.
I am no kernel hacker but I can almost guarantee that some kernel hacker will provide a solution to this "shortcoming" fairly soon.
That's the beauty of Open Source.
I am aware though that "fairly soon" means many things to many people; which means that there could be a substantial delay before we get a working solution to this issue.
I am optimistic nevertheless.
Request to Western Digital: Provide all the information needed to develop a solution.
Author claims a massive performance drop if things aren't aligned right. Ubuntu already does it with parted and fdisk can do it manually. So, no big problem; fdisk ought to be fixed to have sane defaults with a 4096 byte block size, sure. That can't be all that difficult.
The author also seems to think that only a 30% increase in times for misaligned writes should be expected. I'm not sure why. In a naive implementation I'd expect a 100% increase in time (each block now needs to be written twice). Linux, obviously, doesn't use a naive implementation. It's expected that if the hardware violates the assumptions behind the techniques Linux uses to achieve high performance, that those techniques end up making things very slow instead.
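For what it's worth, the "each block now needs to be written twice" part is easy to see with a bit of arithmetic. Below is a minimal sketch, assuming 512-byte logical and 4096-byte physical sectors; `write_footprint` is a name made up for this example, not anything from the kernel or the article. It only counts sectors and says nothing about how long the drive actually takes.

```python
def write_footprint(start_lba, length_bytes, logical=512, physical=4096):
    """Return (physical sectors touched, sectors only partly overwritten)."""
    begin = start_lba * logical      # first byte written
    end = begin + length_bytes       # one past the last byte written
    first = begin // physical
    last = (end - 1) // physical
    partial = 0
    for p in range(first, last + 1):
        # A physical sector that is not fully covered has to be read,
        # patched and written back (read-modify-write).
        if p * physical < begin or (p + 1) * physical > end:
            partial += 1
    return last - first + 1, partial


print(write_footprint(63, 4096))  # one 4 KiB cluster starting at LBA 63
print(write_footprint(64, 4096))  # the same cluster starting at LBA 64
```

Starting at LBA 63 the cluster straddles two physical sectors and both are only partly overwritten; starting at LBA 64 it covers exactly one sector and nothing needs to be read back first.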
I know that Fedora seems to have addressed this with parted 2.1.1 and util-linux-ng 2.1. Both are scheduled for Fedora 13, but can be pulled into Fedora 12 by those getting the hardware early.
Can You Say Linux? I Knew That You Could.
Easiest fix: stop dividing your disks into partitions.
Dear Slashdot,
I've been around for a while. Enough to understand, nay, love the fact that you are linux supporters and all that. But I remain an ardent supporter of truth and of speaking in ways which are concise and lead the reader in the direction of truth. Nothing in this news story is inaccurate, but to make it a point to say that Windows XP is incompatible with no mention of Vista and 7 being perfectly compatible should be an embarrassment of journalistic integrity.
Windows XP may not work with the new WD Green drives, but Vista and on have been perfectly comfortable with 4096 byte sectors. A lay reader may read this story and not "Read between the lines" as I have learned to do here. Their take away may be that Microsoft operating systems are broken in some way (which they are in a lot of ways), but not this one!
Can some kind soul tell me specifically what version of what utility I need to use for me to be OK? Or what settings?
My head hurts from trying to understand cylinders and sectors and drive geometry...
thanks!
How can one quickly check for alignment issues?
should be an embarrassment of journalistic integrity.
Slashvertisements, basic English grammar and spelling problems, completely wrong summaries and titles...
...and you a)think that Slashdot is "journalism" and b)it's had integrity to lose in the first place?
I like Slashdot, but gimme a break...it's a user-driven blog which directs readers to existing stories (now often lagging behind the major news wires) with good categorization and semi-sophisticated commenting system, utilized by a larger commenter population. Not much more, and definitely not journalism.
Please help metamoderate.
The real problem is that it is lying about its sector size: it's reporting 512 bytes when it's using 4k. If it told linux it was using 4k, everything would be fine and dandy.
Why does it lie about its sector size when it doesn't need to? Because if it didn't, the drives would not work on windows XP at all. Which would not bode well for sales.
Once drives with 4k sectors arrive, it's up to the individual maintainers of each affected tool (fdisk, et al.) to update their code.
The kernel handles sector sizes, and could handle 4k sectors ages ago, but when the hardware reports something the kernel tends to trust it, which it now apparently shouldn't. (512-byte sectors are implemented as an emulation layer of sorts on these drives, and it is enabled by default.)
There is an excellent thread talking about how recent (2.6.31+) linux kernels try to report the underlying hard drive architecture (found via the OSNews comments). Alas, it looks like some of these drives are not reporting this data correctly and thus automatic adjustment (at partitioning time) is not taking place. It looks like in the future, rather than relying only on reported capability, fdisk (and hopefully gparted) will default to 1MiB alignment if the topology can't be found (unless your media is small).
Additionally, I gather that recent Fedoras will try to adjust things like LVM to match larger sectors too. Hopefully whatever is laying out LVM will also be fixed too.
Coincidentally, it looks like Oracle have a very committed dev trying to make this stuff work by default...
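If you want to see what your own kernel believes about a drive, here is a minimal sketch that reads the topology values from sysfs. The file names (`queue/logical_block_size`, `queue/physical_block_size`, `alignment_offset`) are the ones recent kernels expose; the function names and the `sysfs_root` parameter are just made up for this example. Keep in mind that a drive which misreports itself will simply show 512 for both sizes here.

```python
from pathlib import Path


def read_topology(device, sysfs_root="/sys/block"):
    """Return the block sizes the kernel reports for a disk such as 'sdb'."""
    base = Path(sysfs_root) / device

    def read_int(relative):
        return int((base / relative).read_text().strip())

    return {
        "logical": read_int("queue/logical_block_size"),
        "physical": read_int("queue/physical_block_size"),
        "alignment_offset": read_int("alignment_offset"),
    }


def emulates_512(topology):
    """True if the drive admits to larger physical than logical sectors."""
    return topology["physical"] > topology["logical"]
```

Calling `read_topology("sdb")` on a real system returns a small dict; if `emulates_512` is false on a drive you know to be 4K internally, the partitioning tools have nothing to go on and you are back to aligning by hand.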
It appears that this does not affect the older 1TB+ Western Digital Green drives such as the WDC10EADS. Those use 333GB platters and are native 512-byte sectors. The newer (newest) Western Digital Green drives, like the WDC10EARS, use 500MB platters and have 4K sectors. One way to tell the drives apart with a quick glance is the old Green drives had 32MB of cache and the new ones have 64MB of cache.
I see it rather as an indictment against closed-source OSes, if XP turns out to be incompatible with these new drives and MS never releases a patch to add support. People will need to upgrade for no good reason to one of MS's new operating systems. People should not have to deal with a complete upheaval of their tested and true systems due to a small hardware change such as this.
I can imagine MS is quietly chuckling with glee to itself, if this issue becomes a deal-breaker for machines still running XP.
ERROR 144 - REBOOT ?
I posted that the newer WD Green drives use 500MB platters and I meant 500GB. 500MB platters would make for a very physically-large 1TB+ hard drive!
I just got one of the 1TB 64MB WD drives that is known to be 4KB sector based.
Here is how it shows up in dmesg:
[ 3.420488] sd 1:0:0:0: [sdb] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
and here's what hdparm -I says:
ATA device, with non-removable media
Model Number: WDC WD10EARS-00Y5B1
Serial Number: WD-WCAV55227529
Firmware Revision: 80.00A80
Transport: Serial, SATA 1.0a, SATA II Extensions, SATA Rev 2.5, SATA Rev 2.6
Standards:
Supported: 8 7 6 5
Likely used: 8
Configuration:
Logical max current
cylinders 16383 16383
heads 16 16
sectors/track 63 63
--
CHS current addressable sectors: 16514064
LBA user addressable sectors: 268435455
LBA48 user addressable sectors: 1953525168
Logical/Physical Sector size: 512 bytes
device size with M = 1024*1024: 953869 MBytes
device size with M = 1000*1000: 1000204 MBytes (1000 GB)
cache/buffer size = unknown
Capabilities:
LBA, IORDY(can be disabled)
Queue depth: 32
Standby timer values: spec'd by Standard, with device specific minimum
R/W multiple sector transfer: Max = 16 Current = 1
Recommended acoustic management value: 128, current value: 254
DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6
Cycle time: min=120ns recommended=120ns
PIO: pio0 pio1 pio2 pio3 pio4
Cycle time: no flow control=120ns IORDY flow control=120ns
Commands/features:
Enabled Supported:
* SMART feature set
Security Mode feature set
* Power Management feature set
* Write cache
* Look-ahead
* Host Protected Area feature set
* WRITE_BUFFER command
* READ_B
My opinions are my own, and do not necessarily represent those of my employer.
I have a tiny 1.8" usb harddisk with 4096-byte sectors, and the Ubuntu installer crashes when it tries to read the partitioning information. Very annoying.
I suffer from attention surplus disorder.
We're adjusting our disklabel64 utility and kernel support to set the partition base offset such that it is physically aligned instead of slice-aligned, and we are using 32K alignment. That should fix the problem without having to mess around with fdisk.
The DragonFly 64-bit disklabel structure uses 64-bit byte offsets instead of sector addressing to specify everything. It ensures things are at least sector aligned but we wanted to make disk images more portable across devices with potentially different sector sizes. The HAMMER fs uses byte-granular addressing for the same reason, 16K aligned.
-Matt
There won't be a partition table with his suggestion. The boot sector set aside by the filesystem will be the very first sector of the disk.
The article represents one data point, for one particular way to install a drive, on one (un-named) version of Gentoo, on one particular model of a WD drive that had a bugzilla entry entered by the author all of 2 days ago. So this is supposed to be an indictment of all of Linux?
The author even mentions that Ubuntu has an option on parted that accomplishes the task properly. I'd be much more interested in an article that talks about how the default installer handles this task rather than concentrating on one particular expert tool that does so. It's still good to know that fdisk on his un-named Gentoo distribution does the wrong thing.. but this hardly means we should fire up the klaxon and declare "Linux not fully prepared for 4096 sector hard drives!". It's certainly interesting, but I'll withhold judgment until we actually know more about the implications of this across the entire spectrum of Linux distributions and the various 4096 sector HDs.
Don't partition the drive in XP - format the entire thing and don't split it apart. Get a secondary physical drive.
I wouldn't be too fond of the MS development model from what I hear from those who were on the inside:
http://www.nytimes.com/2010/02/04/opinion/04brass.html?pagewanted=all
Inside Microsoft, political infighting trumps common sense. If you really want to hold up a closed source development model as an example of "what works" take a look at Apple. They crank out far better products with a fraction of the resources.
My rights don't need management.
It seems these drives need a new "don't lie to me, I can handle it" command, so OSes that don't have a problem with 4k size sectors can get the real info.
I didn't notice a performance drop in Linux when using such a 4k sector disk with misaligned partitions, but random stalls lasting up to about a minute. Some kernel threads (like md?_raid1, kdmflush) were in an uninterruptible sleep, and the hd led was continuously lit. It seemed like no data transfer happened at all. And after some time the disk seemed to work again as it should.
After aligning the partitions, by using 56 sectors per track, the disk seems to work flawlessly. Maybe it also works faster now, but I did not check it.
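The reason 56 (or 32) sectors per track helps while 63 does not comes down to divisibility by 8, since eight 512-byte logical sectors make up one 4096-byte physical sector. A rough sketch, assuming the common fake geometry of 255 heads (your tool may use a different head count, and `boundaries_aligned` is a name invented for this example):

```python
def boundaries_aligned(sectors_per_track, heads=255, logical=512, physical=4096):
    """Return (track boundaries aligned?, cylinder boundaries aligned?)."""
    ratio = physical // logical                  # 8 logical sectors per physical one
    track_ok = sectors_per_track % ratio == 0
    cylinder_ok = (heads * sectors_per_track) % ratio == 0
    return track_ok, cylinder_ok


for spt in (63, 56, 32):
    print(spt, boundaries_aligned(spt))
```

With 63 sectors per track neither kind of boundary falls on a multiple of 8; with 56 or 32 both do, so tools that insist on "cylinder" alignment happen to produce 4K-aligned partitions.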
I have two WD Green drives that each have one big ext4 partition on them, and they appear to be nonaligned (the partition starts on sector 63). Is there a simple way to align the partition to the 4k sectors?
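Checking, at least, is simple. Here is a minimal sketch that lists partitions whose start is not on a 4 KiB boundary, using the per-partition `start` file in sysfs (which is in 512-byte units). The function name and the `sysfs_root` parameter are made up for this example, and it assumes 4096-byte physical sectors with no internal offset. It only detects the problem; it does not move anything.

```python
from pathlib import Path


def misaligned_partitions(disk, sysfs_root="/sys/block", physical=4096):
    """Map partition name -> start sector for partitions off a 4 KiB boundary."""
    result = {}
    for start_file in sorted((Path(sysfs_root) / disk).glob(disk + "*/start")):
        start = int(start_file.read_text().strip())  # in 512-byte units
        if (start * 512) % physical != 0:
            result[start_file.parent.name] = start
    return result
```

For a disk laid out like the one described, `misaligned_partitions("sdb")` would be expected to return something like `{"sdb1": 63}`.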
The affected drives are listed on Western Digital's site.
You shouldn't be using fdisk anyway; the MBR partition table is a dinosaur compared to GPT. Only Microsoft OSs still rely on the MBR, all others can handle GPT partitions without issue.
Its replacement (direct replacement), same interface, same feature set:
http://www.rodsbooks.com/gdisk/walkthrough.html
And yes, gdisk does handle sector alignment. It defaults to 4k-alignment on all drives larger than 800GB. The default can be changed, of course.
(I think that) It's probably misaligned. LVM uses a 192k sector size for its metadata. See Theodore Ts'o's post for more information.
For the record: I love my job (tinkering with Linux) but I hate my customers (Linux zealots who do nothing practical but cheerlead on the sides).
Screw you, little fucks.
---
no-name kernel hacker.
Posted by Aleksander Adamowski to this thread on the util-linux mailing list: http://thread.gmane.org/gmane.linux.utilities.util-linux-ng/2926 ... "So for any other owners of WD EARS drives, if these don't report
physical 4096-byte sectors to you, don't believe them and align your
partitions at the aforementioned sectors (a generally good idea is to
run the postmark benchmark to compare performance on aligned and
non-aligned partitions).
Just in case anyone doesn't know how to align these partitions
(WARNING: the instructions below will likely destroy any data that's
on the given drive, only do this with drives you're intending to
erase):
# parted /dev/YOUR_DEVICE_NAME
(parted) mklabel gpt
# Here ^ I've chosen the GPT partition table format, but others may be
OK too - untested by me.
(parted) unit s
# Here ^ we're choosing sectors as units of measurement
(parted) mkpart primary ext2 40 -1
# Here ^ we're creating a partition that starts at sector 40, which is
divisible by 8.
# You can also try 48, 56, 64 and others - these should offer the same
high performance,
# but some space will go to waste - it's only some tiny kilobytes, though.
# Parted will likely complain about the end location of the ending sector:
Warning: You requested a partition from 40s to 2930277167s.
The closest location we can manage is 40s to 2930277134s.
Is this still acceptable to you?
Yes/No?
# Of course, we answer Yes.
(parted) quit
# After that, create a filesystem as usual, e.g.:
# mkfs.ext4 -T largefile4 /dev/YOUR_DEVICE_NAME
This should get the optimum performance from your 4 kB physical sector
drives even when they report 512 B sectors only to the OS."
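A note on where 40 comes from, as far as I can tell: on a default GPT disk the first usable sector is 34 (protective MBR at LBA 0, header at LBA 1, and 32 sectors of partition entries), and 40 is the next multiple of 8 after that. A minimal sketch of that rounding, assuming the default GPT layout; `align_up` is a name invented for this example:

```python
def align_up(sector, multiple=8):
    """Round a sector number up to the next multiple (8 x 512 B = 4 KiB)."""
    return -(-sector // multiple) * multiple


GPT_FIRST_USABLE = 34  # assumes the default 128 entries of 128 bytes each

print(align_up(GPT_FIRST_USABLE))  # first 4 KiB-aligned start on such a disk
print(align_up(63))                # where the classic MBR start would move to
```

Any larger multiple of 8 works just as well, which is why 48, 56 and 64 are offered as alternatives in the quoted instructions.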
I had this issue bite me in the ass when building an OpenFiler-based home NAS this week. The real kicker is that the drive label specifically states a need to jumper two pins on the drive and align your partitions when using WinXP, but "all other" operating systems are good-to-go with no intervention required. Maybe that would be the case if the drive weren't misreporting its sector size to the kernel, but I should've known better regardless.
I would be soooo embarrassed if it was me who wrote that...
It has been suggested that WD might internally offset block addresses by 1 so that LBA 63 maps to LBA 64. This way, Windows XP partitions would not really be misaligned. I performed a test that demonstrates that WD has not done this
Or you could read the label on the drive: "Windows XP, single partition: set jumpers 7-8 prior to installation".
For /dev/sdd, I used fdisk to add a Linux (0x83) primary partition, taking up the whole disk, using fdisk defaults. By default, the partition starts at LBA 63.
So... you're a technical writer, you're blatantly ignoring WD's recommended practices, and you're still blaming all of Linux based on the intentional misuse of one single tool? The author might have had a point if fdisk was the only tool to create partitions. The reality is that I don't know a single graphical installer that uses fdisk. He even acknowledges this by mentioning Ubuntu, but apparently did not care enough to do a little research into other installers:
openSuSE: Yast partitioning
Fedora: anaconda (libparted?)
Debian: partman (libparted)
Slackware: cfdisk (?)
So, to sum up: most mainstream distros use a tool from this century. The only places where you might find fdisk is in text-mode installers, and those are mostly used by skilled technical people (but also by bad article writers, apparently). Of course, I'm not saying that the libparted-based installers perform any better in this respect, but neither is he. However, that would be an article worthy to write.
Since this is one large file, and it can be written linearly to the disk, I expected that we would see a very slight performance hit. I think this is something that itself should be investigated. There's no reason for long contiguous writes to get hit this hard, and it's something that the kernel developers need to look into and fix.
How does the author know that it's a contiguous write? How full was the destination partition? What filesystem was used (extents-based, journalled)? What cache writeback mode was used? -1 for suggesting a kernel bug without giving enough detail to support that accusation.
Timothy Miller is a Ph.D. student at The Ohio State University, specializing in Computer Architecture, and Artificial Intelligence
I'm not used to looking for a hidden agenda among technical people... but I wonder what his angle is here.
fdisk an elegant tool for a more civilized age.. no wait. fdisk is antiquated and we only use it, because we are afraid to leave the msdos partition table behind out of the irrational fear some other software would stop working.
There are four flavors of 4096 byte-sectored drives:
4096 physical/logical - the bookkeeping parts of the file system cause read/modify write cycles because they are nearly always less than 4096 bytes, but the performance hit is relatively small; parted is badly broken. If they're less than 2TiB, then you can use an MBR, otherwise the kernel is broken for partition sizes.
4096 physical/512 logical; LBA 0 aligned "off by one" with physical block 0 - created to deal with stupid BIOS (and Win XP, where some drivers rely on it), mostly work fine with the default tools, but still have the bookkeeping issues. Because Win Vista/7 and OS X use GPT AND don't worry about "track" boundaries, they work better than Linux.
4096 physical/512 logical; LBA 0 aligned with physical block 0 - works great with Win Vista/7 and OS X, but the Linux installers are still aligning on the bogus track boundary, and not asking the physical/logical alignment. Performance, without some very smart tweaking by the person doing the formatting REALLY stinks.
4096 physical/512 logical, but are reporting 512 physical (usually aligned 0 for 0) - again to deal with BIOS/Win XP. Basically, treat ALL drives produced starting with 2010 as having 4K sectors, aligned 0 for 0, unless they explicitly report otherwise, and use the same human-intervention-required layout as above.
Currently, the tools are the most pressing issue, since they are really broken in this respect, but there are kernel issues, as well, with drives larger than 2TiB and 4096-byte sectors.
I had never ever heard of drive alignment until I bought an SSD.
Not to be unhelpfully pessimistic, but... didn't it even occur to people that drive alignment might be important? Is 4K drive alignment just Y2K for hard drives? Why did people only start thinking about this now?
I'll get to why in a second, but first:
Raw CHS hasn't meant anything in a decade. The largest drive you can describe with CHS is 8GB.
Track size hasn't meant anything in even longer than that. When drives went to zone bit recording (ZBR), the number of sectors per track became variable. This happened in about 1989.
The sector size does mean something, but it is the actual sector size, not the sector "grouping" size. If the drive reported a sector size of 4K, then it would expect that the host understand that sectors are actually 4K in size, not 512B in size. But really no major OS supports this, they all expect 512B sectors. That's why these drives internally use one sector size and show another size to the host. And there is no way in the ATA specification for devices to indicate their internal sector size when they are presenting a different external sector size.
So this won't be fixed with a firmware update, unless Vista, 7 and every other major OS is fixed to actually support large sectors presented to the host. Then the drive could be firmware updated to report the large sector size to the host. And the drive would then be completely unusable under any earlier OS or with any USB or FireWire adapter.
http://lkml.org/lkml/2005/8/20/95
I forgot, there is one thing raw CHS still matters for nowadays: there is no proper spec for how to know if a partition in an MBR (fdisk) partition table is a valid partition. So there are heuristics that are applied to the entries to guess if they are real or to be ignored as empty. One of the heuristics that some software uses is to ignore all partition entries that don't begin on a cylinder boundary. To be on a cylinder boundary, the partition has to start on a sector number that is a multiple of the number of sectors (S in CHS). And since all drives 8GB or greater present an S of 63, the first partition on an MBR disk has always started at sector 63, which makes it unaligned when the internal sector size is 4K (8 internal sectors).
Windows before 2000 checks the CHS alignment of MBR entries and ignores any partition entries that don't start on a multiple of S. So all disks out there are misaligned. With Windows 2000 or later, you can start the partition on any boundary you want.
Western Digital has a jumper you can put on the drive that adds 1 to all access requests, making all those misaligned first partitions aligned. But it'll also make any aligned partitions misaligned. So the real answer is just to lay out your disk differently. I would recommend using GUID disk partitioning instead of MBR anyway, because MBR doesn't work for >2TB drives. And GUID doesn't have any weird alignment requirements (and doesn't have any knowledge of CHS).
http://lkml.org/lkml/2005/8/20/95
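To make the jumper trade-off concrete, here is a small sketch that models it exactly as described in the comment above: every LBA the host requests is shifted up by one inside the drive. That model is an assumption taken from the comment, not from WD documentation, and `internally_aligned` is a made-up name.

```python
def internally_aligned(start_lba, jumper=False, ratio=8):
    """True if a partition start lands on a physical-sector boundary."""
    internal = start_lba + (1 if jumper else 0)  # the jumper adds 1 to every request
    return internal % ratio == 0


for start in (63, 2048):
    print(start, internally_aligned(start), internally_aligned(start, jumper=True))
```

A partition at sector 63 goes from misaligned to aligned when the jumper is set, while one at sector 2048 goes the other way, so the jumper only makes sense for the classic single-partition XP layout.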
192kB % 4kB == 0
So LVM/PV directly on the full disk is fine for 4 kB disks as per the original post, but will be misaligned for all SSDs (and USB flash).
SLC disks tend to have a 128 kB erase block size, MLC 256 kB, but some may have 1 MB. If you're really paranoid, use some overkill alignment like 16 MB; it's not like you lose a measurable amount of real space with it.
You can also run into the same issue with some RAID implementations, including Linux software RAID if you use superblock format 1.1 or 1.2. Note that 1.1 was recently made the default! Good intentions (to avoid one specific issue) with bad results on at least SSDs and possibly 4 kB disks too.
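For the curious, here is how far a data area starting at 192 kB is from the nearest boundary for each of the block sizes mentioned. A minimal sketch, treating kB and MB as binary units (an assumption) and using a made-up helper name:

```python
KIB = 1024


def padding_needed(data_start, boundary):
    """Bytes to add so that data_start lands on a multiple of boundary."""
    return -data_start % boundary


lvm_start = 192 * KIB
for name, size in [
    ("4 KiB sector", 4 * KIB),
    ("128 KiB erase block", 128 * KIB),
    ("256 KiB erase block", 256 * KIB),
    ("1 MiB erase block", 1024 * KIB),
]:
    print(name, padding_needed(lvm_start, size) // KIB, "KiB")
```

No padding is needed for 4 KiB sectors, 64 KiB for either 128 KiB or 256 KiB erase blocks, and 832 KiB for 1 MiB erase blocks, which is tiny compared with the size of the disk.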
clearly WD's fault.
anyone aware of 4k-sector _desktop_ drives, that have a real 4k interface?
(some new samsung 2.5" and 1.8" hdds are made like that according to heise's c't (german magazine))
Cylinders, head, sector addressing really needs to go anyway. MBR has been hacked so many times over the years to support this method of formatting but in reality it's a total waste. Old operating systems that were forced to work with this method of addressing had to translate to and from the CHS format since sector allocation has ALWAYS been linear. Even FAT-12 used linear addressing methods.
CHS is also misleading, and CHS optimizations are wasteful, since all modern drives (starting with the first Conner IDE 20 meg drive in the early days) support some form of intelligent sector remapping that keeps spare sectors available for relocating data after the magnetic medium of a heavily used sector elsewhere begins to fail.
Sector remapping makes it so that CHS optimizations are entirely irrelevant since even brand new drives, straight off the production line ship with bad sectors that have been remapped elsewhere. For better drive performance, algorithms space out the spare sectors across the drive so that when accessing a spare sector, the head doesn't have to slam to the inner or outer rings of the disc. But still, CHS doesn't apply to absolute positions anymore.
This is 2010 now, I wrote file systems which functioned on 4096 byte sectors on ESDI drives back in the 80's. Made my drives much bigger doing it too. It's time that we move to larger sector sizes again. Modern ECC isn't that much better than 80's grade, however the processing power available to us is so much more that performing ECC on larger blocks of data is achievable. Also, using RAID-5, 5EE or RAID6 makes it so that we can depend less on single drive redundancy. SCSI and IDE should be extended so that controller can inform the drive of bad sectors it finds when performing RAID XORing.
Epitaph: At last! Root access!
It's a classic problem in computer marketing: don't make your product too good (or good enough).
Yep, right until the NEXT feature comes out that breaks the OS that says it can handle the truth. Now you need TWO "don't lie to me" commands. Repeat ad infinitum.
There is a jumper on the drive that you have to set to disable the 512 byte sector emulation.
All that needs to happen is a very minor update to the 'fdisk' utility to make it start partitions at 1MB. Microsoft saw this coming and changed the partitioner in Vista (and newer) to align on 1MB boundaries.
There don't have to be kernel tweaks. I've been doing this myself for years when I install Linux or Windows. Windows XP's 'diskpart' aligns by 32K, whereas the built-in format in setup from the CD aligns on 31.5K, so just format the drive from BartPE's diskpart utility first. In Linux you just drop into 'expert' mode in fdisk, hit 'x' and tell your partition to start at 2048 sectors (1MB).
Why 1MB instead of 32K or 64K? Because some RAID arrays have stripe sizes up to 1MB, and it just makes sense to 'waste' one meg in order to have alignment work on bounds of 4K, 16K, 32K, 64K, 128K, 256K, and 1MB (all common page or RAID stripe sizes).
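A quick sanity check of the 1MB argument, as a minimal sketch: it compares the three start sectors mentioned above (63 for the 31.5K setup default, 64 for diskpart's 32K, 2048 for 1MB) against the listed block sizes. The function name is made up for this example and sectors are assumed to be 512 bytes.

```python
def unaligned_for(start_sector, sizes_kib=(4, 16, 32, 64, 128, 256, 1024)):
    """Return the block sizes (in KiB) that a start sector is NOT aligned to."""
    start_bytes = start_sector * 512
    return [k for k in sizes_kib if start_bytes % (k * 1024) != 0]


for start in (63, 64, 2048):
    print(start, unaligned_for(start))
```

Sector 63 is aligned to none of the sizes, sector 64 covers everything up to 32K but nothing larger, and sector 2048 leaves the list empty.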
"Sometimes, I think Trent just needs a cup of hot chocolate and a blankie." -Tori Amos on Nine Inch Nails
I just bought two of the WD Green 500GB drives to be used in a hardware RAID (Adaptec 2610SA, aka Dell CERC SATA1.5/6ch) on my Ubuntu-based server. I was going to format it in ext3. Will this problem affect me?
It seems these drives need a new "don't lie to me, I can handle it" command, so OSes that don't have a problem with 4k size sectors can get the real info.
Actually not a command, but a jumper. But yes, I don't understand either why the drives do not have such a setting.
This is not the first time drives have had backward compatibility problems, and on earlier occasions the drives usually had a jumper to set the drive in a compatibility mode - for example to clip the capacity at 32 GB when installing in an old computer which could not handle drives larger than 32 GB.