The Lies Disks and Their Drivers Tell
davecb writes "Pity the poor filesystem designer: they just want to know when their data is safe, but the disks and drivers try so hard to make I/O 'easy' that it ends up being stupidly hard. Marshall Kirk McKusick writes about the difficulties in making the systems work nicely together: 'In the real world, many of the drives targeted to the desktop market do not implement the NCQ specification. To ensure reliability, the system must either disable the write cache on the disk or issue a cache-flush request after every metadata update, log update (for journaling file systems), or fsync system call. Both of these techniques lead to noticeable performance degradation, so they are often disabled, putting file systems at risk if the power fails. Systems for which both speed and reliability are important should not use ATA disks. Rather, they should use drives that implement Fibre Channel, SCSI, or SATA with support for NCQ.'"
Don't assume that "enterprise" disks do this correctly either.
Many have options to make them behave properly, but out of the box they enable write-back caching and ignore FUA or similar, leading to the same problems.
2) The article's point on NCQ is that many consumer drives do not implement it, so the system has to disable the disk's write cache or issue cache-flush requests to stay reliable. Both hurt performance, so they are often turned off, which risks file-system corruption if there is a power outage.
I think this article is saying that for the enterprise, buy enterprise drives, not consumer drives. Most consumers use laptops now, so power failure is less of a concern for them, and consumers prefer speed over reliability, which is why I've always been stuck using laptops lacking ECC RAM.
Slashdot: Playing Favorites Since 1997
Native Command Queueing
Because not everybody knows everything(TM)
systemd is Roko's Basilisk.
yeah, well, I have quite a bit of experience with samsung (not seagate branded but the older samsungs) drives.
they REPORTED having ncq but you always had to disable it.
I got so that I do this at bootup:
if [ -e /sys/block/sda/device/queue_depth ] ; then
    echo 1 > /sys/block/sda/device/queue_depth
    echo " sda NCQ now off"
fi
and so on.
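for anyone who would rather not repeat that per drive, here is a rough sketch of the same idea as a loop. ncq_off is a made-up function name, and the optional directory argument is only there so it can be tried against a fake tree instead of the real /sys/block. it assumes the drives show up as sd* and that queue_depth is writable (i.e. you are root).

```shell
# Set queue_depth to 1 for every sd* device found under the given root.
# ncq_off is a hypothetical helper name; the root defaults to /sys/block.
ncq_off() {
    root=${1:-/sys/block}
    for f in "$root"/sd*/device/queue_depth; do
        # Skip devices without a writable queue_depth (or an unmatched glob).
        [ -w "$f" ] || continue
        dev=${f#"$root"/}      # strip the leading root directory
        dev=${dev%%/*}         # keep only the device name, e.g. sda
        echo 1 > "$f" && echo " $dev NCQ now off"
    done
}
```

call it as plain ncq_off from a boot script, or as ncq_off /some/test/dir to see what it would touch first.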
performance does not suffer (that I would care about) BUT the data reliability more than made up for it. no more timeouts, no more syslog 'scaries'.
vendors really do fuck up the protocol implementations. seagate is 'strange' in ways, so is WD, so is hitachi and ibm (I know they are not even in the biz anymore, at least for consumer drives).
windows has a 'blacklist' of what things to not use when talking to drives and so does linux. it's a fact of life.
drive vendors are borderline idiots. sad but true ;(
--
"It is now safe to switch off your computer."
you'll see it in syslog!
timeouts, retries, even exiting the bus and doing full bus resets (which are slow and you'll NOT miss them).
as I posted before, older (5yr) samsungs were notorious for SAYING they support ncq but you would be foolish to let it just negotiate it and use it.
this was how things were in the very early days of 10/100 ethernet and full/half duplex. yes, the early models 'negotiated' duplex but many of them got it wrong and you'd have to manually set this on hubs/switches since you knew better than the equipment. there were even early NIC chips that worked better at 10meg ethernet than 100baseT! we would do ftp transfer tests and quite often a GOOD 10baseT was more reliable (over time) than 100baseT. the same happened to gig-e, too, in the early years.
--
"It is now safe to switch off your computer."
File systems need to be aware of the change to the underlying media and ensure that they adapt by always writing in multiples of the larger sector size. Historically, file systems were organized to store files smaller than 512 bytes in a single sector. With the change in disk technology, most file systems have avoided the slowdown of 512-byte writes by making 4,096 bytes the smallest allocation size. Thus, a file smaller than 512 bytes is now placed in a 4,096-byte block. The result of this change is that it takes up to eight times as much space to store a file system with predominantly small files. Since the average file size has been growing over the years, for a typical file system the switch to making 4,096 bytes the minimum allocation size has resulted in a 10- to 15-percent increase in required storage.
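The "up to eight times" figure in that passage is just rounding up to the allocation unit. A small worked sketch, where allocated is a made-up helper name and the 300-byte file is an arbitrary example size:

```shell
# Round a file size up to a whole number of allocation units.
# usage: allocated SIZE_BYTES UNIT_BYTES   (hypothetical helper)
allocated() {
    echo $(( ($1 + $2 - 1) / $2 * $2 ))
}

allocated 300 512     # prints 512
allocated 300 4096    # prints 4096

# Ratio of space used for the same small file: 4096 / 512
echo $(( $(allocated 300 4096) / $(allocated 300 512) ))    # prints 8
```

The factor of eight only applies to files of 512 bytes or less; larger files waste proportionally less, which is presumably why the overall increase quoted is only 10 to 15 percent.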
just to clarify what the author's point was:
The conclusion is that file systems need to be aware of the disk technology on which they are running to ensure that they can reliably deliver the semantics that they have promised. Users need to be aware of the constraints that different disk technology places on file systems and select a technology that will not result in poor performance for the type of file-system workload they will be using. Perhaps going forward they should just eschew those lying disks and switch to using flash-memory technology—unless, of course, the flash storage starts using the same cost-cutting tricks.
if you want to argue that, great, go nuts. nobody who actually RTFA thinks the argument is really about ncq. the ac you responded to said
the way I interpret TFA, the problem also applies to SATA drives which do not implement the NCQ specification.
well, here's what TFA actually said:
Luckily, SATA (serial ATA) has a new definition called NCQ (Native Command Queueing) that has a bit in the write command that tells the drive if it should report completion when media has been written or when cache has been hit. If the driver correctly sets this bit, then the disk will display the correct behavior.
In the real world, many of the drives targeted to the desktop market do not implement the NCQ specification. To ensure reliability, the system must either disable the write cache on the disk or issue a cache-flush request after every metadata update, log update (for journaling file systems), or fsync system call. Both of these techniques lead to noticeable performance degradation, so they are often disabled, putting file systems at risk if the power fails. Systems for which both speed and reliability are important should not use ATA disks. Rather, they should use drives that implement Fibre Channel, SCSI, or SATA with support for NCQ
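for reference, the "fsync system call" in that quote is reachable from a shell script too. a minimal sketch, assuming GNU coreutils 8.24 or newer, where sync accepts file arguments and fsyncs them (older versions ignore the arguments and fall back to a full sync). durable_write is a made-up helper name, not something from TFA. note that this only asks the kernel to push the data to the drive; whether the drive then really commits it to the media is exactly the problem being discussed.

```shell
# Write stdin to TARGET so that the data is flushed before the name appears.
# durable_write is a hypothetical helper; assumes GNU sync with file arguments.
durable_write() {
    target=$1
    dir=$(dirname "$target")
    # Temp file in the same directory, so the rename stays on one file system.
    # Note: mktemp creates it with mode 0600, and the target inherits that.
    tmp=$(mktemp "$dir/.tmp.XXXXXX") || return 1
    cat > "$tmp"           || { rm -f "$tmp"; return 1; }
    sync "$tmp"            || { rm -f "$tmp"; return 1; }   # flush file data
    mv -f "$tmp" "$target" || { rm -f "$tmp"; return 1; }   # rename into place
    sync "$dir"                                             # flush the rename
}
```

usage would be something like: printf 'some config\n' | durable_write /path/to/file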
i hope it's painfully obvious by now that the point about ncq is not that some drives don't have it; it's that some don't use it -- mostly so you don't go giving their drives bad reviews for being slow but unnoticeably reliable. if it's disabled, you can enable it. what sata drives don't have ncq? i asked wikipedia:
SATA revision 1.0 (SATA 1.5 Gbit/s) .... During the initial period after SATA 1.5 Gbit/s finalization, adapter and drive manufacturers used a "bridge chip" to convert existing PATA designs for use with the SATA interface. Bridged drives have a SATA connector, may include either or both kinds of power connectors, and, in general, perform identically to their PATA equivalents. Most lack support for some SATA-specific features such as NCQ. Native SATA products quickly eclipsed bridged products with the introduction of the second generation of SATA drives.
so yeah, probably not a whole lot of these drives being sold new, but there are lots of shops that buy used gear because it's cheap. these older sata drives haven't all just disappeared when revision 2.0 came out.
insensitive clod overlords obligatory xkcd car analogy russian reversals whoosh pedant fanbois ftfy in 3...2...1..PROFIT
Green drives from Seagate do not appear to have NCQ. As per below, I have 1 normal and 4 greens in this box:
~$ cat /sys/block/sd?/device/queue_depth
31
1
1
1
1
~$ cat /sys/block/sd?/device/queue_type
simple
none
none
none
none
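If you want the device names next to those numbers, here is a rough sketch that prints one line per drive. ncq_report is a made-up name, and the optional directory argument exists only so it can be pointed at a test tree. It treats a queue_depth above 1 as "queueing on", which is an assumption: a depth of 1 can also mean the kernel or controller turned NCQ off, not that the drive lacks it.

```shell
# Print the queue depth of every sd* device under the given root.
# ncq_report is a hypothetical helper; the root defaults to /sys/block.
ncq_report() {
    root=${1:-/sys/block}
    for dev in "$root"/sd*; do
        [ -r "$dev/device/queue_depth" ] || continue
        depth=$(cat "$dev/device/queue_depth")
        # Assumption: a depth greater than 1 means command queueing is in use.
        if [ "$depth" -gt 1 ]; then
            state="queueing on"
        else
            state="queueing off"
        fi
        printf '%s depth=%s %s\n' "${dev##*/}" "$depth" "$state"
    done
}
```

Run it as ncq_report with no arguments to read the real /sys/block.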