Slashdot Mirror


The Lies Disks and Their Drivers Tell

davecb writes "Pity the poor filesystem designer: they just want to know when their data is safe, but the disks and drivers try so hard to make I/O 'easy' that it ends up being stupidly hard. Marshall Kirk McKusick writes about the difficulties in making the systems work nicely together: 'In the real world, many of the drives targeted to the desktop market do not implement the NCQ specification. To ensure reliability, the system must either disable the write cache on the disk or issue a cache-flush request after every metadata update, log update (for journaling file systems), or fsync system call. Both of these techniques lead to noticeable performance degradation, so they are often disabled, putting file systems at risk if the power fails. Systems for which both speed and reliability are important should not use ATA disks. Rather, they should use drives that implement Fibre Channel, SCSI, or SATA with support for NCQ.'"

192 comments

  1. almost clicked the link... by adturner · · Score: 5, Funny

    But you lost me the moment you mentioned ATA drives.

    1. Re:almost clicked the link... by Anonymous Coward · · Score: 1

      it's a bit difficult to parse, but the way I interpret TFA, the problem also applies to SATA drives which do not implement the NCQ specification.

    2. Re:almost clicked the link... by Lunix+Nutcase · · Score: 5, Insightful

      And yet fails to name any. Looking at Seagate's site about NCQ, pretty much every consumer model since 2004 has NCQ. This seems overblown.

    3. Re:almost clicked the link... by LordLimecat · · Score: 1

      Which is basically none of them. I would be astonished if anyone could link me a drive sold on newegg, amazon, or by Dell that does not implement NCQ when set to AHCI mode.

    4. Re:almost clicked the link... by h4rr4r · · Score: 3, Interesting

      I still bet those drives if you pull power on them will lose the data in their onboard caches.

      Which means they are lying about fsync.

    5. Re:almost clicked the link... by Lunix+Nutcase · · Score: 2, Insightful

      As weighty of an argument as your bet might seem to you, I'd prefer actual evidence.

    6. Re:almost clicked the link... by h4rr4r · · Score: 1

      Try it.

      There are some decent tools out there to test it.

    7. Re:almost clicked the link... by MikeBabcock · · Score: 2

      If so, the article should link a proper study or basic attempt at surveying drives and how well they survive such behaviour instead of surmising.

      --
      - Michael T. Babcock (Yes, I blog)
    8. Re:almost clicked the link... by Anonymous Coward · · Score: 0

      The article was one T short of an AT-AT.

    9. Re:almost clicked the link... by Lunix+Nutcase · · Score: 2

      Ok. All my drives, which range in age of at least 4-5 years, support it and they are all the same models that Seagate lists support for. So once again, this sounds like overinflated sensationalism. If it was really such a problem he could have listed a few models to support his claim instead of nebulous handwaving, no?

    10. Re:almost clicked the link... by h4rr4r · · Score: 0

      They claim to support it.
      Have you tested them?

      http://brad.livejournal.com/2116715.html

    11. Re:almost clicked the link... by Anonymous Coward · · Score: 0

      Do you realize who Kirk McKusick is? If he says it is broken, you damn well know it is broken.

    12. Re:almost clicked the link... by Lunix+Nutcase · · Score: 1

      Yeah, it's all a Seagate conspiracy to lie to me. Sorry, but a 7-year-old LJ post hardly has much weight considering NCQ didn't become common in consumer drives until late 2005/early 2006.

    13. Re:almost clicked the link... by Anonymous Coward · · Score: 0

      That's the problem. They SAY they support NCQ, when in reality they do it improperly or incompletely.
      The industry's history is rich with examples of this sort of practice. The storage/hard disk industry in particular! Are you old enough to remember all of the issues with the first drives that supported the "Ultra DMA" ATA33 standard? There were whole lines of drives that would have inevitable corruption if you turned it on. Similar issues with some controller chips too.

    14. Re:almost clicked the link... by Lunix+Nutcase · · Score: 2

      Fallacious appeal to authority. I know who he is, yet if it was as common as he claims he could do better than nebulous handwaving.

    15. Re:almost clicked the link... by Lunix+Nutcase · · Score: 1

      So it's claimed. Provide evidence by listing the models which do so rather than handwaving supposition.

    16. Re:almost clicked the link... by FranTaylor · · Score: 1

      They SAY they support it

      How can you really tell? It's well established that drives lie about their capabilities.

    17. Re:almost clicked the link... by FranTaylor · · Score: 1

      how can you actually tell that it's implemented properly, or at all? What if it's all a big lie, just like other parts of the protocol?

    18. Re:almost clicked the link... by Lunix+Nutcase · · Score: 1

      Because you have no evidence showing my drives don't whereas Seagate lying would be fraud? Prove the assertion rather than merely repeating it.

    19. Re:almost clicked the link... by TheGratefulNet · · Score: 4, Informative

      yeah, well, I have quite a bit of experience with samsung (not seagate branded but the older samsungs) drives.

      they REPORTED having ncq but you always had to disable them.

      I got so that I do this at bootup:

      if [ -e /sys/block/sda/device/queue_depth ] ; then
            echo " sda NCQ now off"
            echo 1 > /sys/block/sda/device/queue_depth
      fi

      and so on.

      performance does not suffer (that I would care about) BUT the data reliability more than makes up for it. no more timeouts, no more syslog 'scaries'.

      vendors really do fuck up the protocol implementations. seagate is 'strange' in ways, so is WD, so is hitachi and ibm (I know they are not even in the biz anymore, at least for consumer drives).

      windows has a 'blacklist' of what things to not use when talking to drives and so does linux. it's a fact of life.

      drive vendors are borderline idiots. sad but true ;(
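
      The per-device snippet in this comment generalizes to a loop over every disk. A minimal Python sketch, assuming the Linux sysfs layout used in the post (/sys/block/<dev>/device/queue_depth); the function name limit_queue_depth and its sysfs_root parameter are made up for illustration, and writing to the real sysfs needs root:

```python
from pathlib import Path


def limit_queue_depth(sysfs_root="/sys/block", pattern="sd*", depth=1):
    """Write `depth` to every matching device's queue_depth file.

    Hypothetical helper, not part of any tool. A depth of 1 leaves no room
    for queued commands, which is how the shell snippet turns NCQ off.
    Returns the names of the devices whose setting was changed.
    """
    changed = []
    for dev in sorted(Path(sysfs_root).glob(pattern)):
        knob = dev / "device" / "queue_depth"
        if not knob.exists():
            continue  # not every block device exposes this knob
        if knob.read_text().strip() != str(depth):
            knob.write_text(f"{depth}\n")
            changed.append(dev.name)
    return changed
```

      Pointing sysfs_root at a scratch directory gives a dry run without touching real devices.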

      --
      "It is now safe to switch off your computer."
    20. Re:almost clicked the link... by fuzzytv · · Score: 1

      That is not the point - you'll loose data whenever there's a cache without a battery backup involved. The problem is that with some drives (good ones) you'll get at least a consistent filesystem (or easy to fix thanks to the journal), because the operations may be ordered somehow. The bad drives don't respect the ordering, making the corruption much more serious and potentially unfixable.

    21. Re:almost clicked the link... by TheGratefulNet · · Score: 3, Informative

      you'll see it in syslog!

      timeouts, retries, even exiting the bus and doing full bus resets (which are slow and you'll NOT miss them).

      as I posted before, older (5yr) samsungs were notorious for SAYING they support ncq but you would be foolish to let it just negotiate it and use it.

      this was how things were in the very early days of 10/100 ethernet and full/half duplex. yes, the early models 'negotiated' duplex but many of them got it wrong and you'd have to manually set this on hubs/switches since you knew better than the equipment. there were even early NIC chips that worked better at 10meg ethernet than 100baseT! we would do ftp transfer tests and quite often a GOOD 10baseT was more reliable (over time) than 100baseT. the same happened to gig-e, too, in the early years.

      --
      "It is now safe to switch off your computer."
    22. Re:almost clicked the link... by sexconker · · Score: 0

      Which is basically none of them. I would be astonished if anyone could link me a drive sold on newegg, amazon, or by Dell that does not implement NCQ when set to AHCI mode.

      Nobody who cares uses AHCI. People who care go into the BIOS/UEFI and set their controller to RAID.
      Once that's done, you have no clue what the drive is fucking doing and what the Intel / Nvidia RAID controller ROM is doing, and what the corresponding driver is doing.

      NCQ? Caching? Hot-swapping? No one can reliably tell you wtf is going on.
      You can yank a drive and then add a drive, but when shit explodes (and it will!) you'll be left wondering who's at fault. The drive? The BIOS/UEFI? The RAID controller? The driver? The OS?

      If you trust a hard drive manufacturer on the face of things, I've got a wireless router to sell you. It has all these features for QoS and firewalling and it's secure and it gets 300 mbps and it's got the DLNA, I swear. See? It's on the box, it must be true. And I promise I'll keep updating the firmware and never push out a firmware update that removes features and prevents you from rolling back.

    23. Re:almost clicked the link... by AK+Marc · · Score: 1

      You are making the assertion they are wrong, yet are unwilling to support your position. Why do you only demand proof from the other side? The drive manufacturers have been shown to lie previously. So if you are claiming they aren't lying this time, perhaps the proof should be provided by you.

    24. Re:almost clicked the link... by jedidiah · · Score: 1

      Read it? Just did. Nothing concrete in there, just vague scare mongering. I am likely to get more useful information from the peanut gallery here. I've already seen one guy with an actual real world example.

      --
      A Pirate and a Puritan look the same on a balance sheet.
    25. Re:almost clicked the link... by Eponymous+Hero · · Score: 5, Informative
      you didn't bother to RTFA, good for you. it says quite plainly that (only part of) the problem is not drives that don't support ncq, but those drives that have it and disable it. and that was a relatively small portion of TFA. here's how the disks lie:

      File systems need to be aware of the change to the underlying media and ensure that they adapt by always writing in multiples of the larger sector size. Historically, file systems were organized to store files smaller than 512 bytes in a single sector. With the change in disk technology, most file systems have avoided the slowdown of 512-byte writes by making 4,096 bytes the smallest allocation size. Thus, a file smaller than 512 bytes is now placed in a 4,096-byte block. The result of this change is that it takes up to eight times as much space to store a file system with predominantly small files. Since the average file size has been growing over the years, for a typical file system the switch to making 4,096 bytes the minimum allocation size has resulted in a 10- to 15-percent increase in required storage.

      just to clarify what the author's point was:

      The conclusion is that file systems need to be aware of the disk technology on which they are running to ensure that they can reliably deliver the semantics that they have promised. Users need to be aware of the constraints that different disk technology places on file systems and select a technology that will not result in poor performance for the type of file-system workload they will be using. Perhaps going forward they should just eschew those lying disks and switch to using flash-memory technology—unless, of course, the flash storage starts using the same cost-cutting tricks.

      if you want to argue that, great, go nuts. nobody who actually RTFA thinks the argument is really about ncq. the ac you responded to said

      the way I interpret TFA, the problem also applies to SATA drives which do not implement the NCQ specification.

      well, here's what TFA actually said:

      Luckily, SATA (serial ATA) has a new definition called NCQ (Native Command Queueing) that has a bit in the write command that tells the drive if it should report completion when media has been written or when cache has been hit. If the driver correctly sets this bit, then the disk will display the correct behavior.

      In the real world, many of the drives targeted to the desktop market do not implement the NCQ specification. To ensure reliability, the system must either disable the write cache on the disk or issue a cache-flush request after every metadata update, log update (for journaling file systems), or fsync system call. Both of these techniques lead to noticeable performance degradation, so they are often disabled, putting file systems at risk if the power fails. Systems for which both speed and reliability are important should not use ATA disks. Rather, they should use drives that implement Fibre Channel, SCSI, or SATA with support for NCQ.

      i hope it's painfully obvious by now that the point about ncq is not that some drives don't have it; it's that some don't use it -- mostly so you don't go giving their drives bad reviews for being slow but unnoticeably reliable. if it's disabled, you can enable it. what sata drives don't have ncq? i asked wikipedia:

      SATA revision 1.0 (SATA 1.5 Gbit/s) .... During the initial period after SATA 1.5 Gbit/s finalization, adapter and drive manufacturers used a "bridge chip" to convert existing PATA designs for use with the SATA interface. Bridged drives have a SATA connector, may include either or both kinds of power connectors, and, in general, perform identically to their PATA equivalents. Most lack support for some SATA-specific features such as NCQ. Native SATA products quickly eclipsed bridged products with the introduction of the second generation of SATA drives.

      so yeah, probably not a whole lot of these drives being sold new, but there are lots of shops that buy used gear because it's cheap. these older sata drives haven't all just disappeared when revision 2.0 came out.

      --
      insensitive clod overlords obligatory xkcd car analogy russian reversals whoosh pedant fanbois ftfy in 3...2...1..PROFIT
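
      The "up to eight times" figure in the quoted passage is rounding arithmetic: a file under 512 bytes occupies one 512-byte sector under the old scheme and one 4,096-byte block under the new one. A small Python sketch of that calculation, under a deliberately simplified model (whole-block allocation only, no fragments, tail packing, or inline data); the function names are made up for illustration:

```python
def allocated_bytes(file_size, block_size):
    """Space consumed when a file is stored in whole blocks (simplified model)."""
    if file_size == 0:
        return 0
    blocks = -(-file_size // block_size)  # ceiling division
    return blocks * block_size


def overhead_ratio(file_sizes, old_block=512, new_block=4096):
    """How many times more space the files need with new_block than old_block."""
    old = sum(allocated_bytes(size, old_block) for size in file_sizes)
    new = sum(allocated_bytes(size, new_block) for size in file_sizes)
    return new / old
```

      With only tiny files, overhead_ratio([100, 300, 500]) is 8.0; mixing in one large file pulls the ratio back toward 1, which is why the article's overall estimate is far below the worst case.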
    26. Re:almost clicked the link... by LordLimecat · · Score: 1

      Except the OS, drive, and driver all claim that NCQ is working on every single SATA disk I have seen that has been set to AHCI. To say "yeah, but they're still lying"... why shouldn't we ask for specifics? Are we expected to go out and test every extant drive to see whether it supports NCQ?

      If the author had specifics that he tested and found to improperly implement NCQ, perhaps he should have included his data so that it could be verified. All he gave was a general overview of tagged queuing and NCQ, and then declared "but not everyone does it right". That's so vague as to be worthless.

    27. Re:almost clicked the link... by greg1104 · · Score: 2

      Intel's early SSDs such as the Intel X25-E were the last time I really got screwed by SATA drives that screwed this up very badly. See the PostgreSQL page on Reliable Writes for a lot more details on this subject.

    28. Re:almost clicked the link... by LordLimecat · · Score: 1

      I've got a wireless router to sell you. It has all these features for QoS and firewalling and it's secure and it gets 300 mbps and it's got the DLNA, I swear.

      You mean this thing? (after installing dd-wrt)

      On a serious note, exactly how is one supposed to purchase a drive if we can't trust anything on the product page? Just guess?

    29. Re:almost clicked the link... by hoggoth · · Score: 5, Insightful

      LOSE LOSE LOSE LOSE! YOU WILL LOSE DATA!

      Sorry... I'm usually a calm rational person. I almost never become a grammar-nazi, spelling nazi, or troll. It's just that I see this so often I'm afraid one day Webster will just give up and switch the definitions of Lose and Loose.

      --
      - For the complete works of Shakespeare: cat /dev/random (may take some time)
    30. Re:almost clicked the link... by Anonymous Coward · · Score: 0

      Oh CMD646, someday I will cross the rainbow bridge and get those lost 1990s files back.

    31. Re:almost clicked the link... by Anonymous Coward · · Score: 0

      Kingston 32G SSD causes hardware lockups and timeouts when queue depth exceeds about 4.

    32. Re:almost clicked the link... by anomaly256 · · Score: 4, Informative

      Green drives from Seagate do not appear to have NCQ. As per below, I have 1 normal and 4 greens in this box:

      ~$ cat /sys/block/sd?/device/queue_depth
      31
      1
      1
      1
      1

      ~$ cat /sys/block/sd?/device/queue_type
      simple
      none
      none
      none
      none
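
      The queue_depth check above can be folded into one report across all disks. A sketch in Python, assuming the sysfs layout shown in the post; ncq_report and describe are hypothetical helper names. A depth of 1 only shows that the kernel is not queueing commands to that drive, not why:

```python
from pathlib import Path


def ncq_report(sysfs_root="/sys/block", pattern="sd*"):
    """Map each device name to its queue depth, or None if it cannot be read."""
    report = {}
    for dev in sorted(Path(sysfs_root).glob(pattern)):
        try:
            text = (dev / "device" / "queue_depth").read_text()
            report[dev.name] = int(text)  # int() tolerates the trailing newline
        except (OSError, ValueError):
            report[dev.name] = None
    return report


def describe(depth):
    """Turn a queue depth into a short human-readable state."""
    if depth is None:
        return "unknown"
    return "queueing active" if depth > 1 else "no queueing"
```

      On the box in this comment the report would come out as one drive at depth 31 and four at depth 1.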

    33. Re:almost clicked the link... by anomaly256 · · Score: 1

      Btw, these are new drives, less than a year old. Manufactured November 2011

    34. Re:almost clicked the link... by anomaly256 · · Score: 1

      Further info if you want it:

      ~$ sudo hdparm -I /dev/sd[abcde] | egrep "(Native|Model)"
      Model Number: ST2000DM001-9YN164
      * Native Command Queueing (NCQ)
      Model Number: ST2000DL003-9VT166
      Model Number: ST2000DL003-9VT166
      Model Number: ST2000DL003-9VT166
      Model Number: ST2000DL003-9VT166
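
      Output like this is easy to misread by eye once there are several drives. A small Python sketch that pairs each model with its NCQ line, assuming the text comes from `hdparm -I` (filtered or not) and that a leading `*` marks an enabled feature, as it does in the listing above; ncq_by_model is a made-up name:

```python
def ncq_by_model(hdparm_output):
    """Return (model, ncq_enabled) pairs from `hdparm -I` text, one per drive."""
    drives = []
    for line in hdparm_output.splitlines():
        line = line.strip()
        if line.startswith("Model Number:"):
            # A new drive section starts; assume no NCQ until a line says so.
            drives.append([line.split(":", 1)[1].strip(), False])
        elif "Native Command Queueing" in line and drives:
            # Assumption: only a line starting with '*' means "enabled".
            drives[-1][1] = line.startswith("*")
    return [tuple(drive) for drive in drives]
```

      Fed the output above, it reports NCQ for the ST2000DM001 and none for the four ST2000DL003 drives.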

    35. Re:almost clicked the link... by ak3ldama · · Score: 2

      Given that we are talking about Kirk McKusick an appeal to authority is entirely fair. Just because he didn't have a bunch of citations or references listed at the bottom of the article does not mean they do not exist somewhere. For you to say it is a "fallacious" appeal to authority is unfair - it has not been proven as fallacious. (You assert it to be fallacious due to a lack of reference... the culture created by Wikipedia and all the "[Citation Needed]" slackers never fails to impress me.) Surely there exist blacklists in source in Linux/FreeBSD/other publicly viewable code, I also will not hold your hand and show you where.

      I have personally seen these kinds of issues (with writes not happening soon enough and fsync calls introduced for data integrity) with flash media which is something mentioned in the beginning of the article. I would like to further comment that the article talked about other things such as sector size side effects and the impact on useful space. ++Great article. Does anyone else remember how he (Kirk McK.) used to sell shirts and pc stickers? I still have the bsd daemon logo sticker on the case of my first pc.

      --
      "but money is the God of Algiers & Mahomet their prophet." - Rich. O'Bryen June 8th 1786
    36. Re:almost clicked the link... by Anonymous Coward · · Score: 0

      Actually, I think he meant that your data will physically escape the confines of the drive and run rampant around the house. I loosed some data once, and I regret it to this day.

    37. Re:almost clicked the link... by Anonymous Coward · · Score: 0

      Funny. Have the same disks (four) on my NAS, and don't have the same issue:

      PESSOA> hdparm -I /dev/sd[abcde] | egrep "(Native|Model)"
                      Model Number: ST2000DL003-9VT166
                            * Native Command Queueing (NCQ)
                      Model Number: ST2000DL003-9VT166
                            * Native Command Queueing (NCQ)
                      Model Number: ST2000DL003-9VT166
                            * Native Command Queueing (NCQ)
      /dev/sde: No such device or address
                      Model Number: ST2000DL003-9VT166
                            * Native Command Queueing (NCQ)

    38. Re:almost clicked the link... by geekoid · · Score: 1

      All the data gets loose from the drive, that's why you lose it.

      "Webster will just give up and switch the definitions of Lose and Loose."
      and then...?

      --
      The Kruger Dunning explains most post on /. http://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect
    39. Re:almost clicked the link... by geekoid · · Score: 1

      http://www.amazon.com/Seagate-Barracuda-3-5-Inch-Internal-ST2000DL003/dp/B004CCS266/ref=sr_1_1?ie=UTF8&qid=1347056020&sr=8-1&keywords=ST2000DL003

      Of course, you will need to buy it and test it for yourself since you dismiss what other people who've tested say.

      --
      The Kruger Dunning explains most post on /. http://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect
    40. Re:almost clicked the link... by hoggoth · · Score: 4, Funny

      "and then..."

      and then all hell will break lose, obviously.

      --
      - For the complete works of Shakespeare: cat /dev/random (may take some time)
    41. Re:almost clicked the link... by causality · · Score: 0, Flamebait

      LOSE LOSE LOSE LOSE! YOU WILL LOSE DATA!

      Sorry... I'm usually a calm rational person. I almost never become a grammar-nazi, spelling nazi, or troll. It's just that I see this so often I'm afraid one day Webster will just give up and switch the definitions of Lose and Loose.

      It's a socially patterned form of mindlessness. I especially like to call it "an example of sheeple" because that word seems to really, really stick in some peoples' craw. Perhaps the whole sheep/shepherd analogy is too accurate for them to handle?

      At any rate ... five years ago I never saw anyone making that error. Then, one day, lots of people started doing it all at once. It's as though many tens of thousands of people all got together in a big smoky stadium and conspired to all do the same stupid thing.

      While I seriously doubt there was an actual formal conspiracy, it definitely is an example of following the crowd without critical thought (the domain of individuals) and mindlessly imitating what is seen. When people can't even come up with their own mistakes anymore, you know individuality is in trouble. The funny thing is, each of these people would swear (and probably pass a polygraph) that they are individuals. The most sincere belief in the world is useless if it's inconsistent with observed reality.

      So that's another item on the list of things people develop by means of monkey-see-monkey-do instead of observation and introspection. Previous entries on that list include personality, mannerisms, tastes/preferences, lack of situational awareness due to self-absorption, and the inability to drive a vehicle for five miles without repeatedly crossing over the double-yellow line.

      On the whole we certainly are a "fine" species.

      --
      It is a miracle that curiosity survives formal education. - Einstein
    42. Re:almost clicked the link... by drinkypoo · · Score: 1

      probably not a whole lot of these drives being sold new, but there are lots of shops that buy used gear because it's cheap

      We're talking about hard disks, the disks old enough to have a bridge chip are also puny by modern standards. It would be possible for those drives to sneak into workstations if you are buying them from some marginally reputable local PC builder but you're not going to see them come from any tier 1 manufacturer as they buy new stuff in lots. And you probably won't see them in your servers, most of which seem to be booting from SSD and carrying disks over 1TB each for storage.

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    43. Re:almost clicked the link... by causality · · Score: 1

      Given that we are talking about Kirk McKusick an appeal to authority is entirely fair. Just because he didn't have a bunch of citations or references listed at the bottom of the article does not mean they do not exist somewhere. For you to say it is a "fallacious" appeal to authority is unfair - it has not been proven as fallacious. (You assert it to be fallacious due to a lack of reference... the culture created by Wikipedia and all the "[Citation Needed]" slackers never fails to impress me.) Surely there exist blacklists in source in Linux/FreeBSD/other publicly viewable code, I also will not hold your hand and show you where.

      I have personally seen these kinds of issues (with writes not happening soon enough and fsync calls introduced for data integrity) with flash media which is something mentioned in the beginning of the article. I would like to further comment that the article talked about other things such as sector size side effects and the impact on useful space. ++Great article. Does anyone else remember how he (Kirk McK.) used to sell shirts and pc stickers? I still have the bsd daemon logo sticker on the case of my first pc.

      I think GP has a good point. Sure, Kirk knows his shit. Sure, he really could be considered an authority. Yeah, I can just take his word for it with some confidence.

      But that doesn't help me to understand. It's just a memorized factoid. I just know that the statement has a boolean condition of "true". This does nothing to help me understand if *my* hardware is affected. Since just about everyone here has a hard drive (or several), this would be useful information.

      I don't care if Kirk has a 16 inch penis and has never once been wrong about anything. I still want to know if my own drives are affected. In this entire thread, anytime someone asks that question or anything related to it, they receive hand-waving and "this isn't wikipedia" etc. That's cute, but we are asking for knowledge, and you'll never understand that if you make no effort to entertain what the other person is saying long enough to appreciate where they're coming from.

      --
      It is a miracle that curiosity survives formal education. - Einstein
    44. Re:almost clicked the link... by causality · · Score: 2

      Intel's early SSDs such as the Intel X25-E were the last time I really got screwed by SATA drives that screwed this up very badly. See the PostgreSQL page on Reliable Writes for a lot more details on this subject.

      This is why I am never an early adopter. If there were some tremendous emergency that only an early SSD could solve, and life-and-limb were on the line, I suppose I would take my chances. But I've never had that much of a need for an SSD.

      I suppose I have pioneers like you to thank, however, for helping to identify and work out the problems so that people like me who wait a little while have such a good experience. It's like volunteer work, except of course that you had to pay in order to do it.

      --
      It is a miracle that curiosity survives formal education. - Einstein
    45. Re:almost clicked the link... by Eponymous+Hero · · Score: 1

      what's puny? revision 1 drives had at least 250GB capacity. and while there are some shops that will buy this kind of crap, it doesn't have to be shops buying used, cheap gear. i recently bought a used server from a company for about $100; it has 6 250GB drives. a lot of laptops have that much or less storage space on their drives. you could get a macbook pro 4 years ago with only 160GB storage (i sold mine or i could give you some specs). i think you zoomed in on the one case that satisfies your disagreement and failed to see the bigger picture.

      --
      insensitive clod overlords obligatory xkcd car analogy russian reversals whoosh pedant fanbois ftfy in 3...2...1..PROFIT
    46. Re:almost clicked the link... by dbIII · · Score: 1

      I noticed that too - zero mistakes of that sort to many within the space of one year. I just wrote it off as a new bunch of posters became old enough to read this page and thus demonstrate a consequence of cutting education funding in the USA.
      It's depressing when an engineer that hasn't read much other than textbooks, journals, SF, comics and newspapers starts looking like an intellectual in comparison :(

    47. Re:almost clicked the link... by Hognoxious · · Score: 1

      I don't have any figures, this is just my impression, but they seem to go in clusters. You have the ongoing classics lose/loose they're/their/there but once in a while you see a new one, and then you see several examples of it.

      It's almost as if people who can't write are copying people who can't write.

      --
      Confucius say, "Find worm in apple - bad. Find half a worm - worse."
    48. Re:almost clicked the link... by Hognoxious · · Score: 1

      No. The person making the claim should state "I have a Foobar model X. I ran command 'wibble' and the result was 'herpderp'".

      And then anyone else who has the same disk can try it too, and ask if running 'yipyip' makes any difference, or ask if that applies to the model Y too...

      Specifics, or STFU.

      --
      Confucius say, "Find worm in apple - bad. Find half a worm - worse."
    49. Re:almost clicked the link... by Anonymous Coward · · Score: 0

      Coming out of highschool I feel the need to argue a valid point.
      Educational budget cuts are certainly a problem, yes, but I think people give it too much credit.
      The real problems we should be worrying about are things like the growing number of people who don't WANT to learn, and don't CARE, because they don't realize that the difference between "lose" and "loose" could be the difference between an $8 an hour job and a $12 an hour job.
      Due to the growing effectiveness of marketing, and the growing availability of content containing advertisements, it has also seemingly become even easier for mass media to effect corruptive societal brainwashing on our youth.
      Another big one on the list is the fact that kids have way too little responsibility now. A lot of us have it so easy that we aren't having to learn some of the absolute most basic real-world life skills until right up at the borderline, when we could have easily gained a lot of those skills when we were 10 or 11. The dearth of meaningful public transportation also keeps a lot of us from getting jobs, and by public transportation I honestly can't even swing so far as to say busses or subways, I'm talking about basic things like sidewalks and crosswalks; in my town you can't even walk from the mall to the Walmart even though they're RIGHT ACROSS THE ROAD FROM EACH OTHER, because in order to do so you have to run across the most dangerous road in the entire area, and there are NO CROSSWALKS. ...and don't even get me started on No Child Left Behind. The only thing that wreck of a government program did for me was slow down my education while simultaneously making my grades far worse than they could have been.

    50. Re:almost clicked the link... by Anonymous Coward · · Score: 1

      That is why I buy 5400 RPM drives. Data is much less likely to go flying off of those than the faster spinning 7200 RPM drives.

    51. Re:almost clicked the link... by dj245 · · Score: 1

      fatfingered the mouse when I modded this down accidentally, undoing it :(

      --
      Even those who arrange and design shrubberies are under considerable economic stress at this period in history.
    52. Re:almost clicked the link... by thegarbz · · Score: 1

      I have a 2TB Seagate Green drive, the cheapest 2TB drive I could find on the market about 8 months ago. Reports queue depth at 31 and type as simple, the same as all my cheap WD drives.

    53. Re:almost clicked the link... by skids · · Score: 1

      Two hypotheses: one is sort of as you said, it may be contagious as reading a bunch of such mistakes might preload the brain in such a way as to encourage making them.

      Another is this: I notice I sometimes make such mistakes when I post when I'm tired or distracted. Perhaps certain articles just attract more bleary-eyed posters, or posters whose attentions are divided.

    54. Re:almost clicked the link... by unitron · · Score: 1

      All the data gets loose from the drive, that's why you lose it...

      All the data will get loose from the drive, that's why you lose them.

      (singular is only correct if talking of one datum)

      --

      I see even classic Slashdot is now pretty much unusable on dial up anymore.

    55. Re:almost clicked the link... by unitron · · Score: 1

      "It's almost as if people who can't write are copying people who can't write."

      Why not? Those who write well learned to do so by reading others who wrote well. Although nowadays I should probably say that those who were able to write well learned to do so by having read others who wrote well.

      --

      I see even classic Slashdot is now pretty much unusable on dial up anymore.

    56. Re:almost clicked the link... by darkHanzz · · Score: 1

      Well, you could test: wipe the disk, write a known pattern, pull the plug, dump to screen the last byte that is written according to NCQ, re-plug, read.
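
      A rough sketch of that test, under some assumptions: it writes to a file rather than a wiped raw disk, it uses `fsync` as the "drive says it is written" signal instead of looking at NCQ completions directly, and the names `write_records` and `last_valid_record` are invented for the example. The idea is to note the last sequence number printed before you pull the plug, then compare it with what `last_valid_record` finds after power comes back. If the screen showed a higher number than the disk holds, something lied.

```python
import os
import struct
import zlib

# Each record: 8-byte sequence number followed by a CRC32 of those 8 bytes.
RECORD = struct.Struct("<QI")


def write_records(path, count, report=print):
    """Append `count` records, fsync after each, then report the sequence number."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        for seq in range(count):
            body = struct.pack("<Q", seq)
            os.write(fd, RECORD.pack(seq, zlib.crc32(body)))
            os.fsync(fd)   # ask the OS (and, one hopes, the drive) to commit
            report(seq)    # only now claim the record is safe
    finally:
        os.close(fd)


def last_valid_record(path):
    """Return the highest sequence number of the intact leading records, or -1."""
    last = -1
    with open(path, "rb") as f:
        while True:
            chunk = f.read(RECORD.size)
            if len(chunk) < RECORD.size:
                break  # end of file or a torn record
            seq, crc = RECORD.unpack(chunk)
            if seq != last + 1 or crc != zlib.crc32(struct.pack("<Q", seq)):
                break  # gap or corruption
            last = seq
    return last
```

      In a real run the reported numbers should go somewhere that survives the power cut, such as a serial console or another machine, since the local screen goes dark with the disk.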

    57. Re:almost clicked the link... by unitron · · Score: 1

      ...On a serious note, exactly how is one supposed to purchase a drive if we cant trust anything on the product page? Just guess?

      Welcome to the world of the "wishing to upgrade to more drive space" TiVo owner.

        : - (

      --

      I see even classic Slashdot is now pretty much unusable on dial up anymore.

    58. Re:almost clicked the link... by anomaly256 · · Score: 1

      That is funny indeed... Wonder wtf is up with that then.
      What firmware revision is on those drives? This is what I have:
      Firmware Revision: CC98 (on all 4 of the DLs)

    59. Re:almost clicked the link... by anomaly256 · · Score: 1

      What firmware version is on that drive? Maybe I need to update mine if possible: Firmware Revision: CC98

    60. Re:almost clicked the link... by Anonymous Coward · · Score: 0

      No I didn't RTFA, but let me point out the fallacies in your post instead.

      Historically, file systems were organized to store files smaller than 512 bytes in a single sector. With the change in disk technology, most file systems have avoided the slowdown of 512-byte writes by making 4,096 bytes the smallest allocation size.

      Indeed. This happened around the time Ext2 and FAT32 were introduced. There are no filesystems left in existence that default to less-than-4096-byte pages. As for small files, Reiserfs used to solve this by putting small file data in the metadata block (thus consuming no overhead at all), and I believe other filesystems can do the same.

      Since the average file size has been growing over the years, for a typical file system the switch to making 4,096 bytes the minimum allocation size has resulted in a 10- to 15-percent increase in required storage.

      We're already way past that. Some 10-15 years past, actually.

      The conclusion is that file systems need to be aware of the disk technology on which they are running to ensure that they can reliably deliver the semantics that they have promised.

      Wow. In the same vein, car vendors need to be aware of road and tire technology on which the cars are driving to ensure they can reliably deliver the driving experience that they have promised. News at 11.

      Perhaps going forward they should just eschew those lying disks and switch to using flash-memory technology—unless, of course, the flash storage starts using the same cost-cutting tricks.

      You mean, like all those ssd vendors that refuse to publish their erase block size, leaving users and operating systems in the dark as to how large batches they should issue?

      Luckily, SATA (serial ATA) has a new definition called NCQ (Native Command Queueing) that has a bit in the write command that tells the drive if it should report completion when media has been written or when cache has been hit. If the driver correctly sets this bit, then the disk will display the correct behavior.

      The problem with write queueing is completely orthogonal to the earlier stated problem of physical vs logical sector size. People that conflate the issues are either clueless or have an agenda. I suspect the latter.

      In the real world, many of the drives targeted to the desktop market do not implement the NCQ specification. To ensure reliability, the system must either disable the write cache on the disk or issue a cache-flush request after every metadata update, log update (for journaling file systems), or fsync system call. Both of these techniques lead to noticeable performance degradation, so they are often disabled, putting file systems at risk if the power fails. Systems for which both speed and reliability are important should not use ATA disks

      Systems for which both speed and reliability are important have never used ATA disks. Ask any server vendor.

      i hope it's painfully obvious by now that the point about ncq is not that some drives don't have it; it's that some don't use it

      Obvious? You haven't made a single attempt at proving that point. So, again: you need to show a study that shows that SATA drives advertise NCQ even though they do not implement it. Barring that, you should produce a list of drive production numbers for which you have personally verified that they don't, so that others can verify your claims.

      Bridged drives [..] lack support for some SATA-specific features such as NCQ.

      So? Which of these drives advertise NCQ support to the OS (hint: none, because the ATA command set does not include NCQ-like commands -- even if the drives were to advertise them, the bridge chip wouldn't be able to translate that into recognized ATA commands).

    61. Re:almost clicked the link... by fuzzytv · · Score: 1

      Yeah, sorry for that typo. I admit I was a bit drunk when writing that post and moreover - English is not my mother tongue. Try to write something in Czech and I'll have plenty of opportunities to grammar-nazi you ;-)

    62. Re:almost clicked the link... by Hognoxious · · Score: 1

      Although nowadays I should probably say that those who were able to write well learned to do so by having read others who wrote well.

      I'm sure it's not true that in the old days people, on the whole, wrote better. Many wouldn't have written at all. On the other hand, writing that masses of people were exposed to was almost certainly of a higher quality; anything poor wouldn't have been printed.

      If these errors spread like a disease, editors are the equivalent of quarantine.

      --
      Confucius say, "Find worm in apple - bad. Find half a worm - worse."
    63. Re:almost clicked the link... by Hognoxious · · Score: 1

      hdparm -I /dev/sd[abc] | egrep "(Native|Model)"
                      Model Number: WDC WD5000AACS-00ZUB0
                            * Native Command Queueing (NCQ)
                      Model Number: WDC WD5000AACS-00ZUB0
                            * Native Command Queueing (NCQ)
                      Model Number: ST2000DL003-9VT166
                            * Native Command Queueing (NCQ)

      --
      Confucius say, "Find worm in apple - bad. Find half a worm - worse."
    64. Re:almost clicked the link... by hoggoth · · Score: 1

      To je vše, co umím.

      --
      - For the complete works of Shakespeare: cat /dev/random (may take some time)
    65. Re:almost clicked the link... by Hognoxious · · Score: 1

      The article is how old?

      If A is the set of drives on sale then and B is the set of drives on sale now, what do you think the size of A ^ B is relative to A U B?

      --
      Confucius say, "Find worm in apple - bad. Find half a worm - worse."
    66. Re:almost clicked the link... by thegarbz · · Score: 1

      It's a ST2000DM001 with firmware rev CC4C, whatever that means. Why can't people use simple numbers these days.

    67. Re:almost clicked the link... by drinkypoo · · Score: 1

      what's puny? revision 1 drives had at least 250GB capacity.

      Right, that's puny. If you've got less than 1TB in a desktop disk or maybe 300GB or so in a 2.5" it's puny. Disks are 3TB now, remember? And 1TB is cheap even with the still-elevated prices, although not necessarily cheap and fast at the same time. But then there's SSD caching...

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    68. Re:almost clicked the link... by Meski · · Score: 1

      LOSE LOSE LOSE LOSE! YOU WILL LOSE DATA!

      Sorry... I'm usually a calm rational person. I almost never become a grammar-nazi, spelling nazi, or troll. It's just that I see this so often I'm afraid one day Webster will just give up and switch the definitions of Lose and Loose.

      Nine, it will never happen.

    69. Re:almost clicked the link... by Eponymous+Hero · · Score: 1

      you still don't get it. your first dumb mistake is assuming everyone buys the latest and greatest. you obviously didn't read anything past what you quoted because i gave two examples that contradict you. one is my own recent experience, and here's proof of the other. took a quick trip to newegg, looked up "laptop hard drive." right next to the 1TB drives that you think are the only ones that exist is this little puppy, 160GB http://www.newegg.com/Product/Product.aspx?Item=N82E16822148443. they do exist, and people still buy them.

      --
      insensitive clod overlords obligatory xkcd car analogy russian reversals whoosh pedant fanbois ftfy in 3...2...1..PROFIT
    70. Re:almost clicked the link... by drinkypoo · · Score: 1

      your first dumb mistake is assuming everyone buys the latest and greatest. you obviously didn't read anything past what you quoted

      That objection clearly applies best to your comment. You're blathering on about bullshit like used servers when I'm talking about OEMs and the vast bulk of computers. You're an edge case, and no one cares about you.

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    71. Re:almost clicked the link... by Eponymous+Hero · · Score: 1

      i think your talk about oems and "the vast bulk of computers" is blathering bullshit. got any more straw man arguments? if you only had a brain...

      --
      insensitive clod overlords obligatory xkcd car analogy russian reversals whoosh pedant fanbois ftfy in 3...2...1..PROFIT
  2. 2 out of 3 by ardmhacha · · Score: 4, Insightful

    Cheap, fast and reliable.

    Pick any two.

    1. Re:2 out of 3 by h4rr4r · · Score: 1

      Only because of market segmentation. They sell the same drives as Enterprise Grade SATA with these NCQ turned on in firmware as they do to consumers with it turned off.

      Even worse are the RAID controllers(looking at you DELL) that do not disable the cache on the drives when you tell them to disable the write cache. You think your data is safe, then you lose power and what should be an oops has you going to your backups and doing a rebuild or swapping over to a replicated box.

    2. Re:2 out of 3 by craigminah · · Score: 1

      I hear that in the project management realm...great quote and forces people to think about the interdependence of the three variables.

    3. Re:2 out of 3 by LordLimecat · · Score: 1

      They sell the same drives as Enterprise Grade SATA with these NCQ turned on in firmware as they do to consumers with it turned off.

      What you get with an "enterprise" sata drive is higher MTBF and a firmware tweaked to work well with RAID (desktop drives try to be more forgiving for IO errors, while the enterprise drives are more quick to decide "ive failed, let the raid controller do its work").

      Im not aware of any sata drive that doesnt support NCQ-- its certainly on every desktop drive ive used excepting MAYBE the very first sata drive I bought in 2003. Certainly all SSDs I am aware of (except niche super-low-end ones) and all mass-market desktop drives do.

    4. Re:2 out of 3 by FranTaylor · · Score: 1

      Even worse are the RAID controllers(looking at you DELL)

      You buy RAID controllers from DELL? You deserve what you get. Buying DELL server gear is like bringing a Schwinn Varsity to the Tour de France.

    5. Re:2 out of 3 by Anonymous Coward · · Score: 0

      Aha, the corollary to "Nobody ever got fired for buying IBM"...

    6. Re:2 out of 3 by geekoid · · Score: 1

      you need to define terms before I can even pick one.

      BTW, new manufacturing technique can accomplish all three.

      --
      The Kruger Dunning explains most post on /. http://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect
    7. Re:2 out of 3 by TheGoodNamesWereGone · · Score: 1

      Except they probably did. Ever hear of the "Deathstar" series?

    8. Re:2 out of 3 by smellotron · · Score: 1

      Ever hear of the "Deathstar" series?

      Heard of it? I remember living it in high school! I think at the time it was a 2.1GB drive and then the next size up. And "kibibytes" was still dog food, not an overindulgent unit of measurement. Ah, the days...

    9. Re:2 out of 3 by Pieroxy · · Score: 1

      I've got a few of them. They all died within 4 years. The first time I actually lost valuable data over a hdd failure: 80GB of all my stuff did go *poof*.

      Since then, I have a real backup system plus "hand made RAID" consisting of a crontabbed rsync every night btw my main drive and its twin.

    10. Re:2 out of 3 by Sigg3.net · · Score: 1

      So I take "cheap" and "fast and reliable".

      Thanks!

    11. Re:2 out of 3 by Anonymous Coward · · Score: 0

      new manufacturing technique can accomplish all three.

      Yeah, but that's offset by new management/sales techniques that accomplish only one

  3. really? by Anonymous Coward · · Score: 1

    I haven't seen a drive in at least a couple of years that didn't support NCQ. Is this really an issue? It sounds blown out of proportion.

    1. Re:really? by AK+Marc · · Score: 1

      You haven't seen a drive that didn't claim NCQ support. But did you test all of them you've seen?

  4. Performance degradation by Jerry+Smith · · Score: 1

    One can't have one's cake and eat it. Speed or reliability, there should be more differentiation and more clarity in the specs. I want my backup-disk to be very reliable, I want my boot-disk to be fast. Best performance for both, but different circumstances.

    --
    All those moments will be lost in time, like tears in rain. Time to die.
    1. Re:Performance degradation by h4rr4r · · Score: 1

      In that case boot should just be on an SSD, where these issues pretty much disappear anyway.

    2. Re:Performance degradation by profplump · · Score: 1

      That's not true at all. The X25 drives from Intel were terrible in terms of safe writes. The newer Intel drives are better, but only because they added a capacitor to allow in-process writes to complete -- simply being solid-state does not resolve these issues, and in some cases can make them much worse.

  5. Sorry, what? by Compaqt · · Score: 3, Insightful

    We're talking about ATA drives?

    As in non-SATA drives?

    Who has those anymore?

    While the article is good for publication in an academic journal like ACM, it's useless for the real world.

    For that, the author should tell us whether most drives on the market have NCQ already or not. Popular drives like WD Green and Seagate's various lines.

    Otherwise, saying "$A is useless without $Y" is pointless.

    --
    I'm not a lawyer, but I play one on the Internet. Blog
    1. Re:Sorry, what? by spongman · · Score: 1

      i'm guessing that since he's talking about 4K sectors, he means SATA since none of the PATA drives were large enough to warrant the switch from 512.

    2. Re:Sorry, what? by MikeBabcock · · Score: 2

      An SATA drive is a subset of ATA drives. You're thinking of PATA or IDE drives.

      http://en.wikipedia.org/wiki/Serial_ATA

      In other words, when someone says "ATA drives" they aren't exclusively talking about non-SATA drives.

      --
      - Michael T. Babcock (Yes, I blog)
    3. Re:Sorry, what? by Lunix+Nutcase · · Score: 1

      Wrong. ATA is the original name of what was renamed to PATA once SATA was introduced. So if he is saying what you claim he is using the term incorrectly.

    4. Re:Sorry, what? by abirdman · · Score: 1

      ATA came before SATA. One use I've found for ATA is to increase the number of drives supported on a motherboard. I use one as a boot disk for a FreeNAS box. The drive is basically read-only, so I don't expect write cache issues. ATA drives are very slow and noisy, which is why that technology is obsolete.

      --
      Everything I've ever learned the hard way was based on a statistically invalid sample.
    5. Re:Sorry, what? by Anonymous Coward · · Score: 0

      It would never pass review.
      All the reviewers would say that ATA is dead.
      Hell, the reviewers even reject things about a current technology because it's on the way out (according to them).

    6. Re:Sorry, what? by TheGratefulNet · · Score: 1

      dude, the ata vs sata is ONLY on the controller card!

      the drive spindle is the same. its funny to hear someone say that older ide drives are 'noisier'.

      you CAN say that older drives are noisier than new ones. and I'd respond with "DUH!"

      but scsi, sata, ide, sas, fc: the drives are still the same. controllers are what varies.

      --
      "It is now safe to switch off your computer."
    7. Re:Sorry, what? by AK+Marc · · Score: 1

      So, if someone is going to New York, and someone corrects them to "New Amsterdam", is the corrector correct in that the area was once called something different, or is "New York" the correct term, as that's the current name and eliminates confusion?

      ATA doesn't exist anymore. It's like saying you are going to New York by saying "I'm going to the USA". It might be technically correct, but entirely useless, especially if one is in California telling all his friends he's going to the USA. It's not only technically correct, but confusing, meaningless, and quite useless, just like saying ATA still means PATA. There are two ATAs now, SATA and PATA, and ATA means ATA, which SATA and PATA are both subsets of.

    8. Re:Sorry, what? by Compaqt · · Score: 1

      Regardless, the author's choice of terms plus lack of additional clarification totally muddled what he might have been trying to say.

      Also, there's no context for what he's saying ("SATA without NCQ is bad"). It's like saying MySQL without foreign keys is bad, without mentioning the context that MySQL does have foreign keys these days.

      --
      I'm not a lawyer, but I play one on the Internet. Blog
    9. Re:Sorry, what? by Anonymous Coward · · Score: 0

      SATA drives are still ATA drives, and IDE (PATA) drives are equally ATA drives as well.

      I think the author means to say that manufacturers LIE (which is in the summary). They claim NCQ but don't actually support it.

    10. Re:Sorry, what? by abirdman · · Score: 1

      Dude, thanks for the information. I did not know they were all the same. My noisy drive is a Seagate Bigfoot 20 gig drive that's around 15 years old, 5 1/4 format half-height that weighs five pounds. I can't believe I blamed the noise on the interface.

      --
      Everything I've ever learned the hard way was based on a statistically invalid sample.
    11. Re:Sorry, what? by AK+Marc · · Score: 1

      Perhaps more applicable if all builds of MySQL claimed to have foreign keys, but only some actually had them.

    12. Re:Sorry, what? by geekoid · · Score: 1

      No, YOU are wrong.
      AT refers to ATA and ATAPI command-set; which SATA uses; with improvements and new features.

      It was changed to PATA to more accurately describe how it's moving data through its channels, i.e. in parallel.

      --
      The Kruger Dunning explains most post on /. http://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect
    13. Re:Sorry, what? by unitron · · Score: 1

      Are you sure it's new enough to be ATA?

      It's probably MFM or RLL instead of IDE.

      --

      I see even classic Slashdot is now pretty much unusable on dial up anymore.

    14. Re:Sorry, what? by MikeBabcock · · Score: 1

      While true in that "after the introduction of Serial ATA in 2003, the original ATA was renamed Parallel ATA, PATA for short" this leaves the term ATA ambiguous and non-specific.

      Of course, surrounded by geeks as I am, many will claim that this means the term doesn't exist or can't mean anything or must have the meaning it lost to PATA.

      I posit that it was renamed because ATA would become a confusing term without the 'parallel' specification added, and that its usage alone can obviously refer to either connection type. After all, SATA and PATA are linguistically both derived from ATA even if the standards have very little in common at a wire level.

      --
      - Michael T. Babcock (Yes, I blog)
  6. ATA drives...? WTF by poet · · Score: 3

    We shouldn't even be writing for ATA drives anymore. And any name brand manufacturer that you would trust (on a mediocre level) WD, Seagate etc... all support NCQ.

    --
    Get your PostgreSQL here: http://www.commandprompt.com/
    1. Re:ATA drives...? WTF by FranTaylor · · Score: 2

      Are you saying we should cast the ATA driver out of the kernel and dispose of all our ATA hardware?

      Even though it's not in new hardware any more, we still need to support it in existing hardware. The driver still needs work when the kernel APIs change.

    2. Re:ATA drives...? WTF by Impy+the+Impiuos+Imp · · Score: 1

      Servers would presumably be using the superior drives.

      What shitty home applications are relying on other than very occasional requirements to sync? Operating systems? Shut down, sure, a few things, but even that should be written to handle a loss and recover.

      --
      (-1: Post disagrees with my already-settled worldview) is not a valid mod option.
    3. Re:ATA drives...? WTF by poet · · Score: 1

      Good point. That said, ATA hardware is really quite old. I don't know that it would hurt to say, you know what, if you want to run 3.6 of Linux you aren't going to have an ATA drive. If they want to run ancient hardware, let them run older hardware (note I didn't say ancient hardware).

      --
      Get your PostgreSQL here: http://www.commandprompt.com/
    4. Re:ATA drives...? WTF by Wolfrider · · Score: 1

      --Sorry, but that attitude is really rather stupid. I have an old(er - ~2005, 2GHz) laptop that has a 40GB IDE drive in it, running Linux kernel 3.x. I also have an ancient late-90's--early-2000's laptop @ 750MHz running Linux that has an IDE drive. Most P4-era hardware has IDE, some don't have SATA support on the motherboard AT ALL.

      --Bottom line: We can't drop support for IDE for at least the next 10-15 years. The drives are still being made*, and some of them last FOREVER.

      http://www.tigerdirect.com/applications/Category/guidedSearch.asp?CatId=8&sel=Detail%3B17_158_32310_32310

      --BTW, the Linux kernel still supports RLL and MFM, as well:

      http://cateee.net/lkddb/web-lkddb/IDE.html

      --
      .
      == WolfriderV6 == I'm willing to admit that *I just might* be wrong... Are you??
  7. I use zfs by the_humeister · · Score: 1

    I put my important files (pr0n, etc.) on my zfs mirror file server and scrub each week. The really important stuff (tax returns, etc.) I put in a safe deposit box at the bank.

    1. Re:I use zfs by Anonymous Coward · · Score: 0

      zfs triple mirror random read speeds are great for spinning disks too, and the writes are about equal to a single disk.
      Nothing like getting 500mb/s sequential read off consumer rust with 200% redundancy.

    2. Re:I use zfs by Pieroxy · · Score: 1

      I have a software RAID-5 array that gives me 500mbps sequential read off of WDC Green 2TB drives. No need for zfs here.

  8. I work in the storage industry. by Anonymous Coward · · Score: 3, Informative

    Don't assume that "enterprise" disks do this correctly either.

    Many have options to make them behave properly but out of the box have write back caches and ignore FUA or similar, leading to the same problems.

    1. Re:I work in the storage industry. by hardwarefreak · · Score: 1

      Don't assume that "enterprise" disks do this correctly either.

      Those educated in enterprise storage assume it doesn't matter. This is a non issue with "enterprise" drives. Those willing to pay for them are attaching them to "enterprise" RAID controllers with [F/B]BWC. These controllers, whether PCIe or in a SAN head, disable the drives' NCQ/TCQ and onboard caches. The BBWC does write ordering negating NCQ/TCQ, and ensures resiliency which onboard drive caches cannot.

  9. Duh by rickb928 · · Score: 2

    I never recommended ATA drives for servers. Really old stuff that used MFM and RLL drives was back in the era where there just wasn't anything else. I used ATA drives for my home stuff and lab where it wasn't expected to be very reliable, and SCSI was all I used for a very long time. Even today I recommend against SATA though it seems tolerable, but SCSI drives are still my standard.

    Mostly I thought SCSI drives were also made better, but Seagate and WD convinced me otherwise.

    And yes, MFM drives in a Novell DCB setup were among my first servers. Making NW 2.15c mount a 4 GB volume just so you can say you did it would not be fun today, but back then it was work, and clients paid for it. I'm glad it wasn't a VINES server.

    --
    deleting the extra space after periods so i can stay relevant, yeah.
    1. Re:Duh by Anonymous Coward · · Score: 0

      Wow you really are a computer super hero.

    2. Re:Duh by FranTaylor · · Score: 1

      Funny, google has tens of thousands of servers and they put cheapo SATA drives in them.

    3. Re:Duh by rickb928 · · Score: 1

      Naw, just old.

      --
      deleting the extra space after periods so i can stay relevant, yeah.
    4. Re:Duh by rickb928 · · Score: 1

      And Google relies on multiply redundant servers and data, both for performance and reliability. Not many small businesses are gonna want to put in 5-way clustering.

      --
      deleting the extra space after periods so i can stay relevant, yeah.
    5. Re:Duh by Score+Whore · · Score: 1

      Not to mention Google doesn't have to provide a "right" answer. They can provide any answer that seems approximately correct. However for their accounting, payroll and tax systems I'd bet $20 that they use name brand servers running name brand OSes and name brand software.

    6. Re:Duh by petermgreen · · Score: 2

      And google is not your average company.

      Google has a LOT of servers running much the same workloads. As such it makes sense for them to put in the software engineering effort to achieve higher level redundancy. They engineer things so they don't have to care if a server dies.

      Most companies have a relatively small number of servers each with a particular task. If one of those servers fails it's a much bigger deal that can mean significant downtime and/or data loss. IIRC restoring a big database from backup and then replaying logs onto it is not a fast process.

      --
      note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
    7. Re:Duh by rickb928 · · Score: 1

      If you take a few moments and look into it, google buys custom servers that are more like commodity boxes than premium servers.

      --
      deleting the extra space after periods so i can stay relevant, yeah.
    8. Re:Duh by Anonymous Coward · · Score: 0

      Yep the server is called pen and paper. Works quite well. The OS that handles Pen and Paper is called an Accounting Clerk for the home version while the Pro is Called a CPU who uses Accounting Clerks to perform his basic function.

    9. Re:Duh by Anonymous Coward · · Score: 0

      > the Pro is Called a CPU

      Fail.

    10. Re:Duh by Anonymous Coward · · Score: 0

      That's actually pretty hard core. Props, (or whatever you kids say now days)

    11. Re:Duh by Score+Whore · · Score: 1

      Yes, for their search farm they use custom, legacy free, bare bones systems. Because it doesn't matter if half a rack explodes at once and they lose 1% of their index. As long as they can deliver a result to search queries, they're golden. There's no right answer to a search, good enough is good enough.

      It's tough to find cites now, but back in the beginning Schmidt made them buy off-the-shelf software for accounting and the like. Because it does matter if you get the right answer.

    12. Re:Duh by Anonymous Coward · · Score: 0

      Hah! Now I know you're a poseur. Real sysops used ESDI drives. And they loved the bindery! And NetBEUI for WfW interop! Uphill, in the snow! Both ways!

    13. Re:Duh by rickb928 · · Score: 1

      Why ESDI when you could find VLB adapters? Or just use LANtastic and live it large? Or really push the paradigm and put up a Token-Ring to Arcnet bridge to reach the workstation way back in the sawmill?

      I for one didn't miss the bindery though. And knowing Token-Ring also made me some good money.

      --
      deleting the extra space after periods so i can stay relevant, yeah.
  10. O_Direct Works Quite Nicely by Anonymous Coward · · Score: 0

    Opening a file with the O_DIRECT flag seems to work quite nicely for bypassing the page cache and getting the data to the disk in a timely fashion.

    1. Re:O_Direct Works Quite Nicely by greg1104 · · Score: 1

      Except on the many Linux versions where O_DIRECT doesn't work properly. I have kernels where it works as expected; ones where it quietly fails to sync to disk; and ones where using it causes a PANIC. It's never been a priority for that API to function correctly given that Linus thinks direct IO is totally braindamaged.

    2. Re:O_Direct Works Quite Nicely by Anonymous Coward · · Score: 0

      Except on the many Linux versions where O_DIRECT doesn't work properly. I have kernels where it works as expected; ones where it quietly fails to sync to disk; and ones where using it causes a PANIC. It's never been a priority for that API to function correctly given that Linus thinks direct IO is totally braindamaged.

      OMFG I missed that.

      Another reason to move Linux away from being a real enterprise OS.

      Why?

      Because Linus is completely WRONG about there never being a reason for O_DIRECT.

      If you ARE doing synchronous writes, why bother moving the data from userland memory, to kernel memory, and then to disk? The app's going to block anyway, so why do the extra copy? How about when you're reading or writing a few hundred gigabytes or more just once? Why cache that? It uses more memory just to slow things down. Yay, wonderful.

      The only reason a page cache is useful is for coalescing small writes, or caching data that will be read again.

      If you're doing large writes, or know you're not going to read that data again, the cache is wasted time and memory.
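
      For the write-once case, a minimal sketch of what that looks like from userland. It assumes Linux-style `O_DIRECT` semantics and a 4096-byte alignment (the `BLOCK` constant and the `write_once` name are made up for illustration), and it falls back to an ordinary buffered write when the filesystem rejects the flag, which some do. Note that `O_DIRECT` only skips the page cache. The `fsync` is still there because nothing about direct I/O by itself asks the drive to flush its own write cache.

```python
import errno
import mmap
import os

BLOCK = 4096  # assumed alignment; a real device may require a different value


def _write_and_sync(path, flags, buf):
    """Open, write the whole buffer, fsync, close."""
    fd = os.open(path, flags, 0o644)
    try:
        written = os.write(fd, buf)
        if written != len(buf):
            raise OSError(errno.EIO, "short write")
        os.fsync(fd)
    finally:
        os.close(fd)


def write_once(path, payload):
    """Write payload padded to a BLOCK multiple; return True if O_DIRECT was used."""
    size = max(BLOCK, -(-len(payload) // BLOCK) * BLOCK)
    buf = mmap.mmap(-1, size)  # anonymous mapping: page-aligned and zero-filled
    try:
        buf[:len(payload)] = payload
        base = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
        direct = getattr(os, "O_DIRECT", 0)  # not defined on every platform
        if direct:
            try:
                _write_and_sync(path, base | direct, buf)
                return True
            except OSError as exc:
                if exc.errno != errno.EINVAL:
                    raise  # EINVAL here usually means "no direct I/O on this fs"
        _write_and_sync(path, base, buf)
        return False
    finally:
        buf.close()
```

      The padding means the file ends up a multiple of `BLOCK` long, so a real program would record the true payload length somewhere or truncate afterwards.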

  11. ATA by Anonymous Coward · · Score: 1

    Systems for which both speed and reliability are important should not use ATA disks.

    Ok, I'll keep that in mind next time I buy ATA disks.

  12. Acronyms by puddingebola · · Score: 1

    Implementing the NCQ specification in nonhierarchical file system can easily be accomplished by passing an FMGH array through an EMH converter, while maintaining the NCQ specification via a THGN override. All NCQ specification still conform to the YTUR standard established in 1987 at the CMSD conference in Barcelona. If that helps at all.

    1. Re:Acronyms by Anonymous Coward · · Score: 0

      You forgot to reroute warp power to the deflector dish!

    2. Re:Acronyms by RoverDaddy · · Score: 1

      This always sounds to me like it would be equivalent to saying in this world: 'Reroute 120V from the mains to your TV antenna', and the results would be about as useful as one would expect. Seriously, was the deflector dish -designed- to accept warp power before the insane crew of the Enterprise came up with the idea?

      --
      RETURN without GOSUB in line 1050
    3. Re:Acronyms by RobertLTux · · Score: 1

      well given that the deflector dish willbe/was a huge most likely METAL dish then while it may not have been designed to take power from the warp drive it most likely could (just like you could wire your home power into your antenna).

      --
      Any person using FTFY or editing my postings agrees to a US$50.00 charge
    4. Re:Acronyms by fast+turtle · · Score: 1

      actually, the deflector dish is designed to take power from the warp drive. Otherwise you can't extend shields to protect that other ship or station when you need to. Thus it wasn't a crazy idea on the part of the engineers to divert power to the deflector dish though it should have been divert power to the navigational deflector as that's what the dish is.

      --
      Mod me up/Mod me down: I wont frown as I've no crown
  13. Not about ATA, about enterprise data storage by MSTCrow5429 · · Score: 4, Informative
    1) This article isn't about ATA, ignore it.

    2) The article's point on NCQ is that many consumer drives do not implement it, so the system must either disable the write cache on the disk or issue cache-flush requests. Both cost performance, so they are often turned off, leading to possible file-system failures if there is a power outage.

    I think this article is saying that for the enterprise, buy enterprise drives, not consumer drives. Most consumers use laptops now, so power failure doesn't fit in, and consumers prefer speed over reliability, which is why I've always been stuck using laptops lacking ECC RAM.

    --
    Slashdot: Playing Favorites Since 1997
    1. Re:Not about ATA, about enterprise data storage by danomac · · Score: 1

      When the power goes out, all cards are in the air anyway. We had a UPS boo-boo and our enterprise drives (both SCSI & SAS) managed to corrupt data, even with a battery on the controller itself (battery was in good health.)

      Shit happens. It's pretty damn difficult to account for power failures... even with battery backups on the local controllers you can only do so much.

    2. Re:Not about ATA, about enterprise data storage by MSTCrow5429 · · Score: 2

      In Windows 7's Device Manager, there is a Policies tab, allowing you to "Enable write caching on the device" and additionally to "Turn off Windows write-cache buffer flushing on the device." The former warns "a power outage or equipment failure might result in data loss or corruption." The latter states "do not select this check box unless the device has a separate power supply that allows the device to flush its buffer in case of power failure." In Windows 7, by default, write-caching is on, and write-cache buffer flush is off. It does note that not all drives allow you to change these settings, possibly indicating that the article's author recommends any modern drive that allows one to manually choose reliability over performance. The major issue with both is that data may reside in primary memory and has not been written to the drive, there's a power failure, and your data disappears.

      --
      Slashdot: Playing Favorites Since 1997
    3. Re:Not about ATA, about enterprise data storage by FranTaylor · · Score: 1

      Or you can buy a real RAID controller with battery backup for the cache, in which case you are just fine with the cheap SATA drives.

    4. Re:Not about ATA, about enterprise data storage by Anonymous Coward · · Score: 0

      But if your filesystem is sensible, then so long as it's not being lied to about what's actually made it to disk, your data should not be inconsistent.

    5. Re:Not about ATA, about enterprise data storage by ChumpusRex2003 · · Score: 5, Informative

      The "Turn off Windows write-cache buffer flushing on the device" option activates an ancient windows bug, and should never be used.

      When Windows 3.11 was released, MS accidentally introduced a bug, whereby a call to "sync" (or whatever the windows equivalent was called) would usually be silently dropped. At the time, a few programmers noticed that their file I/O appeared to have improved, and attributed this to MS's much marketed new 32-bit I/O layer. What a lot of naive developers didn't notice was that the reason their I/O appeared to be faster was that the OS was handling file streams in an aggressive write-back mode, and then calls to "sync" were being ignored by the OS.

      Because of this, there was a profusion of office software, in particular, accounting software, which would "sync" frequently - some packages would call "sync" on every keypress, or every time enter was pressed, or the cursor moved to the next data entry field. As on 3.11, this call was effectively a NOP, a lot of packages made it onto client machines, and because it was fast, no one noticed.

      With Win95, MS fixed the bug. Suddenly, corporate offices around the world had their accounting software reduced to glacial speed, and tech support departments at software vendors rapidly went into panic mode. Customers were blaming MS, Win95 was getting slated, lawyers were starting to drool, etc. Developers were calling senators and planning anti-trust actions. The whole thing was getting totally out of hand.

      In the end, MS decided the only way to deal with this bad PR, was to put an option into windows, where the bug could be reproduced for software which depended upon it. The option to activate the bug was hidden away reasonably well, in order to stop most people from turning it on, and running their file-system in a grossly unstable mode. However, in Win95 - Vista, it had a rather cryptic name "Advanced performance", which meant that a lot of hardware enthusiasts would switch it on, in order to improve performance, without any clear idea of what it did. At least in Win7 it now has a clear name, even though it still doesn't make clear that it should only be used for when using defective software.

    6. Re:Not about ATA, about enterprise data storage by Anonymous Coward · · Score: 0

      wtf, if it's only necessary for broken software, why is it not an attribute of the executable file that can be set rather than a global setting?

  14. Is this a real problem? by Anonymous Coward · · Score: 0

    The thrust of the article seems to be that desktop-market SATA drives don't support native command queuing and that means filesystems can't guarantee integrity right before a power failure. That sounds a little out-of-date to me, I thought most SATA drives supported NCQ these days. A quick unscientific skim through the top three desktop drive manufacturers suggests this is true:

    Seagate website:
    "Since late 2004, most new SATA drive families have supported NCQ"

    Western Digital Website does not make a similar statement but it appears that at least the "green" and "black" lines of desktop drive support NCQ meaning most if not all of their popular drives

    Hitachi does not make statements on their website but searching product descriptions shows that at least their most popular "deskstar" line supports NCQ

    Which would suggest that only a very small population of old or ultracheap hard drives are affected.

    1. Re:Is this a real problem? by FranTaylor · · Score: 1

      They might SAY they support it, but HOW CAN YOU REALLY TELL?

      We all know that hardware LIES all the time about its ACTUAL capabilities, just READ the article!

    2. Re:Is this a real problem? by Lunix+Nutcase · · Score: 1

      They do despite the people parroting his words without being able to back up the statements beyond a fallacious appeal to authority.

  15. NCQ - Native Command Queueing by wonkey_monkey · · Score: 4, Informative

    Native Command Queueing

    Because not everybody knows everything™

    --
    systemd is Roko's Basilisk.
    1. Re:NCQ - Native Command Queueing by Anonymous Coward · · Score: 0

      Really? a score of "5" for a link to wikipedia?
      Slashdot really has lowered its standards...

      no offense to the poster.

    2. Re:NCQ - Native Command Queueing by unitron · · Score: 1

      It's not about linking to Wikipedia, or anywhere else.

      It's about knowing to which thing to link, and when the situation calls for it.

      --

      I see even classic Slashdot is now pretty much unusable on dial up anymore.

  16. Get Hardware RAID by FranTaylor · · Score: 4, Insightful

    The people who make hardware RAID know all about the lying drives, they get good information from the manufacturer on how to make the drives play nice with the RAID controller.

    Just read the compatibility charts for your RAID controller, many drives have footnotes with minimum drive firmware requirements and other odd behavior.

    1. Re:Get Hardware RAID by Anonymous Coward · · Score: 1

      Then test your RAID controller since many of them don't actually check parity on read. They only use the parity bits to reconstruct the data if one of the devices fail. That would be fine if consumer drives always failed catastrophically. Alas, SATA drives can fail in ways that cause corrupted reads.

      Test is simple enough: write known data to the RAID, shutdown, remove a disk and use dd to corrupt the data, re-install, power up, read data and check. In many cases the raid will happily return the bogus data.
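
The difference between the two read behaviours can be shown without touching real hardware. This is only an in-memory sketch, assuming a single XOR parity block per stripe (RAID-5 style); the function names are made up for illustration and do not correspond to any controller's firmware.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def make_stripe(data_blocks):
    """Return the data blocks plus one XOR parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def read_without_check(stripe):
    """What a controller that trusts its drives does: return the data as-is."""
    return b"".join(stripe[:-1])


def read_with_check(stripe):
    """Verify parity on read and refuse to return data that does not add up."""
    if xor_blocks(stripe[:-1]) != stripe[-1]:
        raise IOError("parity mismatch: some block in this stripe is corrupt")
    return b"".join(stripe[:-1])
```

Note that single parity can only say that something in the stripe is wrong, not which block. Locating the bad block needs a second parity or per-block checksums.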

    2. Re:Get Hardware RAID by randallman · · Score: 3, Interesting

      The only real advantage to "Hardware RAID" is the battery backed cache. Hardware RAID comes with the disadvantage of a whole other operating system "firmware" with its own bugs and often proprietary disk layout. Parity calculations are nothing for current CPUs, so the onboard processor is not so useful. Advanced filesystems such as ZFS or BTRFS need direct access to the disks. I'd like to see drives and/or controllers with battery backed cache. Until then, I rely on my UPS.

    3. Re:Get Hardware RAID by hardwarefreak · · Score: 1

      The only real advantage to "Hardware RAID" is the battery backed cache.

      Hardware RAID has many advantages. Persistent cache, while important to performance, is but one. Far better management infrastructure is another. Many RAID vendors offer a single web management console which can control all RAID devices across a network from a single console. Try that with mdadm. Then you have superior alerting and monitoring, etc. Most RAID vendors have had excellent easy to setup/use snmp capability for over a decade. mdadm is still lacking here as is the inbuilt Microsoft RAID (does anyone actually use it?).

      Hardware RAID comes with the disadvantage of a whole other operating system "firmware" with its own bugs and often proprietary disk layout.

      All hardware comes with firmware, even the SATA controller and NIC on your consumer mobo, and everyone has bugs to fix on occasion, including software RAID. This is why a good administrator reads release notes. Also note that most hardware RAID controller (PCIe card) vendors have been moving to the SNIA on disk layout metadata standards. That said, you won't find me swapping out an LSI RAID with an Adaptec, or with software RAID any time soon, simply because they all use the same metadata format and thus it should "just work". That's just not smart due to all other kinds of issues.

      Parity calculations are nothing for current CPUs, so the onboard processor is not so useful.

      Spoken as I'd expect from an individual with no real hardware RAID experience/knowledge. Parity work is a tiny fraction of the operations performed by a RAID ASIC. And in fact most enterprises don't even use parity RAID due to the huge performance penalty of RMW and the unacceptable rebuild times of parity arrays. The bulk of the work done by a RAID ASIC today is IO request processing and cache management. So no, it doesn't matter on what chip XOR calculations are performed, because those with real workloads aren't using parity RAID. If you're using Linux mdraid your parity calculations are limited to a single core per array, so if one must use parity RAID they're likely better off with a good dual core RAID card.

      Advanced filesystems such as ZFS or BTRFS need direct access to the disks.

      You really need to educate yourself. Oracle sells hardware SAN RAID arrays. ZFS doesn't have direct disk access with these.
      http://www.oracle.com/us/products/servers-storage/storage/san/pillar/pillar-axiom-600/features/index.html

      I'd like to see drives and/or controllers with battery backed cache. Until then, I rely on my UPS.

      A UPS is not a substitute for persistent RAID cache. Persistent cache saves you from kernel panics and other crash scenarios that could corrupt your filesystem journal and/or filesystem proper, as well as saves you from power outage. A UPS only saves you from power outage.

      Stop regurgitating the misinformation you read on the Wikipedia RAID page. Expend some effort and do your own research. Just about everything you've stated here is incorrect. In fact, don't do any research. Just simply keep quiet since you obviously don't use RAID and have no experience with it.

    4. Re:Get Hardware RAID by drsmithy · · Score: 1

      Spoken as I'd expect from an individual with no real hardware RAID experience/knowledge. Parity work is a tiny fraction of the operations performed by a RAID ASIC. And in fact most enterprises don't even use parity RAID due to the huge performance penalty of RMW and the unacceptable rebuild times of parity arrays.

      Rubbish. The default and recommended RAID schemes for two of the biggest storage vendors on the planet (EMC and NetApp) are both parity RAID.

      Indeed, with the rise of SSDs (and their relatively small sizes) nearly eliminating the performance penalty of parity RAID schemes, expect to see its usage grow, not shrink.

    5. Re:Get Hardware RAID by hardwarefreak · · Score: 1

      Rubbish. The default and recommended RAID schemes for two of the biggest storage vendors on the planet (EMC and NetApp) are both parity RAID.

      You're failing to recognize a key characteristic of EMC/NetApp arrays: persistent cache. SAN heads that have 8GB to 512GB of persistent cache that acks to fsync can certainly hide much of the RMW latency from transactional applications, and to a degree, the long rebuild times of their 4-8 drive parity array building blocks. EMC and NetApp arrays have massive quantities of such cache. As do the likes of the other SAN heavy hitters, IBM, SGI, HP, Oracle, etc.

      Do note however that many organizations using the enterprise SAN heads with large parity RAID pools for generic bulk storage do often create separate RAID10 arrays within the unit for their high transaction rate applications, i.e. POS/CRM/BI databases, mail spools and mailboxes, etc.

      And when you come down out of the stratosphere to the midrange SAN heads and then HBA RAID controllers, your persistent cache size is typically 4GB for SAN heads and 512MB for HBAs. With these systems a parity rebuild significantly degrades application performance, and during normal operation with a heavy random IOPS transactional workload RMW latency will as well. And with software RAID you don't have any persistent cache, RMW is constant, and rebuilds bog the entire system down. RAID10 or striped/concatenated mirror pairs, depending on the workload, are a much better option for these 3 cases.

      Indeed, with the rise of SSDs (and their relatively small sizes) nearly eliminating the performance penalty of parity RAID schemes, expect to see its usage grow, not shrink.

      It absolutely will grow, but it won't entirely displace rust. And yes, SSD latency/bandwidth do eliminate most of the performance problems with parity RAID on rust. Though the current crop of controller silicon isn't fast enough to fully take advantage of SSDs. Take your big EMC and NetApp for example. If one were to allow the controller to use up to 100% of its resources to rebuild a RAID6 array of 8 SSDs for the fastest possible rebuild time, the operation would eat 100% of the controller's cycles and other IO would suffer. With an 8 disk RAID6 rust array, it would have sufficient excess capacity to service other IO in a timely manner. To fully take advantage of SSDs in RAID, we need much faster silicon. Almost any number of certified SSDs in RAID5/6 will saturate the dual core ASIC in LSI's top RAID HBA as its parity engine can't keep up with the IO rate.

  17. The solution is to use logging and hashes by Anonymous Coward · · Score: 1

    You can avoid the need for NCQ if you use a log-structure and protect references with strong checksums. In that way you will know after a crash if say a child tree node referenced is what the referencing parent thinks it should be, and you can use double-buffering or logging to roll back to a known good state. I believe ZFS does this, as does the experimental Lithium distributed file system developed by VMware. Don't bother with NCQ.

    1. Re:The solution is to use logging and hashes by Score+Whore · · Score: 1

      I don't know what you are smoking but NCQ isn't an acronym for "caching". It's native command queuing and what it does is allow your OS to have multiple commands inflight simultaneously. Which is a big deal and very helpful.

    2. Re:The solution is to use logging and hashes by Anonymous Coward · · Score: 0

      I never said it was. NCQ will let you have many writes in flight, without waiting for dependent writes to complete. If you do that without NCQ (or other smarts) you risk corrupting your data structures. However, if you use strong checksums you can detect corrupted data structures, and if you use some form of CoW (like logging) updates you can roll back to a known good state.
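
To make the checksum-plus-rollback idea concrete, here is a minimal sketch assuming two alternating "root" slots, each carrying a generation number and a SHA-256 over its contents. The slot layout and the names `pack_slot`, `unpack_slot` and `recover` are invented for illustration; this is not how ZFS or Lithium actually lay out their data.

```python
import hashlib
import struct

SLOT_FMT = ">Q32s"  # generation number + SHA-256 digest


def _digest(generation, payload):
    return hashlib.sha256(struct.pack(">Q", generation) + payload).digest()


def pack_slot(generation, payload):
    """Serialize one root slot: header (generation, checksum) then payload."""
    return struct.pack(SLOT_FMT, generation, _digest(generation, payload)) + payload


def unpack_slot(raw):
    """Return (generation, payload), or None if the slot fails its checksum."""
    header = struct.calcsize(SLOT_FMT)
    if len(raw) < header:
        return None
    generation, digest = struct.unpack(SLOT_FMT, raw[:header])
    payload = raw[header:]
    if digest != _digest(generation, payload):
        return None
    return generation, payload


def recover(slot_a, slot_b):
    """Pick the newest slot that verifies; a torn write only loses one update."""
    valid = [s for s in (unpack_slot(slot_a), unpack_slot(slot_b)) if s is not None]
    if not valid:
        raise ValueError("no valid root slot")
    return max(valid, key=lambda s: s[0])
```

After a crash, `recover` returns the older slot whenever the newer one was only partially written, which is the "roll back to a known good state" step.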

  18. how about a utility or SMART patch by Anomalyst · · Score: 1

    That would test and identify a drive for NCQ and cache disable/enable operation correctness that would report the model/serial and result to a central website

    --
    There is no right to feel safe thru security vaudeville at the expense of everyone's freedom, privacy and tax money.
    1. Re:how about a utility or SMART patch by greg1104 · · Score: 1

      Whether this sort of thing works correctly can change based on drive firmware. So even a given model/serial number combination can change which type of results it gives over time. There is no substitute for testing yourself.

  19. Linus's Input on Write Cache by randallman · · Score: 3, Interesting

    I think this is quite interesting.

    http://yarchive.net/comp/linux/drive_caches.html

    While I've often gotten the impression that the write cache opens up a large "write hole", Linus says that data is cached only for milliseconds, not held in the cache for several seconds. Still, I'd like to see battery backed caches in regular drives and/or controllers.

    Would be nice to hear from some drive firmware writers.

    1. Re:Linus's Input on Write Cache by fa2k · · Score: 1

      I can't really see a huge benefit to battery-backed caches. Scattered writes can be aggregated in RAM to make them sequential, and this is easier for filesystems with copy-on-write. As long as the drives implement NCQ correctly, the filesystem can arrange the writes such that it remains in a consistent state. The throughput is limited by the drives anyway, so it shouldn't matter if the writes are scheduled in the drive's controller or in software.

      For synchronous writes you can't buffer in RAM, but software shouldn't be calling fsync() a lot anyway. NFS most certainly does, though. There are filesystems which allow you to have a separate device for synchronous writes, for example journal devices in XFS and log devices in ZFS.
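
As a rough illustration of not calling fsync() a lot: a sketch that appends a batch of records and issues one flush for the whole batch instead of one per record. `append_batch` is a hypothetical helper, and whether the flush really reaches the media still depends on the drive honoring it.

```python
import os


def append_batch(path, records):
    """Append every record, then make the whole batch durable with one fsync."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        for record in records:
            data = record + b"\n"
            while data:  # os.write may write fewer bytes than asked
                data = data[os.write(fd, data):]
        os.fsync(fd)  # one flush request for the batch, not one per record
    finally:
        os.close(fd)
    return len(records)
```

The trade-off is the usual one: a crash before the fsync can lose the whole batch, so the batch size is the amount of work you are willing to redo.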

  20. Does everyone here think that ATA = PATA? by Anonymous Coward · · Score: 0

    Are you even real nerds? What's up with you Slashdot??

  21. lose the kraken! by Anonymous Coward · · Score: 1

    hey GN, don't loose your cool when you see someone play lose with grammer.

    1. Re:lose the kraken! by fustakrakich · · Score: 1

      It's not 'löschen das kraken'?

      --
      “He’s not deformed, he’s just drunk!”
  22. It's cool that a US Marshall is doing this stuff. by Anonymous Coward · · Score: 0

    Way to go, Kirk McKusick!

    And does this mean I have a shot at being a US Marshall?

  23. Read the solution here .. by dgharmon · · Score: 1

    Put some flash ram on the HD with its own on-board battery backup ...

    --
    AccountKiller
  24. ATA? by erc · · Score: 1

    ATA? Does anyone use that anymore? Hasn't the world gone to SATA, FC, or SCSI-? This seems a lot of ado about nothing...

    --
    -- Ed Carp, N7EKG erc@pobox.com PGP KeyID: 0x0BD32C9B What I'm up to: http://intuitives.mine.nu
    1. Re:ATA? by Anonymous Coward · · Score: 0

      FC is not a drive; it is an interface to drive array(s) using SAS (formerly SCSI) or SATA.

  25. You can't just blame the disk vendors by Anonymous Coward · · Score: 0

    Those web sites reviewing disk hardware never include any details about reliability. In a few cases you see they comment about the reliability in the specification, but they never ever actually test it. All they test is performance, some go even further and test both sequential and random access/write. That's the best kind of reviews you get. No one is testing power loss.

    If a disk vendor creates a reliable consumer drive it will not sell because it will get bad reviews.

    I have Intel 330 SSD and had to manually enable NCQ. The worst part is that I've not noticed any performance changes at all.

  26. The real problems are entirely different by amorsen · · Score: 2

    The article is total crap, every disk supports NCQ as half the world's population has pointed out in the comments.

    The problems are elsewhere: When a disk suddenly loses power while it is writing, there is a risk of various interesting errors. The disk may a) write nulls instead of the correct data, b) write garbage instead of the correct data, c) fail in the middle of a Read-Modify-Write operation and therefore destroy data in files which weren't written to at all, d) write good data to the wrong place on the disk, e) write garbage to a random spot on the disk. Sometimes you are lucky and the errors result in bad hardware checksums so you know you have lost data, at other times the wrong data gets the correct checksum.

    In practice, very few desktop/notebook/whatever users will see these problems. No reviews test for these types of errors, so you cannot try to buy drives which fail in less harmful ways. If you care enough, you will use file systems with checksumming designed to catch all the above errors and more (Btrfs and ZFS come to mind). They will at least notify you that it happened, and depending on the redundancy settings they may be able to rescue the destroyed data.
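
A small sketch of why a checksum that also covers the block's intended address catches both garbage and misdirected writes. It uses CRC32 and stores the checksum inline purely for brevity; both choices are assumptions for illustration and are not a description of what Btrfs or ZFS do on disk. `seal` and `verify` are made-up names.

```python
import zlib


def seal(block_no, data):
    """Prefix data with a CRC32 that also covers the block's intended address."""
    crc = zlib.crc32(block_no.to_bytes(8, "big") + data)
    return crc.to_bytes(4, "big") + data


def verify(block_no, stored):
    """Return the data if it is intact and belongs at block_no, else None."""
    if len(stored) < 4:
        return None
    crc, data = int.from_bytes(stored[:4], "big"), stored[4:]
    if crc != zlib.crc32(block_no.to_bytes(8, "big") + data):
        return None
    return data
```

Good data that landed at the wrong address fails `verify` because the address is part of what was checksummed, which a plain per-sector hardware checksum would not notice.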

    --
    Finally! A year of moderation! Ready for 2019?
  27. Can he turn water to wine too? by arth1 · · Score: 2

    Given that we are talking about Kirk McKusick an appeal to authority is entirely fair. Just because he didn't have a bunch of citations or references listed at the bottom of the article does not mean they do not exist somewhere. For you to say it is a "fallacious" appeal to authority is unfair - it has not been proven as fallacious

    It's usually up to the one who makes a claim to back it up with evidence, not for others to disprove it - and they can't either, because there's no falsifiability here. If I show that my drive has NCQ that works, that still doesn't falsify his claim. I can't bloody well test every drive on the planet, so there's no way to disprove him.
    So yes, this is appeal to authority and what you do is putting the onus on those who disagree to prove a negative.

    He may be right, and he's certainly renowned, but to jump from there to "therefore he is right" is bunk. Even Einstein and Feynman made wrong claims. No one is immune. So some evidence would be welcome.

  28. Or you could just use a decent controller by dbIII · · Score: 1

    Probably every half decent controller card on the market for the last decade gets around this problem with a bit of memory and a battery to keep it alive. If you have a lot of disks on one system you'd probably have a controller like that anyway just to get enough SATA/SAS connections.
    I can see how it's a big deal with workstations/desktops/laptops but that's really only a small chunk of storage in general.

  29. Get Hardware RAID outside of edge cases like above by dbIII · · Score: 1

    Only? What about the advantage of a lot more SATA/SAS connections than you get on your motherboard? Also ZFS is limited in the number of platforms it is available for and BTRFS is not ready so it's a bit of a red herring throwing those in and saying that hardware RAID is not required because those exist.

  30. Some Quick Facts by evilviper · · Score: 1

    This is a lot of noise for nothing. For kids and amateurs, here's a quick summary...

    fsync used to be the go-to, but that was decades ago, when IDE was in full swing. Back then, there was a big hubbub about drives lying. Since then, it's been common knowledge and status quo that fsync is not trustworthy, end of story.

    Today, we have WRITE BARRIERS, and they work great. Ever since, say, the advent of 60GB IDE drives, I've never found a drive that doesn't support write barriers, and in my conversations with Theodore Tso (maintainer of EXT3/EXT4), he said as much as well. I was surprised when I started up DRBD on a test system and found the system complaining the old 40GB drive I was using for testing didn't support write barriers, so that's how long ago we're talking about this having LAST been an issue.

    There's still some issue with non-journaled file systems. If you're a BSDer, you really need to disable disk cache to prevent risks of corruption with soft updates. The XFS guys recommend disabling disk cache as well, but I suspect that's just because larger RAID arrays may have entire large files cached, resulting in some individual file loss after power-outages.

    Any RAID controllers will have such an option... Write-through and write-back... with advice to be sure your RAID cache's backup battery is working fine before enabling write caching.

    --
    Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
    1. Re:Some Quick Facts by systemeng · · Score: 1

      The XFS guys are right. I've lost data multiple times on XFS due to their disk caches being enabled in Suse 11.4. My disks are ST9750420AS using bios raid. I finally had to disable disk write caches on bootup. These losses were not even due to power failure: these losses were incurred during graceful system shutdown.

    2. Re:Some Quick Facts by fnj · · Score: 1

      XFS is nothing but a complicated way to corrupt your data. You can lose a whole RAID5 in a heartbeat, and I HAVE. Nobody should use XFS for anything. There is no excuse for it to even exist.

  31. Write Cache bad, NCQ good. SATA poor quality by Anonymous Coward · · Score: 0

    I don't know how this is news. I've been developing RAID systems for 15 years (started with parallel SCSI-10 & 20), and none of this is new. The problem with disk write cache is you will lose your data if there's a power loss or the drive bugchecks and resets itself. This could also happen if you pull a hot-swap drive. With write cache enabled, the host has already been told that the data is saved.

    With cache disabled and tagging or NCQ, you still use the cache and get the advantages of optimized ordering and consolidated writes, but you don't get notification of write completion until the data actually hits the media. The latency is lower for write caching, but the maximum steady-state throughput of uncached tagged writes is about the same since the drive cache is always full and needs to be dumped to media. Some SATA drives literally used the same code for NCQ and caching with the notification time being the only difference.

    Full-feature RAID systems can provide safe write caching that is protected against power loss by batteries, supercaps, and flash (or some combination of these). Enterprise class systems also have multiple controllers and mirror the cache between them in case one controller or memory system dies.

    A drive could provide the same functionality by including enough flash to copy the cache into and a capacitor (or a system to convert latent rotational momentum into electricity) that will keep the electronics alive long enough to do the copying. However, this will probably never happen since SATA drives are designed and marketed for mass markets, and adding even a few cents of per unit cost is hard to justify. For the same reasons, the mechanical parts of the SATA drives are also cheap. If you want reliability, you need to spend the extra money to get either enterprise-class SATA, SAS, SCSI, or Fibre Channel.

    SATA drives also suffer from poor error detection/correction algorithms because they minimize the amount of redundant metadata to increase user data space (typically 8 bytes of CRC vs. 40 bytes of ECC). The rate of undetected errors is about the same as that of falsely detected errors (on the order of 1 per petabyte). If you're using SATA for nearline storage, there's a good chance of inducing errors when you save and restore the data if your sets are big enough.
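
On Linux you can at least ask the kernel which cache mode it believes a disk is in. A minimal sketch, assuming the `queue/write_cache` sysfs attribute is present (it reads "write back" or "write through" on kernels that expose it); `write_cache_mode` and its `sysfs_root` parameter are made up here so the function can be pointed at a test directory. This only reports what the drive told the kernel, so it cannot catch a drive that lies.

```python
import os


def write_cache_mode(device, sysfs_root="/sys/block"):
    """Return the kernel's view of a disk's cache mode, or None if unknown."""
    path = os.path.join(sysfs_root, device, "queue", "write_cache")
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        # Attribute missing (older kernel, other OS) or device not present.
        return None
```

For example, `write_cache_mode("sda")` would be the call for the first SCSI/SATA disk, with `None` meaning the information is simply not available.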

  32. news flash by smash · · Score: 1

    Shitty consumer oriented hardware not suitable for enterprise class data integrity and retention.

    If you need data integrity and cache, you need a battery backed up IO controller and UPS for a start. If you're relying on the fact that turning cache off on the drive is going to ensure that your writes complete before the power goes out to the drive, you've already set sail for fail.

    --
    I run: Windows, OS X, Linux, FreeBSD. Just because you have a hammer, doesn't mean everything is a nail.
  33. Re:It's cool that a US Marshall is doing this stuf by unitron · · Score: 1

    Way to go, Kirk McKusick!

    And does this mean I have a shot at being a US Marshall?

    Almost as big a one as being a U.S. Martial.

    --

    I see even classic Slashdot is now pretty much unusable on dial up anymore.

  34. Say it ain't so! by OrangeTide · · Score: 1

    That my 1TB / 32MB cache drive for under $100 is not up to the task of being both reliable and fast?

    --
    “Common sense is not so common.” — Voltaire
  35. Re:Get Hardware RAID outside of edge cases like ab by KiloByte · · Score: 1

    BtrFS is ready for serious use, there are just additional goodies planned for it.

    ZFS exists in some form for all relevant server platforms -- on Linux, the kernel module is indistributable except as source[1], but installing dkms doesn't even require knowing what a compiler is. Unlike BtrFS, I wouldn't use it for production use yet (on non-Solaris non-BSD) because the kernel module is quite new, but it's there.

    Both of them can do RAID better than the traditional models as they know the filesystem's layout. Also, they can store some files as JBOD and some as RAID on the same filesystem.

    [1]. Its license was designed to be incompatible.

    --
    The creatures outside looked from Alt-Right to Antifa; but already it was impossible to say which was which.
  36. Re:It's cool that a US Marshall is doing this stuf by Anonymous Coward · · Score: 0

    The fuck is a U.S. "Martial"? That's not even a viable "Marital" aid joke.

  37. Re:Get Hardware RAID outside of edge cases like ab by dbIII · · Score: 1

    It's not the licence that's the entire problem on linux, the great big missing chunks of functionality are the problem which makes ZFS a long way from general purpose use on that platform, which is a pity because I'm very impressed with ZFS and considering setting up my next file server as BSD or some type of solaris to use it (got to see what can handle the LSI stuff - probably all). BtrFS is no more ready for serious use than reiserfs ever was - the "additional goodies" are needed to cope with hiccups from hardware even if the file system is supposed to be perfect.

  38. Re:It's cool that a US Marshall is doing this stuf by unitron · · Score: 1

    The OP AC confused the name "Marshall" with the law enforcement officer "Marshal".

    There was no viable joke in their post.

    --

    I see even classic Slashdot is now pretty much unusable on dial up anymore.

  39. NCQ isn't enough by m.dillon · · Score: 1

    That is, NCQ (for SATA) does not have enough command slots available. Only ~32 or so per port. The SAS stuff works a bit better but I think still limits you to ~32 slots per target device (instead of ~32 per port on the driver side).

    So what happens if you try to use the so-called media completion feature is that write operations eat up all your tags and crowd out the far more important read operations. This makes the feature almost worthless in my view. Not to mention that there is no way to determine, with the SATA spec, that the drive actually honors the bit. Just like the old ATA stuff, drive vendors play fast and loose with the AHCI/SATA specs in order to try to force people to use the far more expensive SAS drives, even though the actual hardware is the same.

    What we do in DragonFly is split available tags into read-dedicated tags and write-dedicated tags, approximately 3:1. Writes only last as long as it takes the device to copy the data to its internal ram caches, so there's no real need to reserve more than a few tags for writing. This leaves the remaining tags available for reading.

    If you don't do this what happens is that you can stall your read I/O by saturating all available tags with write IOs... the writes get retired instantly to the drive's ram cache UNTIL that cache is full, then suddenly the newly issued write tags stall and sit there until the drive can flush some data out. If you don't control how many tags you use for pending writes, you can completely lock out read activity. A simple 'dd' to a file... hell, a simple file copy, for a large file, is big enough to exhibit this behavior.

    This leaves even fewer tags available for writing. At best, if you want to maintain read performance, you can't really use more than ~8 or so tags for writing.
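    A rough sketch of the read:write tag split described above, assuming 32 tags and the roughly 3:1 ratio mentioned. The class and method names (TagPool, acquire, release) are made up for illustration and are not DragonFly's actual driver code:

```python
class TagPool:
    """Split a fixed set of command tags into a read pool and a write pool.

    Hypothetical illustration only: 32 tags with 8 reserved for writes
    gives the roughly 3:1 read:write split described in the comment.
    """

    def __init__(self, total=32, write_tags=8):
        if not 0 < write_tags < total:
            raise ValueError("write_tags must be between 1 and total - 1")
        self.free = {
            "write": list(range(write_tags)),
            "read": list(range(write_tags, total)),
        }

    def acquire(self, kind):
        """Return a free tag for 'read' or 'write', or None if that pool is empty."""
        pool = self.free[kind]
        return pool.pop() if pool else None

    def release(self, kind, tag):
        """Hand a tag back to its pool once the command completes."""
        self.free[kind].append(tag)


pool = TagPool()
# A burst of writes can only ever hold the 8 write tags...
held = [pool.acquire("write") for _ in range(8)]
stalled = pool.acquire("write")   # None: the ninth write has to wait
# ...so a read can still get a tag even while writes are stalled.
read_tag = pool.acquire("read")
```

    The point of keeping two separate pools is that a stalled write queue can never consume the tags reads depend on.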

    For an SSD, maximum performance can be achieved with even fewer tags, since in this case all you are doing is soaking up the command overhead by pipelining the IOs. That means 2 write tags are sufficient, 3 to be safe.

    All of this pretty much precludes being able to use the media completion bit with AHCI/SATA and still have good performance. To really be able to make use of such a bit one needs to support ~256 tags per target... that's to the actual physical device, NOT ~256 tags to the driver or ~256 tags to the SATA/SAS controller on the host.

    In DragonFly, for HAMMER1, there were numerous ordering constraints that required at least two DISK FLUSH commands per volume header sync. For HAMMER2 there are essentially no ordering constraints except for the volume header write itself so only one DISK FLUSH is required to create a recovery point. In both cases all the writes leading up to the required demarcation point could complete in any order. When combined with read:write tag reservation performance remains good even though the DISK FLUSH can't be issued as an NCQ command.

    For SATA only read and write commands can use NCQ. All other commands require serialization and cannot run concurrently with NCQ commands, including unfortunately the DISK FLUSH command. You can blame Intel for this bit of stupidity... they intentionally broke the AHCI/SATA spec in order to artificially differentiate between SATA and SAS, so drive manufacturers could pump up the prices for SAS drives (even though both the hardware and the physical attachment are exactly the same). Intel broke the AHCI chipset spec in other ways to differentiate it, particularly when it comes to error recovery. They'll tell people that it was to 'maintain compatibility with the ATA command set' but IMHO they are lying. Some of the things Intel did in the AHCI spec were just phenomenally stupid. It is still much, much better than the ATA stuff, but they had a chance to make something really robust and blew it.
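    The single-flush ordering described above (writes in any order, one flush, then the header) can be sketched as a user-space analogy. This is not HAMMER2 code: the file layout, HEADER_SIZE, and the commit() function are assumptions made up for the example, os.fsync() stands in for the DISK FLUSH, and os.pwrite() is only available on Unix-like systems:

```python
import os

HEADER_SIZE = 16  # hypothetical fixed-size header slot at offset 0


def commit(path, blocks, generation):
    """Write data blocks, flush once, then publish a header pointing at them.

    Until the header lands, a reader still sees the previous header,
    so the flush is the demarcation point for the new recovery point.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        offset = HEADER_SIZE
        for block in blocks:
            # These writes may reach the media in any order.
            os.pwrite(fd, block, offset)
            offset += len(block)
        # The one flush: everything above should be stable before the header
        # is written (assuming the drive honours the flush request).
        os.fsync(fd)
        header = generation.to_bytes(8, "little") + offset.to_bytes(8, "little")
        os.pwrite(fd, header, 0)
    finally:
        os.close(fd)
    return offset
```

    The header write itself only becomes durable at the next flush; the sketch shows the ordering, not a full recovery scheme.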

    For example, with AHCI/SATA error recovery requires serialization, which means that if you use a port multiplier and one drive is having problems you have to stall out I/O to ALL OTHER DRIVES while you deal with the one that is having problems.

  40. Lots of mis-information abounds here by m.dillon · · Score: 1

    Let's clear up some things:

    * First, on NCQ. *ALL* modern SATA hard drives implement NCQ and have ~31 tags.

    * Bridge chips. *NO* modern motherboard uses a bridge chip any more. Bridge chips used on devices are another matter. Some devices still use bridge chips. Many DVDs and CDs used bridge chips (which is why early SATA DVDs and CDs were so broken), though I think that is finally dying out. The most famous was one of the OCZ SSD models which used a bridge chip to tie two controllers together. The controllers could handle ~31 tags, the bridge chip could not so the host probe would indicate no NCQ support. Also, multi-physical-interface devices such as netbooks (hard drives in a 'book' with SATA, USB, and Firewire interfaces)... those generally use bridge chips that often don't support NCQ.

    * BIOS 'RAID'. So-called soft-raid. This is fake-raid. It isn't real. Don't expect it to actually work properly in a failure case. It's still talking to the AHCI controller, it's just hiding the fact from the OS. BIOS soft-raid 'controllers' are usually pretty horrible, avoidance is best.

    * On data loss from caches. It isn't the caches that you need to battery-back (unless you are REALLY dependent on fsync() times in e.g. a database application). I think the trend is more towards off-host cache redundancy these days because it gets you to approximately the same place without the need for expensive gear. A large percentage of modern filesystems use write barriers and have no problem handling drive cache loss.

    That isn't the problem. It's the physical power to the device being dropped while the write IO is in progress that is the problem. Devices, particularly SSDs but also many HDDs, cannot retire meta-data (for an SSD) or even the current sector (for an HDD) if a sudden loss of power occurs. In addition, a sudden power loss on an HDD can cause UNRELATED sectors to fail depending on how the HDD is writing (whether it is doing a full-track write or not). This can lead to serious corruption of the drive, even outright destruction. I've had Quantum drives go through sudden power loss during a write with HUNDREDS of sectors lost instead of just one or two. That was a while ago, but it was still in the SATA-era... they were modern drives.

    So for UPS/power concerns the only thing that really really matters is that the drive remain powered for at least a second or two. Even that is no guarantee. Barring that you want redundant storage on separate UPSes so someone kicking a plug out or crowbarring the UPS's output doesn't take you out.

    * Super-caps... e.g. as Intel advertises on newer SSDs. These are primarily to retire meta-data so the SSD doesn't brick when you power it back up. Intel SSDs have very tiny ram caches so they might be able to retire those too, but most other SSDs have larger ram caches and no super-cap has enough suds to retire the entire cache. The idea is to not end up with a partially corrupt sector here, not necessarily to be able to retire the entire ram cache. Also, SSDs often do background cleanup when idle so not having any pending writes to an SSD doesn't make it safe, necessarily. This is what the super-cap idea primarily addresses.

    Battery-backed ram comes under the same category. Well, in this case perhaps super-cap-backed ram (good for maybe ~a week to ~a month with low power static ram). Lots of options here that don't cost an arm and a leg, but again what matters the most is that the drive be able to retire whatever it is currently writing and not necessarily whatever is currently in its caches.

    * SAS vs SATA. There is lots of talk about this all the time. I've never noticed any real difference in reliability, probably because the only real difference between the two is firmware. Drive vendors will talk-up using more robust parts but I believe that about as much as I believe that the moon is made out of cheese. There are so few components in HDDs that it is fairly difficult to differentiate consumer from enterprise these days.

  41. "Luke: Let me 'complete your training'"... apk by Anonymous Coward · · Score: 0

    http://en.wikipedia.org/wiki/Elevator_algorithm

    You must account for the hardware-side: It's a constraint over theoreticals @ the logical filesystem level ONLY, vs. the APPLIED thought above (which works @ the actual physical level rather well reducing head movements - I note others below in caching & others) to compensate for the physical machine level world of actual physical movement vs. signals only travelling @ say, 67% of the speed of light via Coax minus attenuation degradations? Well, the last of it currently is in HDD's that include mechanicals for I/O to CPU other than tracking signals such as for fanspeeds!
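    For anyone who hasn't followed the link, a minimal sketch of the elevator idea: service pending requests in one sweep direction from the current head position, then reverse. The function name and the track numbers are made up for illustration, and strictly speaking this is the LOOK variant, which turns around at the last pending request instead of running to the edge of the disk:

```python
def elevator_order(pending, head, ascending=True):
    """Return pending track numbers in elevator (LOOK) service order.

    Requests ahead of the head in the sweep direction are served first,
    in order of position; the rest are served on the way back.
    """
    lower = sorted(t for t in pending if t < head)
    upper = sorted(t for t in pending if t >= head)
    if ascending:
        return upper + lower[::-1]
    return lower[::-1] + upper


queue = [98, 183, 37, 122, 14, 124, 65, 67]
print(elevator_order(queue, head=53))
# [65, 67, 98, 122, 124, 183, 37, 14]
```

    Compared with serving the queue in arrival order, the head crosses the platter at most twice instead of zig-zagging.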

    (However, I appreciate yours!)

    However - Current journalling filesystems are adequate such as NTFS & its Binary Seek methods @ the applied level, & within the bounds of current circular track driven filesystems @ the logical (and physical with HDDs, still predominant)...

    Additionally - For both speed + tolerance related purposes!

    (You sound as if you may have read up some on the only bolded portion of this below from the sounds of it)

    Anyhow/anyways:

    * It is applied techniques of that nature in any art &/or sciences (hopefully both in combination) that makes me realize the human race still has hope because we are capable of building some cool things that involve that level of thought, & others considerably more complex...

    APK

    P.S.=> That's the best part of disks lately in terms of intelligent design, imo, instead of just more or wider lanes of transfer with added signal bits... That, my friends, is APPLIED THOUGHT above... lol, it truly "elevates" the human condition!

    That's a lot different, you must admit, than just throwing "more" @ a problem instead of conquering it thru more intelligent + efficient methods & designs...

    Put it this way:

    Lotus got 1,400 hp out of a 4 cylinder (the type of car motor we all should have)

    So, imo @ least?

    All "things disk" have YET to peak!

    One day, We'll all have:

    ---

    1.) Non-Flash main "True SSD" disks
    2.) With a Flash backup in realtime via mirroring - to maintain state (that type of tech ought to)
    3.) Using filesystems DESIGNED FOR SSD, not circular disks (excellent read on that is searching IRON FILESYSTEMS online, albeit applied to ramdisks/ramdrives be they software OR hardware)
    4.) Memory path circuits all based on whatever the current state-of-the-art DDRam or whatever is mainstream + on the most maxed-out bus @ the time!

    ---

    (Now, that's what disks ought to be... see above!)

    "Hyper-Performance!"

    Hope I'm around for it when (if) that design happens!

    I emulate it now with 10k rpm WD Velociraptors 16mb cache buffered, 128mb ECC Promise Ex-8350 SATA II RAID Caching Controller, & driver software OS kernelmode system caching + a 4gb TRUE SSD (DDR2 Ram) Gigabyte IRAM offloading TONS of things most folks burden their disks with slowing them down (thank goodness for the elevator algorithm designer) to it (pagefile, logging, temp/tmp ops from all things, & more) on an NTFS compressed partition (like doubling RAM, except the pagefile.sys doesn't get it) offloading my WD Velociraptors cached-to-the-max)!

    (Patent pending APK - let /. be my documenting the design architecture of the disk of the future for a long time in the distance, because it is doable, now - Total "haul A$$" drives on all levels, no shortcomings that outweigh the benefits (except costs mostly currently). "The Future IS Now", absolutely (only not yet implemented exactly as above))...

    ... apk