What's the Damage? Measuring fsck Under XFS and Ext4 On Big Storage

Posted by timothy on Friday February 3, 2012 @05:40AM from the disks-groaning-with-shame dept.

An anonymous reader writes "Enterprise Storage Forum's long-awaited Linux file system Fsck testing is finally complete. Find out just how bad the Linux file system scaling problem really is."

196 comments

it's pretty bad... by Anonymous Coward · 2012-02-03 05:47 · Score: 0

NOT!
BOOYA!
fsck speed, want safety by Anonymous Coward · 2012-02-03 05:47 · Score: 3, Insightful

How fast a full fsck scan is is my last concern. What about how successful they are at recovering the filesystem?
1. Re:fsck speed, want safety by h4rr4r · 2012-02-03 05:52 · Score: 5, Insightful
  
  If you need to fsck you should already be restoring from backups onto another machine.
2. Re:fsck speed, want safety by ganjadude · 2012-02-03 05:56 · Score: 1, Offtopic
  
  when I need to fsck, I just call my girlfriend
  
  --
  have you seen my sig? there are many others like it but none that are the same
3. Re:fsck speed, want safety by rickb928 · 2012-02-03 05:59 · Score: 4, Insightful
  
  More helpful advice from the Linux community. Thank you ever so much, once again right on point, timely, and effective.
  
  --
  deleting the extra space after periods so i can stay relevant, yeah.
4. Re:fsck speed, want safety by h4rr4r · 2012-02-03 06:03 · Score: 1
  
  No, just the truth from a real live sysadmin.
  If the question had been how effective a chkdsk was I would have said the same thing.
  Grow up.
5. Re:fsck speed, want safety by pankkake · 2012-02-03 06:07 · Score: 2
  
  Most popular Linux filesystems are atomic and should not need fsck, unless something really bad happens.
  
  --
  Kill all hipsters.
6. Re:fsck speed, want safety by Anonymous Coward · 2012-02-03 06:09 · Score: 0
  
  Then what is the purpose of fsck, and why is anyone maintaining software that is clearly not meant to be used?
7. Re:fsck speed, want safety by Anonymous Coward · 2012-02-03 06:11 · Score: 1
  
  If you need to fsck you should already be restoring from backups onto another machine.
  Are you retarded? Do you actually reinstall every time your box loses power? Or when a default check is initiated after X number of mounts or days since last check?
8. Re:fsck speed, want safety by h4rr4r · 2012-02-03 06:15 · Score: 3, Interesting
  
  Because sometimes it does work. Relying on any such software is stupid.
  While the FSCK/CHKDSK runs you restore onto another machine. This way if the check finishes first, you can use it until you can switch over to the restored machine. It also can save your ass if you are not smart enough/fortunate enough to have good backups.
9. Re:fsck speed, want safety by Anonymous Coward · 2012-02-03 06:18 · Score: 0
  
  I'm just guessing here, but I think his point was that home users (who don't need to be very particular about data integrity anyway) don't have 70TB of storage (yet, at least), and those who do also have sysadmins to maintain the systems. Besides, I guess it's faster to recover from backup than to actually run fsck on 70TB; 1TB is bad enough.
10. Re:fsck speed, want safety by h4rr4r · 2012-02-03 06:19 · Score: 1
  
  I am not retarded; that is why I have a UPS, and I even know how to use tune2fs.
11. Re:fsck speed, want safety by pankkake · 2012-02-03 06:22 · Score: 2
  
  Now that's just plain dishonesty.
  Just because it isn't needed most of the time doesn't mean it's useless or shouldn't be used.
  Some filesystems are not atomic or can be mounted with non-atomic options.
  Data corruption occurs.
  It's simply useful to test if the filesystem is all right. At least for developers.
  Doesn't change the fact that you can't rely on fsck to *recover* data.
  
  --
  Kill all hipsters.
12. Re:fsck speed, want safety by grumbel · 2012-02-03 06:24 · Score: 2
  
  Yep, my last experience with fsck was after an HDD had developed a few bad sectors. fsck on the ext3 file system let me recover the data all right, except of course for the filenames, so I ended up with a whole lot of unsorted, unnamed stuff in /lost+found, which wasn't very helpful. I'd really like to see more focus on how safe the filesystems are and less on how fast they are.
13. Re:fsck speed, want safety by kangsterizer · 2012-02-03 06:25 · Score: 2
  
  But then again, you'll want to fsck from time to time to know if you have an issue.
  If you're waiting for the issue to appear "hey boss we apparently lost half the db" you'll lose more data during the time the corruption happens and you're not aware of it, than if you detected it earlier.
  Thus being able to fsck in a decent amount of time matters.
  That's not the only thing, of course. Sometimes you don't have a backup. Sometimes things are fucked up. Sometimes you're just required to get the thing running before the backup restoration is complete. Etc.
  Otherwise, you know, we could just delete fsck, since as you pointed out, it's *never* needed!
  Yea, right. :)
14. Re:fsck speed, want safety by Anonymous Coward · 2012-02-03 06:27 · Score: 0
  
  Yeah that was it, more or less. I back-up my personal files, which I suspect is more than many home users do even with Linux, but that doesn't change the fact that fsck's primary purpose is to recover a potentially damaged file system. If it was to be fast, why bother at all?
15. Re:fsck speed, want safety by EETech1 · 2012-02-03 06:28 · Score: 1
  
  AKA the what's all the fuss about all those nines guy!
  Cheers!
16. Re:fsck speed, want safety by Anonymous Coward · 2012-02-03 06:28 · Score: 0
  
  You are retarded if you think losing power has any effect on a modern Linux filesystem.
17. Re:fsck speed, want safety by h4rr4r · 2012-02-03 06:30 · Score: 2
  
  No, its primary job is to tell you about integrity of the filesystem. Any attempt at fixing it is secondary.
18. Re:fsck speed, want safety by h4rr4r · 2012-02-03 06:33 · Score: 1
  
  I never said such a thing. If you are fscking, you are doubting your filesystem and therefore should already be restoring your backups. If you get lucky and everything is OK, all you lost was a little time; if not, you are ready to roll out the machine the backups went to.
19. Re:fsck speed, want safety by darkpixel2k · 2012-02-03 06:36 · Score: 5, Funny
  
  when I need to fsck, I just call my girlfriend
  Why? Do you not know how to use the command line?
  
  --
  There's no place like ::1 (I've completed my transition to IPv6)
20. Re:fsck speed, want safety by Anonymous Coward · 2012-02-03 06:40 · Score: 0
  
  Fair point. But speed is still lower down the list...
21. Re:fsck speed, want safety by Anonymous Coward · 2012-02-03 06:41 · Score: 0, Funny
  
  when I need to fsck, I just call my girlfriend
  Me too.
22. Re:fsck speed, want safety by h4rr4r · 2012-02-03 06:41 · Score: 1
  
  Basically that.
  They don't want you to know how little they know. If they used the same name over and over that pattern would be visible.
  Or maybe, yeah probably, ALIENS!
23. Re:fsck speed, want safety by Anonymous Coward · 2012-02-03 06:47 · Score: 0
  
  Thus being able to fsck in a decent amount of time matters.
  It sure does. If it takes more than 4 hours, call your doctor.
24. Re:fsck speed, want safety by hackstraw · 2012-02-03 06:50 · Score: 5, Interesting
  
  The largest filesystem I admin is just shy of 1/2 petabyte, and it's a single filesystem. Backing up everything on that filesystem is simply not feasible. To put it in perspective, 1 stream @ 200 MiB/s would take almost 28 days to back up the whole thing. I would imagine a restore would take about the same order of time. Telling hundreds of users their files are unavailable for reading or writing for 30 days is not really an option, so I run fsck.
  Backups simply are not really an option past 20+ terabytes of storage, and simply not feasible if the storage is volatile in nature. AFAIK everyone has gone to redundancy over backups at scale.
25. Re:fsck speed, want safety by gweihir · 2012-02-03 06:52 · Score: 0
  
  So basically a disconnect between ego and competence and they know it. Can you get any lower? Probably not.
  ALIENS would be the better option, but somehow human stupidity and cowardliness seems a lot more likely....
  
  --
  Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
26. Re:fsck speed, want safety by cloudmaster · 2012-02-03 06:53 · Score: 1
  
  I'm also still wondering why they tested how long it takes to check a filesystem which has no problems. Why didn't they just test how long it takes to replay the journal if that's all they wanted? They wouldn't have had to wait hours for ext's fsck to finish that way. :)
27. Re:fsck speed, want safety by h4rr4r · 2012-02-03 06:56 · Score: 2
  
  You need to be writing that data to two or more, more really, filesystems at the same time. Streaming replication.
  Redundancy can be backups, if they are in different locations and proper versioning is used.
28. Re:fsck speed, want safety by chuckymonkey · 2012-02-03 07:05 · Score: 5, Insightful
  
  You're fairly wrong there, you can actually back that much data up. You just have to be willing to pay for some seriously large tape libraries and they're not cheap. We're in the process of installing a 700TB array with a 1.5PB tape library backup. You just have to do the backups using filesystem snapshots and run them pretty much constantly.
  
  --
  "Some books contain the machinery required to create and sustain universes."-Tycho
29. Re:fsck speed, want safety by Nutria · 2012-02-03 07:08 · Score: 1
  
  When did sysadmins forget how to install multiple tape drives into a computer?
  
  --
  "I don't know, therefore Aliens" Wafflebox1
30. Re:fsck speed, want safety by nine-times · 2012-02-03 07:10 · Score: 1
  
  Not really. First, there are problems that a filesystem check can repair without damaging the integrity of your data.
  More importantly, some filesystem/disk problems are transparent until you check for errors. Linux is usually set to do a fsck at regular intervals in order to detect errors that might otherwise go undetected. So, in short, you might not know that you need to restore from backups until you do a filesystem check.
31. Re:fsck speed, want safety by HiThere · 2012-02-03 07:14 · Score: 2
  
  The last time I checked, the system required that fsck be run after a power loss, and also on the first reboot after n days had passed. (I think n is somewhere around 200, but I haven't been interested enough to pin it down precisely.) And occasionally a system upgrade will require a reboot.
  OTOH, recovery is definitely a lot faster than it used to be, thanks to journaling.
  OTTH, all of my partitions together are barely over 1TB, so this is only significant (to me) for future systems, when this will have changed anyway.
  
  --
  
  I think we've pushed this "anyone can grow up to be president" thing too far.
32. Re:fsck speed, want safety by phorm · 2012-02-03 07:14 · Score: 3, Insightful
  
  If you're in a scenario where "Backups are not really an option", somebody is doing something wrong...
  How long did it take you to get to 0.5PB? If you use a differential backup/sync, then you should generally only need to copy *NEW* data, and the old stuff will already be there.
33. Re:fsck speed, want safety by Nutria · 2012-02-03 07:15 · Score: 2
  
  you restore onto another machine
  ROTFLMAO,
  We struggle to even get test machines; there's no way that "they" would pay for all that kit to just sit there gathering dust waiting for a disaster. If anything, it would be our DR machine and we'd instantly flip production over to it.
  
  --
  "I don't know, therefore Aliens" Wafflebox1
34. Re:fsck speed, want safety by Anonymous Coward · 2012-02-03 07:18 · Score: 0
  
  Yes, as we all know, anonymous cowards on slashdot are the official spokesmen for the Linux community. Thanks for being rational and level headed about what you read on the internet!
35. Re:fsck speed, want safety by Nutria · 2012-02-03 07:19 · Score: 1
  
  Most popular Linux filesystems are atomic and should not need fsck, unless something really bad happens.
  That's a joke, right? 'Cause we *all* know that Really Bad Things *never* ever happen...
  
  --
  "I don't know, therefore Aliens" Wafflebox1
36. Re:fsck speed, want safety by Grishnakh · 2012-02-03 07:20 · Score: 1
  
  Besides the other responses to this comment, is it not possible to break up that 500TB into smaller, more manageable volumes? Does it all really need to be a single volume? Why not have a bunch of smaller volumes, each mounted to the filesystem? Then, either backing up or fscking any one volume wouldn't take that long (you could have multiple tape drives, each backing up a different volume simultaneously), and if something goes wrong with one volume the other volumes will still be ok.
37. Re:fsck speed, want safety by Grishnakh · 2012-02-03 07:24 · Score: 1
  
  There's no way aliens are remotely as stupid as humans, at least not if they've managed to travel to earth. We're still arguing whether we should bother going back to our nearby moon, and mostly saying we'd rather just sit around and play with piles of green paper and play Angry Birds; any culture advanced enough to travel to a different star system would be far more intelligent than us.
38. Re:fsck speed, want safety by Anonymous Coward · 2012-02-03 07:25 · Score: 0
  
  Some filesystems are not atomic or can be mounted with non-atomic options.
  Data corruption occurs.
  That's correct.
39. Re:fsck speed, want safety by Anonymous Coward · 2012-02-03 07:33 · Score: 1
  
  I'm the OP/AC. There's no reason that I'm AC other than that I'm too lazy to create an account. I'm under no illusions as to how much or little I know, and don't care if you know my name, it's little less anonymous than Anonymous Coward.
  I can't claim credit for any invasions though.
40. Re:fsck speed, want safety by Anonymous Coward · 2012-02-03 07:34 · Score: 0
  
  So, how the fuck do you plan to resize2fs without fscking first?
41. Re:fsck speed, want safety by Anonymous Coward · 2012-02-03 07:38 · Score: 2, Interesting
  
  So you have 1/2 petabyte of storage but only 200 MiB/s -- are you kidding me? Is your storage controller broken, really cheap, or both?
  Also, xfsdump (which is used to back up XFS) can do multi-threaded backups.
  Now to comment on the test -- it is completely insane. As you and others mentioned, if you are running fsck while your whole application is down, the thing that's broken is not the system but the thing inside your skull. You obviously need a very fast backup/restore and/or an HA solution; the two are not (and need not be) mutually exclusive.
42. Re:fsck speed, want safety by fnj · 2012-02-03 08:01 · Score: 1
  
  You must not have checked recently. Both ext3 and ext4 just recover by replaying the journal after a power loss. No fsck is "required".
  I have approximately 36 TB on my active systems. This stuff is significant to me right now.
43. Re:fsck speed, want safety by Anonymous Coward · 2012-02-03 08:03 · Score: 0
  
  I don't know about the other ACs, but I tend to take my two days to use up my points. I'm reluctant to vote up and even more reluctant to vote down. Quite often I don't use all of my points before they expire.
  I've had mod points nearly every day for the past several months. Quite annoying actually. The discussions I'm most likely to post in are also the ones I'm most likely to vote in.
44. Re:fsck speed, want safety by Anonymous Coward · 2012-02-03 08:09 · Score: 0
  
  Do you really think anyone bothers to read AC comments?
45. Re:fsck speed, want safety by Anonymous Coward · 2012-02-03 08:12 · Score: 0
  
  when I need to fsck, I just call my girlfriend
  Why? Do you not know how to use the command line?
  I pipe his girlfriend.
46. Re:fsck speed, want safety by tlhIngan · 2012-02-03 08:23 · Score: 4, Interesting
  
  The largest filesystem I admin is just shy of 1/2 petabyte, and it's a single filesystem. Backing up everything on that filesystem is simply not feasible. To put it in perspective, 1 stream @ 200 MiB/s would take almost 28 days to back up the whole thing. I would imagine a restore would take about the same order of time. Telling hundreds of users their files are unavailable for reading or writing for 30 days is not really an option, so I run fsck.
  Which means You're Doing It Wrong(tm).
  Two words: volume snapshot.
  What it does is give you a view of the filesystem as it existed at the time the snapshot was taken. The frozen image is mounted at another mountpoint (read-only), while the snapshotted volume is still accessible (read-write). Changes to the volume since the snapshot was taken won't be in the snapshot (obviously).
  Your backup points to that snapshot, which won't change, and that's what gets copied to tape. Once you're done backing up 30 days later, you delete the snapshot.
  Since your backup takes so long, you'd then immediately make another snapshot and begin the backup again.
  If it's a database, the database backup tools work on a database snapshot - it will be correct and consistent as of when the snapshot was taken while the database remains available for reading and writing outside of the snapshot.
  Having to take a system down to back it up is a dead concept on modern OSes as they all tend to have snapshot capability.
47. Re:fsck speed, want safety by _LORAX_ · 2012-02-03 08:25 · Score: 4, Informative
  
  Backups simply are not really an option past 20+ terabytes of storage, and simply not feasible if the storage is volatile in nature. AFAIK everyone has gone to redundancy over backups at scale.
  200TB/130TB usable clustered/distributed system with 4x LTO5 drives and we do a full snapshot to tape every week. With data that size you either pay up-front for proper engineering or you pay for the life of the system for poor performance and eventual cleanup of the mess.
48. Re:fsck speed, want safety by LoRdTAW · 2012-02-03 08:33 · Score: 0
  
  I think he means:
  
  if (horny) girlfriend(penis);
49. Re:fsck speed, want safety by Anonymous Coward · 2012-02-03 08:38 · Score: 0
  
  Posting as anonymous coward because I don't want my boss to hear me recommending other products.
  I work at a company that has big-iron customers (e.g. banks, multi-million-dollar firms, etc.), but I'm sure you could find less sophisticated products that do the same thing we do (less reliably, more slowly... how much is a 1-bit error in your data worth to you?)
  What you need is de-duplication in your backup. Once your system approaches the petabyte range you usually find high duplication, and even if not, there is a lot of duplication between backups (90% of the files won't change month to month), so you only store the difference. This way you have a fast backup AND restore option.
50. Re:fsck speed, want safety by lvxferre · 2012-02-03 08:42 · Score: 5, Funny
  
  Protip: if 'make love' returns no target, you need to do the job by hand.
  
  --
  Nerdy news for your nerdy needs? http://www.soylentnews.org Soylent News is people!
51. Re:fsck speed, want safety by Anonymous Coward · 2012-02-03 08:44 · Score: 0
  
  Try reading again. Most people use redundancy, not backups.
52. Re:fsck speed, want safety by Aighearach · 2012-02-03 08:49 · Score: 2
  
  Databases.
53. Re:fsck speed, want safety by Grishnakh · 2012-02-03 08:53 · Score: 1
  
  Right, but even so, unless you just have one giant-ass database, if you have multiple data sets, you should be able to organize them into separate databases and put the different databases on separate volumes.
  500TB sounds like one hell of a database if that's just a single one.
54. Re:fsck speed, want safety by h4rr4r · 2012-02-03 09:12 · Score: 3, Insightful
  
  Most people are worried more about cost than reliability.
  "Most people" is often a category that does not do things the best way or the right way.
55. Re:fsck speed, want safety by Anonymous Coward · 2012-02-03 09:23 · Score: 0
  
  when I need to fsck, I just call my girlfriend
  Why? Do you not know how to use the command line?
  Or he can gesture with his hands.
56. Re:fsck speed, want safety by chuckymonkey · 2012-02-03 09:23 · Score: 3, Insightful
  
  I know I'm posting to an AC here, but I want to point something out. "Backups simply are not really an option past 20+ terabytes of storage, and simply not feasible if the storage is volatile in nature." He was claiming that it's not feasible to back up more than 20+ TB of storage when in fact it is. I was pointing out that yes you can, but it's pretty expensive.
  
  --
  "Some books contain the machinery required to create and sustain universes."-Tycho
57. Re:fsck speed, want safety by Daniel+Phillips · 2012-02-03 09:28 · Score: 2
  
  You are not helpful. In the real world fsck is an important determinant of filesystem robustness. In your career, your proverbial butt will be saved at least once by a good fsck, and you will be left twisting in the breeze at least twice because of a bad or absent fsck. Why twice? Because that is how many times it takes to send the message to someone unwilling to receive it.
  
  --
  Have you got your LWN subscription yet?
58. Re:fsck speed, want safety by Daniel+Phillips · 2012-02-03 09:31 · Score: 1
  
  Most popular Linux filesystems are atomic and should not need fsck, unless something really bad happens.
  And something really bad always happens. So please be careful with that word "should".
  
  --
  Have you got your LWN subscription yet?
59. Re:fsck speed, want safety by networkBoy · 2012-02-03 09:36 · Score: 1
  
  I do backups on my home array, and I have a mirror for StuffThatMatters (tm). I still run fsck & chkdsk on my Linux and Windows machines. More than once they have raised flags about a drive that had recoverable read errors. That means it is time to add a new drive to the mirror, or a new JBOD disk, then re-sync or copy everything over to the new disk and unjoin or remove the failing disk. While downtime is not an issue for me at home, it is inconvenient. Having these tools run at night when I don't need the machine means I don't even have to deal with restoring my backups.
  Much Easier.
  -nB
  
  --
  whois gawk date unzip strip find touch finger mount join nice man top fsck grep eject more yes exit umount sleep dump
60. Re:fsck speed, want safety by Dishevel · 2012-02-03 09:36 · Score: 2
  
  Yup.
  Every week I switch over my systems (master/slave arrangement) and take the old master down and fsck.
  Making sure all is well. Sometimes there is a small issue. It fixes it. All is well.
  So far I have never had a catastrophe where I lose all data on the master while my slave is down hard.
  Going to a tape back up even a day old is going to be bad news.
  
  --
  Why is it so hard to only have politicians for a few years, then have them go away?
61. Re:fsck speed, want safety by Anonymous Coward · 2012-02-03 09:47 · Score: 0
  
  You use online resize, duh. It happens.
62. Re:fsck speed, want safety by ion++ · 2012-02-03 09:53 · Score: 2
  
  We're in the process of installing a 700TB array with a 1.5PB tape library backup. You just have to do the backups using filesystem snapshots and run them pretty much constantly.
  And XFS is pretty brilliant for taking filesystem snapshots. Using the xfs_freeze command you can take consistent snapshots of XFS with what appears to be no downtime at all; see the xfs_freeze man page, e.g. http://linux.die.net/man/8/xfs_freeze
  And then run these commands:
  
  # Group the snapshot and the thaw so the filesystem is unfrozen even if the snapshot step fails
  xfs_freeze -f /mount/point && { block_level_snapshot; xfs_freeze -u /mount/point; }
  
  Last time I checked that did not work with EXT4.
63. Re:fsck speed, want safety by Aighearach · 2012-02-03 10:24 · Score: 1
  
  That's what NASA uses XFS for...
  They have 2 of them.
64. Re:fsck speed, want safety by siDDis · 2012-02-03 10:39 · Score: 1
  
  I work with IPTV and VOD; we have 4 PB of data running on FreeBSD and ZFS, replicated off site with the send && receive features that come with ZFS. Since we mostly deal with large media files, we have even reversed the replication direction: if the master storage needs to go down for maintenance, the offsite storage becomes the master. At the moment we're looking into using HAST, which will make it even easier to switch which storage site is the master.
65. Re:fsck speed, want safety by Anonymous Coward · 2012-02-03 11:11 · Score: 0
  
  No kidding. I remember a time when I waited 9 hours for a check disk to complete on a VMS system, and the disks were laughably small by today's standards. Given the FS sizes and number of files, I didn't see anything all that unreasonable. And if it is that important, that's exactly why you'd better have your system attached to a battery-backed power supply.
  If this is a serious issue for someone, they are doing it wrong. Period.
66. Re:fsck speed, want safety by jedidiah · 2012-02-03 11:30 · Score: 1
  
  I was working with 50TB databases 10 years ago. They have to be up to ungodly sizes now.
  
  --
  A Pirate and a Puritan look the same on a balance sheet.
67. Re:fsck speed, want safety by gknoy · 2012-02-03 11:40 · Score: 1
  
  How do you restore it, in case of failure?
68. Re:fsck speed, want safety by PhunkySchtuff · 2012-02-03 19:47 · Score: 1
  
  Yes, because every time I have an unclean shutdown, I sure want to be recovering from tape.
  
  --
  Specialist Mac support for creative pros, Melbourne
69. Re:fsck speed, want safety by pankkake · 2012-02-03 23:02 · Score: 1
  
  Don't you understand the word "unless"?
  
  --
  Kill all hipsters.
70. Re:fsck speed, want safety by Ginger+Unicorn · 2012-02-04 01:33 · Score: 1
  
  Expense directly impacts feasibility.
  
  --
  (1.21 gigawatts) / (88 miles per hour) = 30 757 874 newtons
71. Re:fsck speed, want safety by Anonymous Coward · 2012-02-04 05:35 · Score: 0
  
  Very glad I don't rely much on the clouds then
72. Re:fsck speed, want safety by HiThere · 2012-02-04 06:21 · Score: 1
  
  My guess would be that at least one of my partitions had aged sufficiently that it decided to run fsck on reboot, and I saw some sectors being corrected by the journaling. I've got around 6-7 partitions split between two drives, sometimes more, sometimes less, depending on what experimental systems I have installed, so it's quite likely that some of them were old enough the last time the power failed.
  As I only paid attention to errors, this would explain why I still thought that "power failure means fsck runs". It's less likely to be true today, as a recent system upgrade required a reboot, and on coming up I noticed that at least one partition required fsck. (Note how much attention I paid: fsck had to run, but I don't really know on how many partitions. But there weren't any errors that needed recovery.) But next month it will be more likely.
  However, as was said, as disk space increases, this becomes more significant. Unless I start dealing with really strange data, no matter how large the disk gets I don't expect to use partitions larger than will fit on a DVD (or, if I upgrade my media, a Blu-ray disc). I use disk mirroring for backup (nightly rather than streaming), but a power surge could take them both out, so occasional backups go to DVD.
  
  --
  
  I think we've pushed this "anyone can grow up to be president" thing too far.
73. Re:fsck speed, want safety by Daniel+Phillips · 2012-02-04 13:16 · Score: 1
  
  Please excuse me, but I understand that you somehow reached a wrong conclusion, however you choose to define standard English words.
  
  --
  Have you got your LWN subscription yet?
74. Re:fsck speed, want safety by pankkake · 2012-02-05 02:27 · Score: 1
  
  should: "Used to express probability or expectation"
  So I don't really see the issue with the should/unless.
  I'm not a native English speaker. My sentence might not be the best way to express what I meant. But what I am sure of is that you are behaving like a dickhead.
  
  --
  Kill all hipsters.
75. Re:fsck speed, want safety by fnj · 2012-02-05 03:52 · Score: 1
  
  OK, technically fsck is run after power fail or CPU lockup, or after a certain number of mounts (think 30), or after a certain time interval (think 6 months) following the last fsck. However, for ext3 and ext4, fsck just uses the journal to recover and the time required is extremely short, so one never notices. What I call the "real" exhaustive fsck is only run if fsck "finds" that it is really needed. It's been so long since I have used any partition without journaling, or since a "real" fsck has been required on one of my journaled partitions, that I tend to regard "real" fsck'ing as a relic that doesn't happen any more.
  I have sixteen 2 TB partitions and a bunch of lesser ones in regular use, all of them ext3 or ext4, and have NEVER noticed any appreciable slowdown in booting.
76. Re:fsck speed, want safety by Anonymous Coward · 2012-02-05 13:12 · Score: 0
  
  One is useless without the other. In the real world people don't want to think about this stuff. It works until it doesn't, and then you fix it with fsck or chkdsk or whatever. The fact that there are error corrections calculated by the hardware and the bus chips is moot. The fact that the file system has journals to try to make things more reliable is moot. Stuff breaks. Stuff isn't backed up. You need to be able to get back up and working quickly. That is how 98% of the Linux systems out there are used.
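Reply 17 in the thread above argues that fsck's primary job is to report on the integrity of a filesystem, with repair secondary. A script that runs fsck on a schedule mostly acts on its exit status. The following is a minimal Python sketch that decodes that status; the bit values are the ones listed in the fsck(8) man page, while the function names are hypothetical helpers, not part of any library.

```python
# Exit-status conditions documented in fsck(8). The status is the sum of the
# conditions that apply, and each is a distinct bit, so they can be tested
# independently.
FSCK_CONDITIONS = {
    1: "filesystem errors corrected",
    2: "system should be rebooted",
    4: "filesystem errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "checking canceled by user request",
    128: "shared-library error",
}


def decode_fsck_status(status: int) -> list[str]:
    """Return the human-readable conditions encoded in an fsck exit status."""
    if status == 0:
        return ["no errors"]
    return [text for bit, text in FSCK_CONDITIONS.items() if status & bit]


def errors_remain(status: int) -> bool:
    """True if fsck reports damage it did not fix (e.g. a read-only check)."""
    return bool(status & 4)


def check_did_not_complete(status: int) -> bool:
    """True if fsck itself failed, was misused, or was canceled."""
    return bool(status & (8 | 16 | 32 | 128))
```

In practice `status` would be the `returncode` of a `subprocess.run` call that invoked fsck; a monitoring job might page someone whenever `errors_remain` or `check_did_not_complete` is true.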
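The backup-window figures in replies 24, 36 and 47 are easy to sanity-check. This Python sketch assumes decimal units for "1/2 petabyte" and "130TB", perfect scaling across parallel streams, and LTO-5's published native (uncompressed) figures of 140 MB/s and 1.5 TB per cartridge; real throughput depends on compression and on whether the source storage can keep every drive streaming. The function names are hypothetical.

```python
import math

SECONDS_PER_DAY = 86_400


def backup_days(size_bytes: float, bytes_per_second: float, streams: int = 1) -> float:
    """Ideal wall-clock days to copy size_bytes over parallel streams."""
    return size_bytes / (bytes_per_second * streams) / SECONDS_PER_DAY


def cartridges_needed(size_bytes: float, cartridge_bytes: float) -> int:
    """Whole tape cartridges needed, ignoring compression."""
    return math.ceil(size_bytes / cartridge_bytes)


if __name__ == "__main__":
    MiB = 2**20
    # Reply 24: ~0.5 PB (decimal) through a single 200 MiB/s stream.
    print(f"{backup_days(0.5e15, 200 * MiB):.1f} days")             # 27.6 days
    # Reply 36's idea: split into 4 volumes backed up in parallel.
    print(f"{backup_days(0.5e15, 200 * MiB, streams=4):.1f} days")  # 6.9 days
    # Reply 47: 130 TB usable onto 4 LTO-5 drives at 140 MB/s each (assumed).
    print(f"{backup_days(130e12, 140e6, streams=4):.1f} days")      # 2.7 days
    print(cartridges_needed(130e12, 1.5e12))                        # 87
```

The first figure agrees with the "almost 28 days" in reply 24, and the last two show why a weekly full backup of 130 TB onto four drives is plausible.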
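Replies 26 and 42 point out that after a crash, ext3 and ext4 normally just replay the journal instead of scanning the whole filesystem, which is why recovery is fast. The idea can be sketched as a write-ahead log in which only transactions that reached a commit record are applied. This is a greatly simplified Python model for illustration; the record layout is hypothetical and bears no relation to the real on-disk journal format.

```python
def replay(journal, disk):
    """Apply committed transactions from a write-ahead journal to `disk`.

    journal: records in the order they were written, each either
             ("write", txid, block, data) or ("commit", txid).
    disk:    dict mapping block number -> contents; updated in place.
    Writes from a transaction with no commit record (the crash happened
    mid-transaction) are discarded, so the disk never sees half an update.
    """
    committed = {rec[1] for rec in journal if rec[0] == "commit"}
    for rec in journal:
        if rec[0] == "write" and rec[1] in committed:
            _, _, block, data = rec
            disk[block] = data
    return disk


if __name__ == "__main__":
    journal = [
        ("write", 1, 0, "inode table v2"),
        ("write", 1, 7, "bitmap v2"),
        ("commit", 1),
        ("write", 2, 0, "inode table v3"),  # crash before tx 2 committed
    ]
    disk = {0: "inode table v1", 7: "bitmap v1"}
    print(replay(journal, disk))  # {0: 'inode table v2', 7: 'bitmap v2'}
```

In this model, replay only rewrites blocks with their committed contents, so running it a second time gives the same result, which matters if the machine crashes again during recovery.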
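Reply 32 suggests a differential backup/sync so that only new data is copied after the first full pass. Below is a minimal Python sketch of the selection step, assuming modification time is a good enough change signal; tools like rsync also compare sizes or checksums, and a real scheme must also record deletions, which this one ignores. `changed_since` is a hypothetical helper.

```python
import os
import tempfile


def changed_since(root: str, last_backup_epoch: float) -> list[str]:
    """List files under root modified after the previous backup ran."""
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # lstat: look at the directory entry itself, don't follow symlinks
            if os.lstat(path).st_mtime > last_backup_epoch:
                changed.append(path)
    return sorted(changed)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        old = os.path.join(root, "archived.dat")
        new = os.path.join(root, "today.dat")
        for path in (old, new):
            with open(path, "w") as f:
                f.write("data")
        os.utime(old, (1_000, 1_000))  # pretend it was last modified long ago
        # Only today.dat is newer than a backup taken at epoch 2000.
        print(changed_since(root, last_backup_epoch=2_000))
```

Each run copies only the returned paths, so the cost of a backup tracks the rate of change rather than the total size of the filesystem.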
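Reply 46 describes backing up from a volume snapshot while the live volume stays writable. One common way block-level snapshots achieve this (classic LVM snapshots work roughly this way) is copy-on-write: the first time a block is overwritten after the snapshot, its old contents are saved for the snapshot. The toy Python model below illustrates only that idea; the class names are hypothetical, and it ignores the need to quiesce the filesystem first (the xfs_freeze step in reply 62).

```python
class Snapshot:
    """Read-only view of a Volume as it was when the snapshot was taken."""

    def __init__(self, volume):
        self._volume = volume
        self.saved = {}  # block index -> contents preserved before first overwrite

    def read(self, index):
        # Blocks not overwritten since the snapshot are still shared with the volume.
        return self.saved.get(index, self._volume.blocks[index])


class Volume:
    """Toy block device supporting copy-on-write snapshots."""

    def __init__(self, nblocks):
        self.blocks = [b"\0"] * nblocks
        self.snapshots = []

    def snapshot(self):
        snap = Snapshot(self)
        self.snapshots.append(snap)
        return snap

    def delete_snapshot(self, snap):
        self.snapshots.remove(snap)

    def write(self, index, data):
        # Copy-on-write: preserve the old block for every snapshot that
        # has not already saved its own copy of this block.
        for snap in self.snapshots:
            if index not in snap.saved:
                snap.saved[index] = self.blocks[index]
        self.blocks[index] = data


if __name__ == "__main__":
    vol = Volume(3)
    vol.write(0, b"monday")
    snap = vol.snapshot()        # the backup job reads from here
    vol.write(0, b"tuesday")     # users keep writing to the live volume
    print(snap.read(0), vol.blocks[0])  # b'monday' b'tuesday'
```

Taking a snapshot is cheap because nothing is copied up front; the cost is paid later, one block at a time, which is why a snapshot left open for a 30-day backup can grow large on a busy volume.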
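Reply 49 recommends de-duplicated backups, where content already in the backup store is not stored again. Below is a minimal Python sketch using fixed-size chunks named by their SHA-256 digest. Commercial products typically use variable-size, content-defined chunking so that an insertion does not shift every later chunk; this toy skips that, along with compression and garbage collection. `ChunkStore` is a hypothetical name.

```python
import hashlib


class ChunkStore:
    """Toy de-duplicating backup store with fixed-size chunks."""

    def __init__(self, chunk_size: int = 4096):
        self.chunk_size = chunk_size
        self.chunks: dict[str, bytes] = {}  # SHA-256 hex digest -> chunk

    def backup(self, data: bytes) -> list[str]:
        """Store data; return its recipe, the ordered list of chunk digests."""
        recipe = []
        for offset in range(0, len(data), self.chunk_size):
            chunk = data[offset:offset + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # already-seen content costs nothing
            recipe.append(digest)
        return recipe

    def restore(self, recipe: list[str]) -> bytes:
        """Reassemble the original bytes from a recipe."""
        return b"".join(self.chunks[digest] for digest in recipe)

    def stored_bytes(self) -> int:
        return sum(len(chunk) for chunk in self.chunks.values())


if __name__ == "__main__":
    store = ChunkStore(chunk_size=4)
    monday = store.backup(b"AAAABBBBCCCC")
    tuesday = store.backup(b"AAAABBBBDDDD")  # only the last chunk changed
    assert store.restore(monday) == b"AAAABBBBCCCC"
    assert store.restore(tuesday) == b"AAAABBBBDDDD"
    print(store.stored_bytes())  # 16, not 24
```

Both backups restore in full, but the second one only adds the single changed chunk to the store.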
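Replies 31 and 75 describe when ext3/ext4 decide that a full check is due: after a maximum number of mounts, after a time interval since the last check, or when the filesystem has recorded errors. The first two limits are the ones set with `tune2fs -c` and `tune2fs -i` and shown by `tune2fs -l`. The Python below is a simplified model of that decision for illustration only; the field names are hypothetical, and e2fsck's real logic considers additional cases.

```python
from dataclasses import dataclass

DAY = 86_400


@dataclass
class SuperblockState:
    """Hypothetical subset of ext2/3/4 superblock fields relevant to fsck."""
    errors_recorded: bool   # kernel marked the filesystem as having errors
    mount_count: int        # mounts since the last full check
    max_mount_count: int    # tune2fs -c; 0 or negative disables this trigger
    last_check: float       # epoch seconds of the last full check
    check_interval: float   # tune2fs -i, in seconds; 0 disables this trigger


def full_check_due(sb: SuperblockState, now: float) -> bool:
    """Simplified model of whether a boot-time fsck should do a full scan."""
    if sb.errors_recorded:
        return True
    if sb.max_mount_count > 0 and sb.mount_count >= sb.max_mount_count:
        return True
    if sb.check_interval > 0 and now >= sb.last_check + sb.check_interval:
        return True
    return False


if __name__ == "__main__":
    sb = SuperblockState(False, mount_count=29, max_mount_count=30,
                         last_check=0, check_interval=180 * DAY)
    print(full_check_due(sb, now=10 * DAY))  # False
    sb.mount_count += 1
    print(full_check_due(sb, now=10 * DAY))  # True
```

With both triggers disabled, only recorded errors force a full scan, which matches reply 75's experience of "real" fscks becoming rare on journaled filesystems.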
See people! Here's the danger! by Anonymous Coward · 2012-02-03 05:49 · Score: 0

What's the Damage? Measuring fsck Under XFS and Ext4 On Big Storage
Because of politically correct speech, I read the headline as "What's the Damage? Measuring fuck Under XFS and Ext4 On Big Storage"
Jessie Mother fscking crisko!
fsck xfs does something? by drewstah · 2012-02-03 05:54 · Score: 3, Interesting

When I had some EBS problems a couple years ago, I figured I would run xfs_check. It seemed to do absolutely nothing, even if there were disks known to be bad in the md array. xfs is nice and fast, but I haven't seen the xfs_check or xfs_repair to do either of the things I'd assume they'd do -- check and repair. I found it easier to delete the volumes and start from scratch, because any compromised xfs filesystem seems to be totally unfixable. Is fsck for xfs new?

--
I do stuff Zhrodague
1. Re:fsck xfs does something? by larry+bagina · 2012-02-03 06:00 · Score: 2
  
  I set up an xfs volume a couple years back. After copying a few files over nfs, it became corrupted. the xfs fsck did something -- it told me that it was so corrupted, it couldn't be fixed.
  
  --
  Do you even lift?
  These aren't the 'roids you're looking for.
2. Re:fsck xfs does something? by Sipper · 2012-02-03 07:29 · Score: 1
  
  When I had some EBS problems a couple years ago, I figured I would run xfs_check. It seemed to do absolutely nothing, even if there were disks known to be bad in the md array. xfs is nice and fast, but I haven't seen the xfs_check or xfs_repair to do either of the things I'd assume they'd do -- check and repair. I found it easier to delete the volumes and start from scratch, because any compromised xfs filesystem seems to be totally unfixable. Is fsck for xfs new?
  It's not you; xfs_repair will only operate on a filesystem that is not mounted at all. In other words, if you want to run xfs_repair, you need to do it after booting a LiveCD of some kind. Even when using the -d option for "dangerous" which implies that it will operate on a filesystem mounted read-only, xfs_repair will refuse and simply quit.
  However, once you do boot a LiveCD and run xfs_repair, it does actually repair an XFS filesystem. For obvious reasons this is critical to be able to do, because any improper shutdown without unmounting first will corrupt an XFS filesystem.
  I've been running XFS for several years on my laptop, and recently even use XFS on top of encrypted LUKS. This is probably a fairly rare setup, but I can't find anything better because I like the speed XFS allows. Using XFS I'm able to transfer 40 MiB/s solid via FTP over 1Gb Ethernet even with LUKS encryption, but can't get any more than about 30 MiB/s (and the speed varies) when using EXT4. However, I know EXT4 is safer to use.
  The only thing I (sort of) regret was using XFS on a remote server. If the power drops I'll have to have the ISP give me a remote KVM connection over IP and tell them to insert a Linux LiveCD to allow me to recover the XFS filesystems. That's a bit inconvenient -- however, thankfully the ISP the box is hosted with (CoreNetworks.net) will actually do all of that at no cost, so I can at least deal with the problem if it ever happens.
3. Re:fsck xfs does something? by Sipper · 2012-02-03 07:37 · Score: 3, Informative
  
  I set up an xfs volume a couple years back. After copying a few files over nfs, it became corrupted. The xfs fsck did something -- it told me that it was so corrupted, it couldn't be fixed.
  I think you mean xfs_repair. On XFS, fsck is a no-op.
  I've never yet seen xfs_repair tell me there was an issue it couldn't fix -- that sounds unusual. However, there have been lots of changes to XFS in the Linux kernel in recent years, and occasionally there have been a few nasty bugs, some of which I ran into. Linux-2.6.19 in particular had some nasty XFS filesystem corruption bugs.
4. Re:fsck xfs does something? by Anonymous Coward · 2012-02-03 07:55 · Score: 0
  
  I don't think xfs_check is going to be repairing your softraid array, bro
5. Re:fsck xfs does something? by Vanders · 2012-02-03 07:57 · Score: 1
  
  In other words, if you want to run xfs_repair, you need to do it after booting a LiveCD of some kind.
  No you don't. Either force unmount the filesystem, or if you deem that too dangerous, boot into single user mode.
  
  --
  Syllable : It's an Operating System
6. Re:fsck xfs does something? by Sipper · 2012-02-03 08:25 · Score: 1
  
  In other words, if you want to run xfs_repair, you need to do it after booting a LiveCD of some kind.
  No you don't. Either force unmount the filesystem, or if you deem that too dangerous, boot into single user mode.
  I use XFS for the root filesystem. Tell me how I can completely umount it and then run xfs_repair -- which has to be read from the same filesystem I just umounted.
  Stop trying to oversimplify things you don't understand.
7. Re:fsck xfs does something? by Anonymous Coward · 2012-02-03 08:33 · Score: 0
  
  xfs_check has never actually fixed anything, it was only useful for .. you know ... checking.
  xfs_repair, interestingly enough, will repair the filesystem.
  And if you want a read-only test of the fs, xfs_repair -n scales much better than xfs_check.
8. Re:fsck xfs does something? by Anonymous Coward · 2012-02-03 08:41 · Score: 0
  
  Check it:
  [root@host ~]# mount /dev/sdb5 /mnt/test
  [root@host ~]# mount -o remount,ro /dev/sdb5
  [root@host ~]# xfs_repair -d /dev/sdb5
  Phase 1 - find and verify superblock...
  Phase 2 - using internal log
  - zero log...
  - scan filesystem freespace and inode maps... ....
  So boot to single-user mode with a read-only root, grab the console, and off you go, same as, say, ext3.
9. Re:fsck xfs does something? by AndroSyn · 2012-02-03 08:53 · Score: 1
  
  You have your root filesystem mounted read-only and then run xfs_repair on it. Sometimes getting your root filesystem remounted read-only can be tricky, however. Sometimes this requires passing init=/bin/sh to the kernel, so you start with no other processes running. However you go about getting your root filesystem mounted read-only, after you run xfs_repair (or e2fsck, for that matter) you reboot immediately.
  
  Stop trying to oversimplify things you don't understand.
  Perhaps you don't understand things as well as you think you do. See the section below regarding the -d option to xfs_repair and the context in which you'd use it.
  -d Repair dangerously. Allow xfs_repair to repair an XFS filesystem mounted read only. This is typically done on a root filesystem from single user mode, immediately followed by a reboot.
10. Re:fsck xfs does something? by Vanders · 2012-02-03 10:40 · Score: 1
  
  Stop trying to oversimplify things you don't understand.
  Well gee I dunno brain, I've only been a professional sysadmin for a decade and been using XFS as my weapon of choice for a good three or four years, but you and your laptop should feel free to carry on.
  
  --
  Syllable : It's an Operating System
11. Re:fsck xfs does something? by Sipper · 2012-02-03 11:51 · Score: 2
  
  You have your root filesystem mounted read-only and then run xfs_repair on it. Sometimes getting your root filesystem remounted read-only can be tricky, however. Sometimes this requires passing init=/bin/sh to the kernel, so you start with no other processes running. However you go about getting your root filesystem mounted read-only, after you run xfs_repair (or e2fsck, for that matter) you reboot immediately.
  Just tested it [on the box in which I'm using XFS on top of LUKS encryption], and I didn't like the results.
  By default, grub2 on Debian creates a "recovery" boot option that boots into single-user mode, but even with this, as you mention, it's necessary to modify the boot options and add init=/bin/sh in order to actually be able to mount the root filesystem read-only. However, after finally succeeding in doing this, xfs_check reports about a full screen of errors concerning file and directory link counts, which all appear simply to be due to the filesystem being mounted and in use. When using a Knoppix CD (v6.4.4) and after using 'cryptsetup luksOpen ' to decrypt the root partition, xfs_check reports no errors at all. [And I did run xfs_repair anyway just to double-check in the latter case, and no errors were found.]
  Furthermore, upon trying to reboot from or exit the single-user mode, I got an error related to "trying to kill init" immediately followed by a kernel panic.
  So I'll admit that I was wrong and that it is possible to run xfs_repair on an XFS filesystem mounted read-only, but I really don't like the results and I strongly recommend against it.
  
  Stop trying to oversimplify things you don't understand.
  Perhaps you don't understand things as well as you think you do. See the section below regarding the -d option to xfs_repair and the context in which you'd use it.
  -d Repair dangerously. Allow xfs_repair to repair an XFS filesystem mounted read only. This is typically done on a root filesystem from single user mode, immediately followed by a reboot.
  
  I had tried it before and IIRC I had lots of trouble getting the filesystem mounted read-only, and had confusing and poor results when I finally did get it mounted read-only. All I remembered clearly in my mind was "it really didn't work", and having gone through it again I still think it doesn't. You can judge for yourself what you think I know or not. ;-)
12. Re:fsck xfs does something? by Sipper · 2012-02-03 11:53 · Score: 1
  
  Check it:
  [root@host ~]# mount /dev/sdb5 /mnt/test
  [root@host ~]# mount -o remount,ro /dev/sdb5
  [root@host ~]# xfs_repair -d /dev/sdb5
  Phase 1 - find and verify superblock...
  Phase 2 - using internal log
  - zero log...
  - scan filesystem freespace and inode maps... ....
  So boot to single-user mode with a read-only root, grab the console, and off you go, same as, say, ext3.
  Try it with LUKS encryption on top. See a further reply from me to another commenter where I detail what happens when I try it. The results are not like the above.
13. Re:fsck xfs does something? by Sipper · 2012-02-03 11:58 · Score: 0
  
  Stop trying to oversimplify things you don't understand.
  Well gee I dunno brain, I've only been a professional sysadmin for a decade and been using XFS as my weapon of choice for a good three or four years, but you and your laptop should feel free to carry on.
  Well thank goodness you don't have a big ego on top of everything else... sheesh.
  I'll admit that I was wrong in that it's possible to 'xfs_check' and 'xfs_repair -d ' a filesystem mounted read-only, but when I tried it I saw spurious errors from xfs_check that aren't there when the filesystem isn't mounted at all, so I think it's the wrong way to go. [I tested it, see a reply to another commenter who was more respectful.]
  If you want to actually help, don't start out commenting with "no you don't".
14. Re:fsck xfs does something? by Vanders · 2012-02-03 12:18 · Score: 1, Funny
  
  Well thank goodness you don't have a big ego on top of everything else... sheesh.
  Hey, I wasn't the one being wrong and calling people dumb.
  
  If you want to actually help
  Where did you get the impression I wanted to help? You were just wrong. Get over it.
  
  --
  Syllable : It's an Operating System
15. Re:fsck xfs does something? by Sipper · 2012-02-03 12:50 · Score: 0
  
  Well thank goodness you don't have a big ego on top of everything else... sheesh.
  Hey, I wasn't the one being wrong and calling people dumb.
  
  If you want to actually help
  Where did you get the impression I wanted to help? You were just wrong. Get over it.
  I am over it. And you're not helpful at all.
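For reference, the thread converges on two documented ways to run xfs_repair: against an unmounted filesystem, or with -d against one mounted read-only (for example a root filesystem in single-user mode), followed immediately by a reboot. The sketch below is a hypothetical wrapper, not an XFS tool: `mount_state` and `safe_xfs_repair` are made-up helper names, and it assumes the device name you pass matches the first field of /proc/mounts exactly (an encrypted root may show up there as a /dev/mapper/... or /dev/dm-N path instead).

```shell
#!/bin/sh
# Hypothetical wrapper around the two documented ways to run xfs_repair:
# on an unmounted filesystem, or with -d on one mounted read-only
# (then reboot immediately). Device names must match /proc/mounts exactly.

# mount_state DEVICE [MOUNTS_FILE]: print "unmounted", "ro" or "rw".
# MOUNTS_FILE uses the /proc/mounts layout: device mountpoint fstype options ...
mount_state() {
    awk -v dev="$1" '
        $1 == dev {
            # Options are comma-separated; only a whole "ro" entry counts,
            # so "errors=remount-ro" is not mistaken for a read-only mount.
            n = split($4, opts, ",")
            state = "rw"
            for (i = 1; i <= n; i++)
                if (opts[i] == "ro") state = "ro"
            print state
            found = 1
            exit
        }
        END { if (!found) print "unmounted" }
    ' "${2:-/proc/mounts}"
}

# safe_xfs_repair DEVICE: pick the xfs_repair invocation matching the mount state.
safe_xfs_repair() {
    case $(mount_state "$1") in
        unmounted)
            xfs_repair "$1" ;;
        ro)
            echo "$1 is mounted read-only: using -d; reboot immediately afterwards" >&2
            xfs_repair -d "$1" ;;
        *)
            echo "refusing: $1 is mounted read-write" >&2
            return 1 ;;
    esac
}
```

Usage, as root: `safe_xfs_repair /dev/sdb5`. The wrapper simply declines read-write mounts rather than trying to force an unmount; getting the filesystem unmounted or remounted read-only is still up to you.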
Why bother? by Waffle+Iron · 2012-02-03 05:54 · Score: 1

They're testing 70 TB of storage, so with current hard drive quality, the odds of an unrecoverable read error are probably close to 100%. It would be simpler to write a two-line fsck utility to report it:

#!/bin/sh
exit 1
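The "close to 100%" figure can be sanity-checked with a back-of-the-envelope model. Assuming unrecoverable read errors are independent and occur at the datasheet rate (commonly quoted as 1 per 10^14 bits for desktop drives and 1 per 10^15 for enterprise drives), the expected number of UREs in a full read is bits_read / rate, and the chance of at least one is about 1 - exp(-expected). The helper name `ure_probability` is hypothetical.

```shell
#!/bin/sh
# Back-of-the-envelope URE estimate, assuming independent errors at a fixed
# datasheet rate (1 error per BITS_PER_ERROR bits read).
#
# Usage: ure_probability TERABYTES_READ BITS_PER_ERROR
# Prints P(at least one URE) = 1 - exp(-bits_read / bits_per_error).
ure_probability() {
    awk -v tb="$1" -v rate="$2" 'BEGIN {
        bits = tb * 1e12 * 8          # decimal terabytes -> bits
        expected = bits / rate        # mean number of UREs (Poisson model)
        printf "%.4f\n", 1 - exp(-expected)
    }'
}

ure_probability 70 1e14   # full read of 70 TB at 1 error per 1e14 bits
ure_probability 70 1e15   # same read at 1 error per 1e15 bits
```

For 70 TB this prints 0.9963 and 0.4288 respectively. Under this model the result depends only on how many bits are read, not on how they are split across filesystems; and since datasheet rates are specification limits, real-world rates may well be lower.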
1. Re:Why bother? by pankkake · 2012-02-03 06:10 · Score: 1
  
  Except they were using RAID60.
  
  --
  Kill all hipsters.
2. Re:Why bother? by _LORAX_ · 2012-02-03 06:16 · Score: 2
  
  After evaluating our options in the 50-200TB range with room for further growth, we ended up moving away from Linux and to an object-based storage platform with a pooled, snapshotted, and checksummed design. One of the major reasons for this was the URE problem: we would virtually be guaranteeing silent data corruption at that size with a filesystem that did not have internal checksums. The closest thing in the OS world would be ZFS, whose openness is in serious doubt. It is scary how much trust the community places on spinning rust.
  The tests are also useless since the "speed" will be linearly controlled by the IOPS of the array. Sure would be nice to be able to throw 10x15k spindles at 3.5TB (230 disks for the 72TB test); that's one way to improve random IO performance, but how many can afford such luxury on a big data store that could reach into the hundreds of TB?
3. Re:Why bother? by Anonymous Coward · 2012-02-03 06:19 · Score: 0
  
  Hi can you explain what "URE problem" means ? thanks!
4. Re:Why bother? by Anonymous Coward · 2012-02-03 06:28 · Score: 0
  
  I'd say ZFS is the bastion of hope in the current environment. Would you recommend going with some even more proprietary system? ZFS runs well on Solaris, OpenIndiana and BSD, and addresses every problem you mentioned and dozens more. If you have an issue with ZFS, Oracle Linux (also free, also open source) now 'ships' with BTRFS as the standard file system, giving you yet another great option.
5. Re:Why bother? by Anonymous Coward · 2012-02-03 06:35 · Score: 0
  
  xfs is a good file system however some of us wonder if making a 300TB file system at work was a great idea :)
6. Re:Why bother? by _LORAX_ · 2012-02-03 06:38 · Score: 1
  
  Unless every read does a checksum ( they don't or it would kill performance ) then there is still the possibility of a silent read corruption. At 70TB it would be rare, but not as rare as many would think and would depend on the sector size and checksum on the individual drives.
7. Re:Why bother? by gweihir · 2012-02-03 06:39 · Score: 1
  
  They're testing 70 TB of storage, so with current hard drive quality, the odds of an unrecoverable read error are probably close to 100%. It would be simpler to write a two-line fsck utility to report it:
  
  #!/bin/sh
  exit 1
  That is only when you use the minimal guarantees from the datasheets. In practice, with healthy disks, read errors are a lot less common.
  
  --
  Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
8. Re:Why bother? by _LORAX_ · 2012-02-03 06:47 · Score: 3, Interesting
  
  Our BTRFS evaluation resulted in rejecting it for some very serious problems (what they claim are snapshots are actually clones, panic in low memory situations, no fsck, horrible support tools, developers who are hostile to criticism, pre-release software, ...). ZFS was nice, but limited to non-distributed systems and still had a non-trivial amount of volume and backend management headaches. Personally I use ZFS for my personal servers at home (incremental snapshots are the bomb) but our production systems needed more.
9. Re:Why bother? by isorox · 2012-02-03 06:54 · Score: 1
  
  Unless every read does a checksum ( they don't or it would kill performance ) then there is still the possibility of a silent read corruption. At 70TB it would be rare, but not as rare as many would think and would depend on the sector size and checksum on the individual drives.
  Ideally you'd have something like ZFS's scrubbing in the background. Or keep it at the application level (the application stores metadata about the files anyway, so it may as well throw in a checksum on create, then have a background checker); however, a 1-bit error in an MPEG file isn't important.
  And when you're creating and destroying data at multi-gigabit speed, how do you perform backups?
10. Re:Why bother? by colinrichardday · 2012-02-03 07:17 · Score: 1
  
  Unless every read does a checksum ( they don't or it would kill performance )
  How does that relate to using the journal checksum option on ext4?
11. Re:Why bother? by _LORAX_ · 2012-02-03 07:26 · Score: 1
  
  Journal checksumming only prevents errors in the journal, not once the data has been written to the main storage area. This was done primarily to ensure the atomic nature of the journal is not violated by a partial write.
12. Re:Why bother? by Nick+Ives · 2012-02-03 07:27 · Score: 1
  
  Unrecoverable read error. It was mentioned in the OP.
  If you have a 200TB hard disk array then it's certain that you will encounter data corruption.
  
  --
  Nick
13. Re:Why bother? by _LORAX_ · 2012-02-03 07:28 · Score: 1
  
  That is only when you use the minimal guarantees from the datasheets. In practice, with healthy disks, read errors are a lot less common.
  Are you willing to bet 70TB+ on it, because that's what you are doing.
14. Re:Why bother? by _LORAX_ · 2012-02-03 07:31 · Score: 1
  
  Well since anything over 100TB is not supported by the vendor I would say not really a great idea. The reason it's not supported is there is no reasonable way to maintain ( things like an error would result in days worth of outages to fsck and/or restore from backup ).
15. Re:Why bother? by Guspaz · 2012-02-03 07:41 · Score: 3, Interesting
  
  ZFS now runs pretty well on Linux too, as a kernel module, thanks to zfsonlinux. If you're running a Debian-based distro, installing it is trivial (one command to add the PPA, one command to install the package).
16. Re:Why bother? by Anonymous Coward · 2012-02-03 08:03 · Score: 0
  
  The community does not place that much trust in spinning rust, hence RAID.
  Drives do checksumming internally, of course, so silent data corruption on read is far less likely than an unrecoverable error. What were your numbers on silent bad data, by the way?
  Of course, new filesystem designs (ZFS, BTRFS) also accommodate data checksumming. Luckily for Linux users, the light is at the end of the tunnel for btrfs production readiness.
17. Re:Why bother? by ratsg · 2012-02-03 08:38 · Score: 2
  
  And ZFS is available for Mac OS X systems as an add-on, both as open source and, as of this week, as a commercial version.
  There is very little reason to be running a system without ZFS, unless you are running AIX, HP-UX or IRIX.
18. Re:Why bother? by Anonymous Coward · 2012-02-03 09:45 · Score: 0
  
  "The closest thing in the OS world would be ZFS whose openness is in serious doubt."
  FreeBSD has ZFS and you can look at the source code. What's not "open" about it? The only real issue is they are using strict ZFS as to keep it compatible with Solaris being able to mount it.
19. Re:Why bother? by Electricity+Likes+Me · 2012-02-03 12:31 · Score: 1
  
  ZFSonLinux's speed isn't really up there yet on Linux.
  It might only be a home system, but that to my mind makes it more important: I'd like to spend as little time maintaining as possible and that means I want to saturate my gigabit NIC when I need to.
20. Re:Why bother? by Warphammer · 2012-02-03 17:37 · Score: 1
  
  With good but not great hardware, I more than saturate gigabit on my 8-drive setup using ZFS on Linux. ZFS-FUSE would putter about at 20-40 MB/sec but this version zips right along. Saturates the gig link, scrubs at 450MB/sec... Good enough to keep me happy anyway.
21. Re:Why bother? by drsmithy · 2012-02-03 20:04 · Score: 1
  
  After evaluating our options in the 50-200TB range with room for further growth, we ended up moving away from Linux and to an object-based storage platform with a pooled, snapshotted, and checksummed design. One of the major reasons for this was the URE problem: we would virtually be guaranteeing silent data corruption at that size with a filesystem that did not have internal checksums. The closest thing in the OS world would be ZFS, whose openness is in serious doubt. It is scary how much trust the community places on spinning rust.
  What open object-based storage did you use ?
22. Re:Why bother? by Electricity+Likes+Me · 2012-02-03 23:26 · Score: 1
  
  Huh, there's been some improvements I guess. I should give it another look (since I'm running XFS on RAID6 at the moment).
23. Re:Why bother? by Guspaz · 2012-02-04 10:16 · Score: 1
  
  I was running OpenSolaris on the box before, and I haven't particularly noted any speed differences locally. I'm not sure I'm getting as much out of Samba, though.
24. Re:Why bother? by gweihir · 2012-02-05 01:49 · Score: 1
  
  If you do not maintain your disks (no surface scans, no consistency checks for RAID, no individual-attribute SMART monitoring), you may well end up at the values in the data-sheets, but operating disks that way is just unprofessional and stupid. With professional maintenance and RAID, errors are caught early when redundancy still allows correction. That said, RAID is not backup and backup is non-optional in any sane operation.
  So, no, I am not gambling here at all. I am just pointing out that the numbers in the data-sheets are fictitious.
  
  --
  Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
25. Re:Why bother? by sander · 2012-02-10 00:05 · Score: 1
  
  Any time, as any serious application will need checksums anyway. It does not help you one bit whether you spread data over seven 10 TB file systems or a single 70 TB file system; you will get exactly the same number of UREs in your 70 TB data set.
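The application-level idea raised in this thread (store a checksum when a file is created, then re-check it in the background) can be approximated with standard tools. This is a minimal sketch assuming GNU coreutils and findutils; `record_checksums` and `verify_checksums` are hypothetical names. DIR should be an absolute path so the manifest can be checked from anywhere, and the manifest itself should live outside DIR.

```shell
#!/bin/sh
# Poor man's scrub: record a SHA-256 manifest, then re-verify it periodically.
# Assumes GNU coreutils (sha256sum --check --quiet) and findutils (-print0, xargs -0 -r).

# record_checksums DIR MANIFEST: write "hash  path" lines for every file under DIR.
record_checksums() {
    find "$1" -type f -print0 | xargs -0 -r sha256sum > "$2"
}

# verify_checksums MANIFEST: exit 0 if every file still matches, non-zero otherwise.
# With --quiet, only mismatching or unreadable files are reported.
verify_checksums() {
    sha256sum --check --quiet "$1"
}
```

Running `verify_checksums` from cron gives a crude background check. Unlike a ZFS scrub, it can only detect corruption; repairing it still means going back to a replica or backup.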
Breaking News! by Anonymous Coward · 2012-02-03 05:54 · Score: 2, Funny

This just in:
Full filesystem scans take longer as the size of the filesystem increases.
News at 11.
1. Re:Breaking News! by Anonymous Coward · 2012-02-03 06:47 · Score: 0
  
  You forgot "this somehow proves that Linux sucks".
Fsck times by Anonymous Coward · 2012-02-03 05:58 · Score: 0

For the FSCK times of EXT4 on a 50%-loaded 72TB drive (32TB, 105 million files), the time was only an hour. I wish my drives at home would FSCK that fast, and I only have 2 TB formatted as XFS.
1. Re:Fsck times by Gazzonyx · 2012-02-03 07:12 · Score: 2
  
  They were using 15K RPM SAS drives. Your 7200 RPM drives aren't going to touch the speed of 15K RPM drives on a SAS backplane. Not by a long shot.
  
  --
  If I mod you up, it doesn't necessarily mean I agree with what you've said, sorry.
2. Re:Fsck times by Anonymous Coward · 2012-02-03 12:12 · Score: 0
  
  That depends on the number of 7200RPM drives you are using. I can spend about 3-4x the price of a SAS drive, and get 2-3 drives for that price and MUCH more storage.
  Now let's say you're using 10 15k SAS drives.
  Welp, that's great, but now I'm using 30 for the same price, and have MUCH more storage space, as well as more speed.
  Now, the math doesn't EXACTLY work out, but 15,000 x 10 = 150,000 RPM vs. 7,200 x 30 = 216,000 RPM.
  Since you're also sending data to more drives at once, you have increased read/write speed by using the multiple drives.
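Summing RPM across drives isn't a meaningful metric, but the conclusion can be checked with a common first-order model of random I/O: each drive delivers roughly 1 / (average seek time + half a rotation) operations per second, and an array scales that by the number of drives. The seek times used below (3.5 ms for 15K SAS, 8.5 ms for 7200 RPM) are assumptions in the range of typical datasheet figures, and `array_iops` is a hypothetical helper.

```shell
#!/bin/sh
# Rough random-I/O estimate for an array of identical drives.
# Per-drive IOPS ~ 1000 / (avg seek ms + half-rotation ms), times drive count.
#
# Usage: array_iops RPM AVG_SEEK_MS DRIVES
array_iops() {
    awk -v rpm="$1" -v seek="$2" -v n="$3" 'BEGIN {
        half_rotation_ms = 0.5 * 60000 / rpm     # average rotational latency
        per_drive = 1000 / (seek + half_rotation_ms)
        printf "%.0f\n", per_drive * n
    }'
}

array_iops 15000 3.5 10   # ten 15K SAS drives, ~3.5 ms seek assumed
array_iops 7200 8.5 30    # thirty 7200 RPM drives, ~8.5 ms seek assumed
```

These print 1818 and 2368, so under these assumptions thirty 7200 RPM drives come out around 1.3x ahead of ten 15K drives on random I/O. The model ignores RAID parity overhead, controller caching and queueing, so treat it as a rough comparison only.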
Damage? by eggstasy · 2012-02-03 06:02 · Score: 3, Funny

Honey badger don't give a fsck.
1. Re:Damage? by Anonymous Coward · 2012-02-03 08:16 · Score: 0
  
  How's wickedfire treating you?
Who would engineer a storage system like that? by Anonymous Coward · 2012-02-03 06:11 · Score: 2, Insightful

A single file system that big without checking features that file systems like ZFS or clustering file stores provide seems insane to me.
1. Re:Who would engineer a storage system like that? by Anonymous Coward · 2012-02-03 08:29 · Score: 0
  
  For a smallish SAN at 22TB, ZFS was chosen for just that reason. Even my home NAS at 5TB, based on ext3, got silent errors.
2. Re:Who would engineer a storage system like that? by Mysticalfruit · 2012-02-03 09:31 · Score: 1
  
  That's my reason as well. I've got a 4TB SAN at home and I'm using ZFS on Linux (kernel modules, not FUSE) to manage it. Certain parts of it are also backed up in other places, but I run a zfs scrub on it once a week. One reason I chose ZFS over ext4 was that I wanted to be able to add disks and grow the filesystems as painlessly as possible. Since the disks are hanging off a mediocre onboard controller, the idea of having to fsck 4TB in the case of a power outage / crash seemed craptastic. So far I've been very happy.
  
  --
  Yes Francis, the world has gone crazy.
3. Re:Who would engineer a storage system like that? by Anonymous Coward · 2012-02-03 11:32 · Score: 0
  
  Not to mention that he used software RAID0 on non-SSDs.
  The whole thing's bogus from start to finish because NOBODY in their right mind would ever use a production software/fakeraid system the way he set this up- it's not robust to begin with. Worse, doing software RAID0 with that large a stripe's going to make ANY of his data points look worse than they really are. Enterprise Storage Forum should be ashamed of even publishing this. Seriously.
4. Re:Who would engineer a storage system like that? by Anonymous Coward · 2012-02-04 14:23 · Score: 0
  
  Not to mention that he used software RAID0 on non-SSDs.
  That would be RAID 60, i.e. a RAID 0 composed of two RAID 6 arrays, each of which was running 8 drives plus two parity drives. He could lose two drives in each sub-array without breaking the RAID 0 layer.
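The RAID 60 layout described above is easy to reason about numerically: usable capacity is groups x (drives per group - 2) x drive size, since each RAID 6 group spends two drives' worth of space on parity. This sketch uses a hypothetical helper name and made-up drive sizes; it does not reproduce the article's actual array.

```shell
#!/bin/sh
# RAID 60 = RAID 0 striped across several RAID 6 groups. Each group gives up
# two drives' worth of capacity to parity and survives any two failures within
# that group; a third failure in the same group loses the whole stripe.
#
# Usage: raid60_usable GROUPS DRIVES_PER_GROUP DRIVE_TB  -> usable capacity in TB
raid60_usable() {
    awk -v g="$1" -v d="$2" -v size="$3" 'BEGIN {
        if (d < 4) {
            print "RAID 6 needs at least 4 drives per group" > "/dev/stderr"
            exit 1
        }
        printf "%g\n", g * (d - 2) * size
    }'
}

# Two 10-drive groups (8 data + 2 parity each) of hypothetical 2 TB drives:
raid60_usable 2 10 2    # prints 32
```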
Re:linux is fail by hobarrera · 2012-02-03 06:11 · Score: 2

I'll go tell _average joe/jane_ to go and get AIX, and dump ubuntu+unity which they like so much because it's shiny and pretty.
Re:linux is fail by countertrolling · 2012-02-03 06:14 · Score: 1

Not to mention the everyday low price

--
For justice, we must go to Don Corleone
poor test.. bad results by _LORAX_ · 2012-02-03 06:20 · Score: 1

A much better test of linux "big data":
1) write garbage to X blocks
2) run fsck; if no errors are found, repeat step 1
How long would it take before either of these filesystems noticed a problem, and how many corrupt files would you have? With a real filesystem you should be able to identify and/or correct the corruption before it takes out any real data.
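The proposed test can be sketched against a scratch filesystem image instead of a production array. This is a minimal sketch assuming e2fsprogs (mkfs.ext4, e2fsck) and 4 KiB blocks; `corrupt_block` and `corrupt_until_detected` are hypothetical helpers, and they should only ever be pointed at a throwaway image file. One caveat worth knowing: e2fsck checks metadata only, so garbage written into file data blocks is never reported, which is precisely the gap the comment is complaining about.

```shell
#!/bin/sh
# Corrupt-then-check loop, run against a scratch ext4 image file rather than
# real storage. Assumes e2fsprogs (mkfs.ext4, e2fsck) and 4 KiB blocks.

BS=4096

# corrupt_block IMAGE BLOCK: overwrite one block with random bytes in place.
corrupt_block() {
    dd if=/dev/urandom of="$1" bs="$BS" count=1 seek="$2" conv=notrunc 2>/dev/null
}

# corrupt_until_detected IMAGE [MAX]: corrupt one random block at a time until
# "e2fsck -fn" (forced, read-only, answer "no") exits non-zero. Prints the number
# of blocks corrupted; returns 1 if MAX rounds pass without detection.
corrupt_until_detected() {
    img=$1
    max=${2:-1000}
    blocks=$(( $(wc -c < "$img") / BS ))
    count=0
    while [ "$count" -lt "$max" ]; do
        count=$((count + 1))
        # Pick a random block number from 4 bytes of /dev/urandom.
        block=$(( $(od -An -N4 -tu4 /dev/urandom) % blocks ))
        corrupt_block "$img" "$block"
        if ! e2fsck -fn "$img" >/dev/null 2>&1; then
            echo "$count"
            return 0
        fi
    done
    echo "$count"
    return 1
}

# Example with a hypothetical scratch image:
#   truncate -s 1G scratch.img && mkfs.ext4 -F -q scratch.img
#   corrupt_until_detected scratch.img 500
```

For an XFS image, `xfs_repair -n` could play the role of `e2fsck -fn`.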
1. Re:poor test.. bad results by Anonymous Coward · 2012-02-03 11:43 · Score: 0
  
  Don't forget...
  0) Set the filesystem up on a NON-SOFTWARE RAID0 setup. Use REAL hardware like the big boys like Google use for "big data" with those filesystems.
  Once you do that, your proposed test is a better one than this farce TFA author ran up the flagpole.
  What he did was just simply FAIL.
Restore time by Anonymous Coward · 2012-02-03 06:22 · Score: 0

"If you need to fsck you should already be restoring from backups"
You do realize how long it would take to restore 72TB on the class of hardware they were testing?
1. Re:Restore time by h4rr4r · 2012-02-03 06:29 · Score: 1
  
  Seconds?
  When you have that much data and you need high reliability you are doing streaming replication to multiple devices and layering other backup methods as well.
  Any idea what the cost of just trusting that the FSCK fixed the problems on 72TB of data your business needs could be?
2. Re:Restore time by ratsg · 2012-02-03 08:34 · Score: 1
  
  Regardless of the time, it beats losing all of your data.
3. Re:Restore time by sander · 2012-02-09 23:55 · Score: 1
  
  "This much data" ? Hello? Are you a time traveler from the 1990s who has missed a decade of storage space expansion or simply trying to have a cheap laugh? 72TB is not "much" in this day and age. Also, fsck only deals with metadata, if you are worried about what happened to your data, the file system at hand is not adequate to your needs anyways.
Re:linux is fail by sunderland56 · 2012-02-03 06:22 · Score: 1

OK, so I have a large x86/64 server and want to follow your advice. Can you please tell me where you can get AIX, or HP-UX, to run on X86?
Re:linux is fail by evol262 · 2012-02-03 06:27 · Score: 1

I like how you completely ignored Solaris yet still presented the comment as if it was a valid counterargument.

--
"The more corrupt a society, the more numerous are its laws." -Tacticus
Article correction by Anonymous Coward · 2012-02-03 06:28 · Score: 0

The lengthy delay in obtaining the results is due not to the lack of hardware for testing, but to time waiting for fsck to finish.
So? by Anonymous Coward · 2012-02-03 06:34 · Score: 0

Okay, so ext4 takes longer to fsck than XFS does.
Let's look at how they set up the scenario. They made a bunch of RAID6s with two spares each, and *then* made a striped RAID of those to get 72TB. This tells me that they're storing data where uptime is paramount. So, you're not in an organization where you can answer the red phone in your server room and go "Well, we're checking the drive for errors. Our 72TB of business data will be back online in about a half-hour". So, you've certainly got hot-spares for fail-over, right? ...which means that it kind of doesn't matter *how* long your primary is down (within reason, of course). I say "within reason" because the biggest discrepancy I see in their results between ext4 and XFS is about a factor of 8 (about a half-hour for ext4 as opposed to XFS's 4.5 minutes).
Their message seems to be that, if you've got 72TB of data on an array with ext4 and your only way of getting it back is with fsck, you're in a bit of trouble.
Personally, I'd shorten the message by taking the "with ext4" part out.
1. Re:So? by Guspaz · 2012-02-03 07:44 · Score: 1
  
  Isn't that the point of using a filesystem that can do online scrubs, like ZFS? As far as I know, ZFS also checks metadata when scrubbing.
Re:linux is fail by Anonymous Coward · 2012-02-03 06:35 · Score: 0

JFS also works on Linux.
Re:linux is fail by gweihir · 2012-02-03 06:36 · Score: 4, Insightful

A cranky coward from the shadows is not a reliable source of information.
I have used AIX and Solaris, and I can say that a lot of stuff is easier on Linux.

--
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
Object based storage by Al+Kossow · 2012-02-03 06:37 · Score: 1

What system did you end up going with?
How do you back it up?
Re:linux is fail by Anonymous Coward · 2012-02-03 06:38 · Score: 5, Funny

sudo kill yourself
;-)
Re:linux is fail by hawguy · 2012-02-03 06:46 · Score: 2

I'll go tell _average joe/jane_ to go and get AIX, and dump ubuntu+unity which they like so much because it's shiny and pretty.
Few average Joes have 72TB of disk space, and even those who do are probably OK with 30-60 minutes of FSCK time. More likely, instead of hundreds of millions of files, they have a few million, so their fsck time will be in the 3-15 minute range.
I've seen servers that take over 3 minutes for their POST check.
Re:linux is fail by Anonymous Coward · 2012-02-03 07:04 · Score: 0

sudo dd if=/dev/zero of=/dev/brain
Re:linux is fail by bolthole · 2012-02-03 07:15 · Score: 1

What "stuff"?
Give actual, useful comparisons.
Otherwise, your comment can be reduced to,
"I am most familiar with linux. Therefore, using linux is easier for me"
Re:linux is fail by ifrag · 2012-02-03 07:15 · Score: 3, Funny

I like how you completely ignored Solaris yet still presented the comment as if it was a valid counterargument.
I also like how GP completely ignored Solaris. I just like the fact it is being ignored.

--
Fear is the mind killer.
Re:linux is fail by Anonymous Coward · 2012-02-03 07:16 · Score: 0

He didn't type enough, zing! You really got him!
print article in one page is here : by godrik · 2012-02-03 07:23 · Score: 1

http://www.enterprisestorageforum.com/print/storage-hardware/linux-file-system-fsck-testing----the-results-are-in.html
going through 3 pages is so annoying...
Re:linux is fail by Anonymous Coward · 2012-02-03 07:29 · Score: 1

Not sure how AIX will help here since it is on a similar filesystem. Also, you are comparing apples and radishes -- how does AIX compare to ubuntu+unity - one being server and other being desktop -- in other words, are you insane ?
Re:linux is fail by impaledsunset · 2012-02-03 07:46 · Score: 1

kill -9 $$ # does the job pretty well
Re:linux is fail by Anonymous Coward · 2012-02-03 07:49 · Score: 1

ZFS has 0 FSCK time as it does not need it. If you never leave your FS in an unstable state, you won't need to worry about fixing it.
Re:linux is fail by fnj · 2012-02-03 07:50 · Score: 2

killall Anonymous\ Coward
Re:fsck xfs does something? - no fsck by Anonymous Coward · 2012-02-03 07:51 · Score: 0

>I set up an xfs volume a couple years back. After copying a few files over nfs, it became corrupted. The xfs fsck did something -- it told me that it was so corrupted, it couldn't be fixed.
Well, why don't you quote something even older, say Linux 0.1, if that makes you feel better?
XFS on Linux (on SGI it dates back a long time; I am referring to the port) has matured a great deal over the years.
Also, xfs has no fsck -- are you sure it is not a case of mistaken identity?
You need to use xfs_repair, if *required*, after dirty log replay.
Re:linux is fail by Anonymous Coward · 2012-02-03 08:07 · Score: 1

and FSCK has 0 jail time, unlike ZFS
Re:linux is fail by cryptographrix · 2012-02-03 08:08 · Score: 2

...until you have a drive die during a scrub, destroy a zfs filesystem in a deduplicating zpool, or do any other number of things that make ZFS **ANGRY**, that is. And despite all that, I still trust it more than most Linux filesystems.
Just how bad is it? by Minwee · 2012-02-03 08:12 · Score: 1

Each pool is a LUN that is 3.6TB in size before formatting or actually 3,347,054,592 bytes as reported by "cat /proc/partitions".
a file system with about 72TB using "df -h" or 76,982,232,064 bytes from "cat /proc/partitions"

Yeah, I think there's definitely a scaling problem there.
Or perhaps a reading comprehension problem, since /proc/partitions reports sizes in 1 KiB blocks, not bytes, but either way it doesn't inspire any kind of confidence in the rest of their testing methodology.
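The unit mix-up is easy to confirm: the #blocks column of /proc/partitions counts 1 KiB blocks. The hypothetical helper below converts each entry to bytes and TiB (the binary units `df -h` reports). Read that way, 3,347,054,592 blocks is about 3.43 TB (3.12 TiB), and 76,982,232,064 blocks is about 78.8 TB (71.70 TiB), which lines up with the roughly 72TB shown by `df -h`.

```shell
#!/bin/sh
# Convert /proc/partitions sizes (1 KiB blocks) to bytes and TiB.
#
# Usage: partition_sizes [FILE]   (defaults to /proc/partitions)
partition_sizes() {
    awk 'NF == 4 && $3 ~ /^[0-9]+$/ {      # skip the header and blank line
        bytes = $3 * 1024
        printf "%s %.0f bytes %.2f TiB\n", $4, bytes, bytes / (1024 ^ 4)
    }' "${1:-/proc/partitions}"
}
```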
Re: Don't use xfs_check by Anonymous Coward · 2012-02-03 08:14 · Score: 0

Don't use xfs_check -- it is slow; instead, run xfs_repair in -n mode.
Also, there is no fsck for xfs. For people interested in the details: the dirty log is replayed during mount, and an xfs_repair may be required after that, but it is optional.
In other words, in my opinion, the people who compared XFS and ext4 are not aware of this.
Quick pro-tip: don't mention EBS and performance in the same line. They don't match no matter what filesystem you use, since the underlying medium sucks. They provide storage on an 'elastic' basis, so go figure out how fast they deliver it when you are writing at x MB/s.
Re:linux is fail by aix+tom · 2012-02-03 08:16 · Score: 4, Informative

You see my nick?
AIX sucks more than Linux.
The usual process for "weird"* AIX problems:
1) weird problem occurs after install. You report problem to IBM.
2) IBM asks for your software version, see they are the newest ones available, and say they look into it.
3) You ask several months later whether they found anything. They ask for your software version, then ask you to upgrade and see if the problem goes away.
4) You upgrade to newest version.
5) go to 2)
*There are of course non-weird problems where you get the answer from IBM support in 2-3 days, and from Linux forums in 2-3 minutes.
Re:linux is fail by ratsg · 2012-02-03 08:29 · Score: 1

And XFS worked great with IRIX. WTF happened to it on Linux???
Re:linux is fail by lvxferre · 2012-02-03 08:40 · Score: 3, Funny

Why would you replace a zeroed string with another? At least use /dev/random, bro.

--
Nerdy news for your nerdy needs? http://www.soylentnews.org Soylent News is people!
Re:linux is fail by systemzvirus · 2012-02-03 08:43 · Score: 1

Whatever, zLinux. Also, there is a point to tightly coupling the OS to the hardware. Not every workload needs to be on x86 toys.
Re:linux is fail by Aighearach · 2012-02-03 08:48 · Score: 1

IBM said please don't use AIX, use Linux instead. That was like... 10+ years ago.
Damage? by erice · 2012-02-03 08:55 · Score: 2

When an article about fsck has a tag line of "What's the damage", I expect to see some discussion of how fsck deals with a damaged file system.
The time required to fsck a file system that doesn't need checking is less interesting, and inconsistent with the title. Although, if fsck had complained about the known-clean file system, that would have been interesting.
ANCIENT kernel & software for this test by Anonymous Coward · 2012-02-03 09:08 · Score: 0

Wasn't this Linux kernel released in, like... 2008? Surely the author could have chosen a kernel released in 2011 at least? The tools may be just as old, too. An article like this should surely be written to be relevant to what is currently shipped in operating systems.
I mean, even *DEBIAN* uses 2.6.32 in its current stable release, and they're usually years behind. Their upcoming release, due soon, uses 3.2!
And speaking of that, XFS got a really major upgrade around 3.0 that essentially builds FreeBSD-style soft updates and journalling I/O intelligence into the file system.
1. Re:ANCIENT kernel & software for this test by Anonymous Coward · 2012-02-03 12:44 · Score: 0
  
  Wasn't this linux kernel released in, like... 2008?
  The CentOS kernel is based on 2.6.18, so more like 2006. But it looks like it has been patched to hell and back (ext4 didn't exist in 2.6.18), so I don't know why they keep calling it by that version number. Old habit?
Re:linux is fail by Saxophonist · 2012-02-03 09:09 · Score: 3, Funny

No, you're thinking of ReiserFS.
Re:linux is fail by jd · 2012-02-03 09:32 · Score: 3, Interesting

Works best if you use the "Doom as Sys Admin" hack.

--
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
The story notwithstanding, by Lord+Duran · 2012-02-03 09:54 · Score: 1

I expect a /. article like this to include a summary. Like, a word about what the results actually were, without having to click through twice to get to them.
Re:linux is fail by jd · 2012-02-03 09:58 · Score: 5, Interesting

A lot of stuff is also faster on Linux, particularly on the x86. Solaris x86 is dog slow. AIX ("aches") is an appropriate name for a mainframe OS that never really got the hang of this new-fangled "interactive user" stuff. It's a good mainframe OS; that is what it is designed, tuned and intended for, but traditional mainframe batch transactional work isn't the sort of payload that is typically run these days. The high-end users want hard real-time (i.e., they know to the microsecond - or nanosecond, in some cases - exactly when each process will start and stop) for data collection, data analysis and simulation. The data centers want massive multithreading for gigantic servers with minimal overhead and service guarantees per thread. The typical user wants extremely low-latency interactive performance. None of these are pre-scripted batch jobs.
Now, if you wanted to develop a data warehouse for, say, technical writings, journalism, etc., where you're compiling a collection of things that can be typeset overnight, that may be doable as a batch job. However, anyone planning on publishing a journal that needs 72 terabytes of storage had best consider the marketplace a little more closely first. A publishing company, say Nature, might conceivably have use for AIX for batch work. I could see the number of submissions, referee responses and article selections per journal being such that a mainframe would be a perfectly valid way to do things. Even then, it might still be sufficiently small that a live transactional database would be more cost-effective.
Traditionally, batch processing has been a niche market for electrical and gas companies, etc., where the number of customers is staggering. Even then, it has largely been replaced with live transactional systems because customers want things adjusted NOW and not overnight or at the end of the week.
Mass mailers still use batch processing, but printing is the bottleneck, and there is no point in having an expensive OS process everything in a fraction of a second on an expensive mainframe when it takes N actual real-world seconds before a printer becomes available to take the next block of data. You need to run no faster than the slowest component, because the end product won't be delivered any faster. You would have to have a gigantic number of printers before the OS became a significant factor, and most shops just don't have that kind of printing power.
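The slowest-component argument comes down to taking a minimum over stage rates. This is a minimal sketch with made-up numbers: the stage names and jobs-per-second figures below are hypothetical, not from the comment.

```python
from typing import Dict, Tuple


def pipeline_rate(stage_rates: Dict[str, float]) -> Tuple[str, float]:
    """End-to-end throughput of a serial pipeline is capped by its slowest stage."""
    bottleneck = min(stage_rates, key=stage_rates.get)
    return bottleneck, stage_rates[bottleneck]


# Hypothetical jobs-per-second figures for a mass-mailing run.
rates = {"mainframe_os": 50_000.0, "formatting": 5_000.0, "printers": 12 * 2.0}
print(pipeline_rate(rates))

# Doubling the OS speed leaves the end-to-end rate unchanged.
faster_os = dict(rates, mainframe_os=100_000.0)
print(pipeline_rate(faster_os))
```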

--
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
Re:linux is fail by Anonymous Coward · 2012-02-03 11:22 · Score: 0

Okay.
NO CARRIER
Re:linux is fail by jedidiah · 2012-02-03 11:27 · Score: 1

So is this about big filesystems or lots of tiny files?
'cause they are not the same thing.
How many files is a lot? 300K? 10M? 100M?

--
A Pirate and a Puritan look the same on a balance sheet.
Re:linux is fail by jad4 · 2012-02-03 14:55 · Score: 1

I think you're confusing AIX with S/390. AIX is IBM's Unix system, not mainframe. It handles interactive workloads just fine. Hell, S/390 does, too. Your batch processing concepts are a few decades out of date. Just sayin'.
On the face of it, this is poorly done by Antibozo · 2012-02-03 18:09 · Score: 1

1. Why did they put a label on the RAID devices? They should have just used /dev/sd[b-x] directly, and not confused the situation with a partition table.
2. Did they align the partitions they used to the RAID block size? They don't indicate this. If they used the default DOS disk label strategy of starting /dev/sdb1 at block 63, then their filesystem blocks were misaligned with their 128 KiB RAID block size, and one in every 32 filesystem blocks will span two disks (assuming 4 KiB filesystem blocks).
3. Why did they use md and not LVM? md can sometimes introduce bandwidth limits, and LVM lets you alternate between striped and linear volumes for your testing.
4. Why don't they report the raw bandwidth of the disk, and maybe some IOPS numbers?
5. Why don't they report total operations and bandwidth consumed as measured by iostat or sar?
6. Why didn't they give geometry hints to mkfs? The ext4 mkfs invocation, for example, should have included "-E stride=$[128 / 4],stripe-width=$[(10 - 2) * (128 / 4)]".
7. What about using an external journal?
8. They report that "during the file system check the server did not swap, and no additional use of virtual memory was observed." Wouldn't it have been better to just do "swapoff -a" and report that no swap was available?
9. Why didn't they (as someone else also suggested above) test an actually damaged filesystem?
10. Is there any indication other than their credentials that these people know what they're doing?
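The "one in every 32" figure in point 2 can be checked by brute force. This is a minimal sketch assuming 512-byte sectors, a partition starting at sector 63 on the RAID device, 4 KiB filesystem blocks and 128 KiB RAID chunks. It compares that layout with a 1 MiB-aligned start (sector 2048), which current partitioning tools typically default to. The function name is hypothetical.

```python
SECTOR = 512
KIB = 1024


def straddling_blocks(part_start_sector: int, fs_block: int, chunk: int, n_blocks: int) -> int:
    """Count filesystem blocks that cross a RAID chunk boundary.

    Offsets are measured from the start of the RAID device; the partition
    begins at part_start_sector (512-byte sectors)."""
    start = part_start_sector * SECTOR
    count = 0
    for k in range(n_blocks):
        first = start + k * fs_block        # first byte of block k
        last = first + fs_block - 1         # last byte of block k
        if first // chunk != last // chunk:  # block's bytes land in two chunks
            count += 1
    return count


n = 32 * 1000  # 1000 chunks' worth of 4 KiB blocks
print(straddling_blocks(63, 4 * KIB, 128 * KIB, n), "of", n)    # legacy DOS offset
print(straddling_blocks(2048, 4 * KIB, 128 * KIB, n), "of", n)  # 1 MiB-aligned offset
```

With the sector-63 start, 1000 of the 32000 blocks straddle a chunk boundary (exactly 1 in 32). With the 1 MiB-aligned start, none do.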
1. Re:On the face of it, this is poorly done by Anonymous Coward · 2012-02-04 14:37 · Score: 0
  
  6. Why didn't they give geometry hints to mkfs? The ext4 mkfs invocation, for example, should have included "-E stride=$[128 / 4],stripe-width=$[(10 - 2) * (128 / 4)]".
  If I'm reading the article correctly, their RAID6 arrays had 128k block sizes and the parent RAID0 array had a 1024k block size. So shouldn't it have been "-b 4096 -E stride=$[1024 / 4],stripe-width=$[2 * (1024 / 4)]"? +1 for everything else, though.
2. Re:On the face of it, this is poorly done by Antibozo · 2012-02-05 23:14 · Score: 1
  
  Good point; I was overlooking the parent RAID.
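The two suggested mkfs option strings differ only in which RAID layer's geometry they describe. This is a minimal sketch reproducing the comments' bash arithmetic. The 10-disk RAID6 (two parity disks) and the two-member RAID0 figures come from the comments. The helper name is hypothetical, and the option spelling mirrors the comments, so check your mke2fs man page for the accepted form.

```python
def ext4_raid_opts(chunk_kib: int, data_disks: int, block_kib: int = 4) -> str:
    """Build an ext4 `-E` extended-option string from RAID geometry.

    stride       = RAID chunk size measured in filesystem blocks
    stripe-width = stride * number of data-bearing disks
    """
    if chunk_kib % block_kib:
        raise ValueError("chunk size must be a multiple of the block size")
    stride = chunk_kib // block_kib
    return f"stride={stride},stripe-width={stride * data_disks}"


# First suggestion: 128 KiB chunks on a 10-disk RAID6 (10 - 2 data disks).
print(ext4_raid_opts(128, 10 - 2))
# Reply's suggestion: 1024 KiB chunks on the parent RAID0 of two arrays.
print(ext4_raid_opts(1024, 2))
```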
Obsolete? by thsths · 2012-02-03 20:04 · Score: 1

I am not sure it has much impact, but why would you use a five-year-old Linux kernel to perform the test? Maturity is all very nice, but if you are pushing technology, it is not always the best approach.
Why didn't they test... by unixisc · 2012-02-04 04:50 · Score: 1

...other file systems, such as ZFS (doesn't it work w/ Linux?), Veritas, UFS and so on?
Re:linux is fail by jimicus · 2012-02-05 02:19 · Score: 2

There are of course non-weird problems where you get the answer from IBM support in 2-3 days, and from Linux forums in 2-3 minutes.
I really wouldn't paint Linux support in such rosy terms. Many forums are heading in the direction of the blind leading the blind; application-specific mailing lists and IRC channels, while improving, still have a slight tendency to say "RTFM n00b!". (Or, as happened to me, "Can't be done. It's a stupid demand anyway. Fuck off" - twenty minutes later I figured out how to do it on my own, so it evidently could be done...)
Re:linux is fail by deek · 2012-02-05 14:28 · Score: 1

Thank goodness someone has actually posted something relatively negative about ZFS. The way many people rave about it, you'd think it was God's gift to filesystems.
Ironically, that has made me more interested in using it. My general instinct is to distrust anything that is painted as all good.
Re:linux is fail by sander · 2012-02-09 23:38 · Score: 1

OK, so I have a large x86/64 server and want to follow your advice. Can you please tell me where you can get AIX, or HP-UX, to run on X86?
Right. Very funny how you managed to pick out, of the three, the two systems that don't run on x86. If your question was even remotely serious, there are two options for you: Solaris and FreeBSD.