What's the Damage? Measuring fsck Under XFS and Ext4 On Big Storage
An anonymous reader writes "Enterprise Storage Forum's long-awaited Linux file system Fsck testing is finally complete. Find out just how bad the Linux file system scaling problem really is."
NOT!
BOOYA!
How fast a full fsck scan runs is my last concern. What about how successful they are at recovering the filesystem?
What's the Damage? Measuring fsck Under XFS and Ext4 On Big Storage
Because of politically correct speech, I read the headline as "What's the Damage? Measuring fuck Under XFS and Ext4 On Big Storage"
Jessie Mother fscking crisko!
When I had some EBS problems a couple years ago, I figured I would run xfs_check. It seemed to do absolutely nothing, even when there were disks known to be bad in the md array. xfs is nice and fast, but I haven't seen xfs_check or xfs_repair do either of the things I'd assume they'd do -- check and repair. I found it easier to delete the volumes and start from scratch, because any compromised xfs filesystem seems to be totally unfixable. Is fsck for xfs new?
I do stuff Zhrodague
They're testing 70 TB of storage, so with current hard drive quality, the odds of an unrecoverable read error are probably close to 100%. It would be simpler to write a two-line fsck utility to report it:
This just in:
Full filesystem scans take longer as the size of the filesystem increases.
News at 11.
For the FSCK times of EXT4 on 50% loaded 72TB (32TB, 105million files) drive the time was only an hour. I wish my drives at home would FSCK that fast, and I only have 2 TB formatted XFS
Honey badger don't give a fsck.
A single file system that big without checking features that file systems like ZFS or clustering file stores provide seems insane to me.
I'll go tell _average joe/jane_ to go and get AIX, and dump ubuntu+unity which they like so much because it's shiny and pretty.
Not to mention the everyday low price
For justice, we must go to Don Corleone
A much better test of linux "big data"
1) write garbage to X blocks
2) run fsck if no errors found, repeat step 1
How long would it take before either of these filesystems noticed a problem and how many corrupt files do you have? With a real filesystem you should be able to identify and/or correct the data before it takes out any real data.
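Step 1 of the test proposed above could be sketched roughly like this. It is only an illustration: `corrupt_blocks`, the 4 KiB block size and the scratch image path are made-up names and assumptions, and it should only ever be pointed at a throwaway image file, never a real device.

```python
import os
import random


def corrupt_blocks(image_path, count, block_size=4096, seed=None):
    """Overwrite `count` randomly chosen blocks of a file with random bytes.

    Returns the sorted list of block indices that were overwritten, so the
    damage can later be compared with what the checker reports.
    """
    rng = random.Random(seed)
    total_blocks = os.path.getsize(image_path) // block_size
    # Pick distinct victim blocks; raises ValueError if count > total_blocks.
    victims = sorted(rng.sample(range(total_blocks), count))
    with open(image_path, "r+b") as image:
        for block in victims:
            image.seek(block * block_size)
            image.write(os.urandom(block_size))
    return victims
```

Step 2 would then be a read-only check of the scratch image (for example `e2fsck -n` or `xfs_repair -n`), noting how many of the returned block indices the tool actually notices.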
"If you need to fsck you should already be restoring from backups"
You do realize how long it would take to restore 72tb on the class of hardware they were testing?
OK, so I have a large x86/64 server and want to follow your advice. Can you please tell me where you can get AIX, or HP-UX, to run on X86?
I like how you completely ignored Solaris yet still presented the comment as if it was a valid counterargument.
"The more corrupt a society, the more numerous are its laws." -Tacitus
The lengthy delay in obtaining the results is due to the lack of hardware for testing time waiting for fsck to finish.
Okay, so ext4 takes longer to fsck than XFS does.
Let's look at how they set up the scenario. They made a bunch of RAID6's with two spares each, and *then* made a striped RAID of those to get 72TB. This tells me that they're storing data where uptime is paramount. So, you're not in an organization where you can answer the red phone in your server room and go "Well, we're checking the drive for errors. Our 72TB of business data will be back on line in about a half-hour". So, you've certainly got hot-spares for fail-over, right?... which means that it kinda doesn't matter *how* long your primary is down (within reason, of course). I say "within reason" because the biggest discrepancy I see in their results between ext4 and XFS is about a factor of 8 (about a half-hour for ext4 as opposed to XFS's 4.5 minutes)
Their message seems to be that, if you've got 72TB of data on an array with ext4 and your only way of getting it back is with fsck, you're in a bit of trouble.
Personally, I'd shorten the message by taking the "with ext4" part out.
JFS also works on Linux.
A cranky coward from the shadows is not a reliable source of information.
I have used AIX and Solaris, and I can say that a lot of stuff is easier on Linux.
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
What system did you end up going with?
How do you back it up?
>I'll go tell _average joe/jane_ to go and get AIX, and dump ubuntu+unity which they like so much because it's shiny and pretty.
Few average Joes have 72TB of disk space, and even for those that do, they're probably ok with 30 - 60 minutes of FSCK time. And more likely, instead of hundreds of millions of files, they probably have a few million, so their fsck time will be in the 3 - 15 minute range.
I've seen servers that take over 3 minutes for their POST check.
sudo dd if=/dev/zero of=/dev/brain
What "stuff"?
Give actual, useful comparisons.
Otherwise, your comment can be reduced to,
"I am most familiar with linux. Therefore, using linux is easier for me"
>I like how you completely ignored Solaris yet still presented the comment as if it was a valid counterargument.
I also like how GP completely ignored Solaris. I just like the fact it is being ignored.
Fear is the mind killer.
He didn't type enough, zing! You really got him!
http://www.enterprisestorageforum.com/print/storage-hardware/linux-file-system-fsck-testing----the-results-are-in.html
going through 3 pages is so annoying...
Not sure how AIX will help here since it is on a similar filesystem. Also, you are comparing apples and radishes -- how does AIX compare to ubuntu+unity, one being a server OS and the other a desktop -- in other words, are you insane?
kill -9 $$ # does the job pretty well
ZFS has 0 FSCK time as it does not need it. If you never leave your FS in an unstable state, you won't need to worry about fixing it.
killall Anonymous\ Coward
>I set up an xfs volume a couple years back. After copying a few files over nfs, it became corrupted. the xfs fsck did >something -- it told me that it was so corrupted, it couldn't be fixed.
Well, why don't you quote something even older -- say linux 0.1? If that makes you feel better
XFS as a fs on linux (it was on SGI a long time back; I am referring to the port) has matured a lot over the years.
Also, xfs has no fsck -- are you sure it is not a case of mistaken identity?
You need to use xfs_repair if *required* after dirty playback.
and FSCK has 0 jail time, unlike ZFS
...until you have a drive die during a scrub, destroy a zfs filesystem in a deduplicating zpool, or any number of other things that make ZFS **ANGRY**, that is. and despite all that, I still trust it more than most linux filesystems.
Each pool is a LUN that is 3.6TB in size before formatting or actually 3,347,054,592 bytes as reported by "cat /proc/partitions".
a file system with about 72TB using "df -h" or 76,982,232,064 bytes from "cat /proc/partitions"
Yeah, I think there's definitely a scaling problem there.
Or perhaps a reading comprehension problem, since /proc/partitions reports in blocks, not bytes, but either way it doesn't inspire any kind of confidence in the rest of their testing methodology.
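A quick sanity check of the units, taking the two figures quoted from the article and treating them as the 1 KiB blocks that /proc/partitions actually reports (the variable names are just for illustration):

```python
KIB = 1024  # /proc/partitions lists sizes in 1 KiB blocks


def blocks_to_bytes(blocks, block_size=KIB):
    """Convert a /proc/partitions block count to bytes."""
    return blocks * block_size


lun_blocks = 3_347_054_592   # per-LUN figure quoted from the article
fs_blocks = 76_982_232_064   # whole-array figure quoted from the article

lun_bytes = blocks_to_bytes(lun_blocks)
fs_bytes = blocks_to_bytes(fs_blocks)

# Show both decimal (TB) and binary (TiB) units; df -h uses powers of 1024.
print(f"LUN: {lun_bytes / 1e12:.2f} TB ({lun_bytes / 2**40:.2f} TiB)")
print(f"FS:  {fs_bytes / 1e12:.2f} TB ({fs_bytes / 2**40:.2f} TiB)")
```

Read as blocks, the array comes out a bit under 72 TiB, which lines up with the "about 72TB" from df -h. Read as bytes, it would be under 80 GB.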
Don't use xfs_check -- it is slow, instead run xfs_repair in -n mode
Also, there is no fsck for xfs -- for people interested in details -- it replays the dirty log during a mount; an xfs_repair run may be needed after that, but that is optional.
In other words, people who compared xfs and ext4 are not aware of this in my opinion.
Quick advice / pro-tip: Don't quote EBS and performance in same line. They don't match no matter what fs you use since the underlying medium sucks. They provide storage on 'elastic' basis -- so go figure out how fast they do it when you are writing at x MB/s
You see my nick?
AIX sucks more than Linux.
Usual process for "weird"* AIX Problems:
1) weird problem occurs after install. You report problem to IBM.
2) IBM asks for your software version, sees that it is the newest one available, and says they will look into it.
3) You ask several months later whether they found anything. They ask for your software version, then ask you to upgrade and see if the problem goes away.
4) You upgrade to newest version.
5) go to 2)
*There are of course non-weird problems where you get the answer from IBM support in 2-3 days, and from Linux forums in 2-3 minutes.
and XFS worked great with IRIX. WTF happened to it with linux???
Why would you replace a zero-ed string with another? At least use /dev/random, bro.
Nerdy news for your nerdy needs? http://www.soylentnews.org Soylent News is people!
Whatever zLinux. Also, there is a point to tightly coupling the OS to the Hardware. Not every workload needs to be on x86 toys.
IBM said please don't use AIX, use Linux instead. That was like... 10+ years ago.
When an article about fsck has a tag line of "What's the damage", I expect to see some discussion of how fsck deals with a damaged file system.
The time required to fsck a file system that doesn't need checking is less interesting and inconsistent with the title. Although, if fsck had complained about the known clean file system that would be interesting.
Wasn't this linux kernel released in, like... 2008? Surely the author could have chosen a kernel at least released in 2011? Also, the tools may be just as old. An article should be surely written to be relevant to what's being presently included in an operating system.
I mean *DEBIAN* is using 2.6.32 in their current stable, due to be released soon. Usually they're years behind. Their upcoming release uses 3.2!
And speaking of that, XFS got a really major upgrade about 3.0 which essentially builds FreeBSD-style softupdates and journalling I/O intelligence to the file system.
No, you're thinking of ReiserFS.
Works best if you use the "Doom as Sys Admin" hack.
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
I expect a /. article like this to include a summary. Like, a word about what the results actually were, without having to click through twice to get to them.
A lot of stuff is also faster on Linux, particularly on the x86. Solaris x86 is dog slow. AIX ("aches") is an appropriate name for a mainframe OS that never really got the hang of this new-fangled "interactive user" stuff. It's a good mainframe OS, that is what it is designed for, tuned for and intended for, but traditional mainframe batch transactional work isn't the sort of payload that is typically run these days. The high-end users want hard real-time (i.e.: they know to the microsecond - or nanosecond, in some cases - exactly when each process will start and stop) for data collection, data analysis and simulation. The data centers want massive multithreading for gigantic servers with minimal overhead and service guarantees per thread. The typical user wants extremely low latency interactive. None of these are pre-scripted batch jobs.
Now, if you wanted to develop a data warehouse for, say, technical writings, journalism, etc, where you're compiling a collection of things that can be typeset overnight, that may be doable as a batch job. However, anyone planning on publishing a journal that needs 72 terabytes of storage had best consider the marketplace a little more closely first. A publishing company, say Nature, might conceivably have use for AIX for batch work. I could see the number of submissions, referee responses and article selections per journal being such that a mainframe would be a perfectly valid way to do things. Even then, it might still be sufficiently small that a live transactional database would be more cost-effective.
Traditionally, batch processing has been a niche market for electrical and gas companies, etc, where the number of customers is staggering. Even then, it has largely been replaced with live transactional systems because customers want things adjusted NOW and not overnight or at the end of the week.
Mass mailers still use batch processing, but printing is the bottleneck and there is no point in having an expensive OS process everything in a fraction of a second on an expensive mainframe when it takes N actual real-world seconds before a printer becomes available to take the next block of data. You need run no faster than the slowest component because the end product won't be delivered any faster. You would have to have a gigantic number of printers before the OS became a significant factor and most shops just don't have that kind of printing power.
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
Okay.
NO CARRIER
So is this about big filesystems or lots of tiny files?
'cause they are not the same thing.
How many files is a lot? 300K? 10M? 100M?
A Pirate and a Puritan look the same on a balance sheet.
I think you're confusing AIX with S/390. AIX is IBM's Unix system, not mainframe. It handles interactive workloads just fine. Hell, S/390 does, too. Your batch processing concepts are a few decades out of date. Just sayin'.
1. Why did they put a label on the RAID devices? They should have just used /dev/sd[b-x] directly, and not confused the situation with a partition table.
2. Did they align the partitions they used to the RAID block size? They don't indicate this. If they used the default DOS disk label strategy of starting /dev/sdb1 at block 63, then their filesystem blocks were misaligned with their 128 kiB RAID block size, and one in every 32 filesystem blocks will span two disks (assuming 4 kiB filesystem blocks).
3. Why did they use md and not LVM? md can sometimes introduce bandwidth limits, and LVM lets you alternate between striped and linear volumes for your testing.
4. Why don't they report the raw bandwidth of the disk, and maybe some IOPS numbers?
5. Why don't they report total operations and bandwidth consumed as measured by iostat or sar?
6. Why didn't they give geometry hints to mkfs? The ext4 mkfs invocation, for example, should have included "-E stride=$[128 / 4],stripe-width=$[(10 - 2) * (128 / 4)]".
7. What about using an external journal?
8. They report that "during the file system check the server did not swap, and no additional use of virtual memory was observed." Wouldn't it have been better to just do "swapoff -a" and report that no swap was available?
9. Why didn't they (as someone else also suggested above) test an actually damaged filesystem?
10. Is there any indication other than their credentials that these people know what they're doing?
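The arithmetic behind points 2 and 6 can be checked with a short sketch. The 512-byte sector size and the aligned start at sector 2048 are assumptions added for comparison; the 4 KiB block, 128 KiB chunk and 10-disk RAID6 figures are the ones used in the list above.

```python
SECTOR = 512          # bytes per sector (assumed)
FS_BLOCK = 4 * 1024   # filesystem block size from point 2
CHUNK = 128 * 1024    # RAID chunk size from point 2
DISKS_PER_RAID6 = 10  # taken from the "(10 - 2)" in point 6
PARITY_DISKS = 2


def straddling_blocks(start_sector, n_blocks):
    """Count filesystem blocks that cross a RAID chunk boundary."""
    offset = start_sector * SECTOR
    count = 0
    for i in range(n_blocks):
        first = offset + i * FS_BLOCK
        last = first + FS_BLOCK - 1
        # A block straddles when its first and last byte land in different chunks.
        if first // CHUNK != last // CHUNK:
            count += 1
    return count


# Geometry hints as computed in point 6.
stride = CHUNK // FS_BLOCK
stripe_width = (DISKS_PER_RAID6 - PARITY_DISKS) * stride

print("straddling, start at sector 63:  ", straddling_blocks(63, 3200))
print("straddling, start at sector 2048:", straddling_blocks(2048, 3200))
print(f"-E stride={stride},stripe-width={stripe_width}")
```

With the DOS-style start at sector 63, one block in every 32 crosses a chunk boundary; with a partition start that is a multiple of the chunk size, none do.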
I am not sure it has much impact, but why would you use a 5 year old linux kernel to perform the test? Maturity is all very nice, but if you are pushing technology, it is not always the best approach.
...other file systems, such as ZFS (doesn't it work w/ Linux?), Veritas, UFS and so on?
>There are of course non-weird problems where you get the answer from IBM support in 2-3 days, and from Linux forums in 2-3 minutes.
I really wouldn't paint Linux support in such rosy terms. Many forums are heading in the direction of the blind leading the blind; application-specific mailing lists and IRC channels, while improving, still have a slight tendency to say "RTFM n00b!". (Or, as happened to me, "Can't be done. It's a stupid demand anyway. Fuck off" - twenty minutes later I figured out how to do it on my own, so it evidently could be done...)
Thank goodness someone has actually posted something relatively negative about ZFS. The way many people rave about it, you'd think it was God's gift to filesystems.
Ironically, that has made me more interested in using it. My general instinct is to distrust anything that is painted as all good.
>OK, so I have a large x86/64 server and want to follow your advice. Can you please tell me where you can get AIX, or HP-UX, to run on X86?
Right. Very funny how you managed to pick out the two systems that don't run on x86 out of the three. If your question was even remotely serious there are two options for you: Solaris and FreeBSD.