Slashdot Mirror


What's the Damage? Measuring fsck Under XFS and Ext4 On Big Storage

An anonymous reader writes "Enterprise Storage Forum's long-awaited Linux file system Fsck testing is finally complete. Find out just how bad the Linux file system scaling problem really is."

7 of 196 comments

  1. Re:fsck speed, want safety by h4rr4r · · Score: 5, Insightful

    If you need to fsck you should already be restoring from backups onto another machine.

  2. Re:fsck speed, want safety by darkpixel2k · · Score: 5, Funny

    when I need to fsck, I just call my girlfriend

    Why? Do you not know how to use the command line?

    --
    There's no place like ::1 (I've completed my transition to IPv6)
  3. Re:linux is fail by Anonymous Coward · · Score: 5, Funny
    sudo kill yourself

    ;-)

  4. Re:fsck speed, want safety by hackstraw · · Score: 5, Interesting

    The largest filesystem I admin is just shy of 1/2 petabyte. And it's one in number. Backing up everything on that filesystem is simply not feasible. To put it in perspective, 1 stream @ 200 MiB/s would take almost 28 days to back up the whole thing. I would imagine a restore would take about the same order. Telling hundreds of users their files are unavailable for reading or writing for 30 days is not really an option, so I run fsck.

    Backups simply are not really an option past 20+ terabytes of storage, and simply not feasible if the storage is volatile in nature. AFAIK everyone has gone to redundancy over backups at scale.
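
The "almost 28 days" figure above can be checked with a quick back-of-the-envelope calculation. This is only a sketch: it assumes "just shy of 1/2 petabyte" means roughly 500 TB in decimal units, that the stream sustains 200 MiB/s the whole time, and that there is no overhead at all. The function name `backup_days` and its `streams` parameter are made up for illustration.

```python
# Rough backup-time estimate for sequential streams.
# Assumption: "just shy of 1/2 petabyte" is taken as 500 TB (decimal units).
# Assumption: every stream sustains its full rate with zero overhead.

MIB = 1024 ** 2            # bytes in one mebibyte
TB = 10 ** 12              # bytes in one (decimal) terabyte
SECONDS_PER_DAY = 86_400


def backup_days(size_bytes, rate_bytes_per_s, streams=1):
    """Days needed to copy size_bytes using `streams` parallel streams."""
    if size_bytes < 0 or rate_bytes_per_s <= 0 or streams < 1:
        raise ValueError("size must be >= 0, rate > 0, streams >= 1")
    return size_bytes / (rate_bytes_per_s * streams) / SECONDS_PER_DAY


single = backup_days(500 * TB, 200 * MIB)
print(f"1 stream:  {single:.1f} days")   # 1 stream:  27.6 days

# Hypothetical: four parallel streams at the same per-stream rate.
quad = backup_days(500 * TB, 200 * MIB, streams=4)
print(f"4 streams: {quad:.1f} days")     # 4 streams: 6.9 days
```

Under these assumptions a single stream lands at about 27.6 days, which matches the poster's estimate. A restore at the same rate would take about as long.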

  5. Re:fsck speed, want safety by chuckymonkey · · Score: 5, Insightful

    You're fairly wrong there, you can actually back that much data up. You just have to be willing to pay for some seriously large tape libraries and they're not cheap. We're in the process of installing a 700TB array with a 1.5PB tape library backup. You just have to do the backups using filesystem snapshots and run them pretty much constantly.

    --
    "Some books contain the machinery required to create and sustain universes."-Tycho
  6. Re:fsck speed, want safety by lvxferre · · Score: 5, Funny

    Protip: if 'make love' returns no target, you need to do the job by hand.

    --
    Nerdy news for your nerdy needs? http://www.soylentnews.org Soylent News is people!
  7. Re:linux is fail by jd · · Score: 5, Interesting

    A lot of stuff is also faster on Linux, particularly on the x86. Solaris x86 is dog slow. AIX ("aches") is an appropriate name for a mainframe OS that never really got the hang of this new-fangled "interactive user" stuff. It's a good mainframe OS, that is what it is designed for, tuned for and intended for, but traditional mainframe batch transactional work isn't the sort of payload that is typically run these days. The high-end users want hard real-time (i.e.: they know to the microsecond - or nanosecond, in some cases - exactly when each process will start and stop) for data collection, data analysis and simulation. The data centers want massive multithreading for gigantic servers with minimal overhead and service guarantees per thread. The typical user wants extremely low latency interactive. None of these are pre-scripted batch jobs.

    Now, if you wanted to develop a data warehouse for, say, technical writings, journalism, etc, where you're compiling a collection of things that can be typeset overnight, that may be doable as a batch job. However, anyone planning on publishing a journal that needs 72 terabytes of storage had best consider the marketplace a little more closely first. A publishing company, say Nature, might conceivably have use for AIX for batch work. I could see the number of submissions, referee responses and article selections per journal being such that a mainframe would be a perfectly valid way to do things. Even then, it might still be sufficiently small that a live transactional database would be more cost-effective.

    Traditionally, batch processing has been a niche market for electrical and gas companies, etc, where the number of customers is staggering. Even then, it has largely been replaced with live transactional systems because customers want things adjusted NOW and not overnight or at the end of the week.

    Mass mailers still use batch processing, but printing is the bottleneck and there is no point in having an expensive OS process everything in a fraction of a second on an expensive mainframe when it takes N actual real-world seconds before a printer becomes available to take the next block of data. You need run no faster than the slowest component because the end product won't be delivered any faster. You would have to have a gigantic number of printers before the OS became a significant factor and most shops just don't have that kind of printing power.

    --
    It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)