Slashdot Mirror


A Short History of Btrfs

diegocgteleline.es writes "Valerie Aurora, a Linux file system developer and ex-ZFS designer, has posted an article with great insight on how Btrfs, the file system that will replace Ext4, was created and how it works. Quoting: 'When it comes to file systems, it's hard to tell truth from rumor from vile slander: the code is so complex, the personalities are so exaggerated, and the users are so angry when they lose their data. You can't even settle things with a battle of the benchmarks: file system workloads vary so wildly that you can make a plausible argument for why any benchmark is either totally irrelevant or crucially important. ... we'll take a behind-the-scenes look at the design and development of Btrfs on many levels — technical, political, personal — and trace it from its origins at a workshop to its current position as Linus's root file system.'"

35 of 241 comments

  1. Looks promising by PhunkySchtuff · · Score: 5, Informative

    This looks like a promising filesystem - as ZFS on linux is, at present, doomed to die an ugly death, btrfs looks to address a lot of the shortcomings of other filesystems and bring a clean, modern fs to linux. It goes beyond ZFS in some areas too, such as being able to efficiently shrink a filesystem, and keeps a lot of the cool things that ZFS made popular, such as Copy-On-Write.

    It looks like Btrfs also revisits some of the design decisions made for ZFS, about the direction it would go in or how it would handle certain problems, that the developers, with the benefit of hindsight, would possibly have made differently.

    Apple are really struggling with ZFS, with it being announced as a feature in early betas of both Leopard (10.5) and Snow Leopard (10.6), as well as being there in a very limited form in Tiger (10.4) - maybe development on Btrfs will leapfrog ZFS for consumer-grade hardware and Apple can finally look at deprecating HFS.

    1. Re:Looks promising by dirtyhippie · · Score: 2, Informative

      ... but btrfs is GPL. Therefore Apple can't use it, unless perhaps they are able to work out licensing from Oracle.

    2. Re:Looks promising by PhunkySchtuff · · Score: 5, Informative

      Apple has, and does, use GPL'd code and complies with the terms of the license.

      Take, for example, WebKit, which is a fork of KHTML. It's now released as LGPL:
      http://webkit.org/coding/lgpl-license.html

      This code powers the browser that Apple ship with Mac OS X, Safari - which is arguably one of the most important pieces of code in the whole OS.

      As a result of its quality, speed and standards adherence, it's now used by companies like Nokia and Adobe...

    3. Re:Looks promising by aj50 · · Score: 2, Informative

      I think that since it's a part of the kernel, it would count as a derivative work which would mean the whole kernel would have to be GPL'd as well.

      This is similar to the reason that ZFS can't just be ported to Linux: the code is under the CDDL, which is incompatible with the GPL.

      --
      I wish to remain anomalous
    4. Re:Looks promising by TheRaven64 · · Score: 3, Informative

      The GPL and LGPL are very different. The LGPL does not affect any code beyond that originally covered by the license. You can link LGPL'd WebKit against proprietary-licensed Safari with no problems.

      Apple also ship GPL'd software like bash, but they don't link it against any of their own code.

      Linking GPL'd code into the kernel would require the rest of the kernel to be released under a license that places no restrictions that are not found in the GPL. That's not a problem for Apple's code; they own the copyright and they can release it under any license they choose. It would be a massive problem for third-party components. DTrace, for example, is heavily integrated into OS X's Instruments developer app and is CDDL (GPL-incompatible). Various drivers are under whatever license the manufacturers want, and are mostly GPL-incompatible. A GPL'd component would need to be very compelling to make Apple rewrite DTrace, most of their drivers, and a lot of other components. Btrfs is not this compelling. Even if Btrfs were sufficiently good, it would take less effort for them to just completely rewrite it than to rewrite all of the GPL-incompatible components.

      --
      I am TheRaven on Soylent News
  2. So, by Josh04 · · Score: 2, Insightful

    Is this ever going to replace ext4? The ext series of file systems are 'good enough' for most people, so unless it has some epic benchmarks I can't imagine a huge rush to reformat. Maybe that's what drives file system programmers insane. The knowledge that for the most part, it's going nowhere. FAT12 is still in use, for Christ's sake.

    1. Re:So, by PhunkySchtuff · · Score: 5, Interesting

      Aside from Copy on Write, one other feature that this filesystem has that I would consider essential in a modern filesystem is full checksumming. As drives get larger and larger, the chance of a random undetected error on write increases and having full checksums on every block of data that gets written to the drive means that when something is written, I know it's written. It also means that when I read something back from the disk, I know that it was the data that was put there and didn't get silently corrupted by the [sata controller | dodgy cable | cosmic rays] on the way to the disk and back.
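
      The idea of checksumming every block can be sketched in a few lines. This is only a toy model, not how Btrfs lays things out on disk: the ChecksummedStore class, the BLOCK_SIZE value, and the use of zlib.crc32 as the checksum are all assumptions made for illustration.

```python
import zlib

BLOCK_SIZE = 4096  # illustrative block size, not taken from any real filesystem


class ChecksummedStore:
    """Toy block store that records a CRC32 for every block it writes."""

    def __init__(self):
        self.blocks = {}     # block number -> raw bytes (stands in for the disk)
        self.checksums = {}  # block number -> CRC32 recorded at write time

    def write(self, block_no, data):
        if len(data) > BLOCK_SIZE:
            raise ValueError("data does not fit in one block")
        self.blocks[block_no] = bytes(data)
        self.checksums[block_no] = zlib.crc32(data)

    def read(self, block_no):
        data = self.blocks[block_no]
        # Recompute the checksum on every read and compare with the stored one.
        if zlib.crc32(data) != self.checksums[block_no]:
            raise IOError(f"checksum mismatch on block {block_no}")
        return data


store = ChecksummedStore()
store.write(0, b"important data")
clean_read = store.read(0)

# Simulate silent corruption behind the store's back (bad cable, flipped bits).
store.blocks[0] = b"important dat4"
try:
    store.read(0)
    detected = False
except IOError:
    detected = True
print("corruption detected:", detected)
```

      The point of the sketch is that a read either returns what was written or fails loudly; it never hands back corrupted bytes as if they were good.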

    2. Re:So, by borizz · · Score: 4, Insightful

      Snapshots are nice too. Makes stuff like Time Machine and derivatives much more elegant. ZFS has built in RAID support (which, I assume, works on the block level, instead of on the disk level), maybe Btrfs will get this too.

    3. Re:So, by joib · · Score: 4, Informative


      ZFS has built in RAID support (which, I assume, works on the block level, instead of on the disk level), maybe Btrfs will get this too.

      Yes, btrfs currently has built-in support for RAID 0/1/10; RAID 5 and 6 are under development.

    4. Re:So, by aj50 · · Score: 2, Insightful

      I had this exact problem very recently.

      If my data was important, I should have been using ECC RAM.

      --
      I wish to remain anomalous
    5. Re:So, by PhunkySchtuff · · Score: 4, Informative

      What you do know is that when you read a block of data back from the disk, that block is what was supposed to be written to the disk.

      If a file that is never read is corrupted somehow, then you will only discover that corruption when you read the file.

      Having checksums is very good if you have a RAID-1 mirror. With full block checksums, you can read each half of the mirror and if there is an error, you know which one is correct, and which one isn't. At present, if a RAID-1 mirror has a soft error like this, due to corruption, you don't know which half of the mirror is actually correct.

      With ZFS, for instance, you can create a 2-disk RAID-1 mirror and then use dd to write zeroes to one half of the mirror, at the raw device level (ie, bypassing the filesystem layer) and when you go to read that data back from the mirror, ZFS knows that it's invalid and instead uses the other side of the mirror. It then has an option to resilver the mirror and write the valid data back to the broken half, if you so want.
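
      The mirror behaviour described above can be sketched roughly as follows. This is a toy model under stated assumptions, not ZFS's actual code: the Mirror class is hypothetical, zlib.crc32 stands in for whatever checksum the filesystem really uses, and the checksum is assumed to be stored somewhere trustworthy, separate from the two copies.

```python
import zlib


class Mirror:
    """Toy two-way mirror: two copies of each block plus one stored checksum."""

    def __init__(self):
        self.sides = [{}, {}]  # two "disks", each mapping block number -> bytes
        self.checksums = {}    # block number -> CRC32 recorded at write time

    def write(self, block_no, data):
        for side in self.sides:
            side[block_no] = bytes(data)
        self.checksums[block_no] = zlib.crc32(data)

    def read(self, block_no, repair=True):
        expected = self.checksums[block_no]
        good = None
        bad_sides = []
        for side in self.sides:
            data = side[block_no]
            if zlib.crc32(data) == expected:
                if good is None:
                    good = data
            else:
                bad_sides.append(side)
        if good is None:
            raise IOError(f"both copies of block {block_no} are corrupt")
        if repair:
            # "Resilver": overwrite each bad copy with the verified good one.
            for side in bad_sides:
                side[block_no] = good
        return good


mirror = Mirror()
mirror.write(0, b"hello mirror")

# Equivalent of dd'ing zeroes over one half of the mirror at the raw device level.
mirror.sides[0][0] = b"\x00" * 12

recovered = mirror.read(0)
print(recovered)
print(mirror.sides[0][0] == mirror.sides[1][0])
```

      Without the checksum, the read path would see two different copies and have no way to tell which one to trust.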

    6. Re:So, by AzureDiamond · · Score: 2, Informative

      As far as I know, the Windows IFS development kit is not free, neither as in speech nor as in beer.

      You can download the Windows Driver Kit for free, and that includes the Installable Filesystem Kit headers and libraries and the source code to FASTFAT.SYS, the Windows FAT driver and CDFS.SYS, the ISO9660/Joliet filesystem.

      http://www.microsoft.com/whdc/DevTools/WDK/WDKpkg.mspx

      That being said, writing a high-performance filesystem for Windows is much harder than for Linux.

    7. Re:So, by borizz · · Score: 2, Interesting

      Odds are the checksum then won't match anymore and you'll be notified. It's better than silent corruption.

  3. Re:oh wee sun's sloppy seconds. by Anonymous Coward · · Score: 2, Funny

    Will you also be enjoying your media in REAL PLAYER?

  4. So, what is the status of btrfs? by MMC+Monster · · Score: 4, Interesting

    Is it Beta? The fact that Linus runs it as his root fs doesn't tell me much. Now, if you told me that's what he uses for ~/, I would be more impressed.

    The important question to me is, how long 'til it gets in the major distributions?

    --
    Help! I'm a slashdot refugee.
    1. Re:So, what is the status of btrfs? by joib · · Score: 4, Informative
      The important question to me is, how long 'til it gets in the major distributions?

      The article predicts a couple of years until it's safe enough as default in new distros.

    2. Re:So, what is the status of btrfs? by TheRaven64 · · Score: 5, Interesting

      Meanwhile, FreeBSD and OpenSolaris are shipping with a version of ZFS that is usable now...

      --
      I am TheRaven on Soylent News
    3. Re:So, what is the status of btrfs? by joib · · Score: 5, Informative

      Just because I replied to your snarky message with another equally snarky one doesn't mean I'm not able to put it into words. For instance, a few reasons why I prefer Linux over *BSD or Solaris:

      - better package management

      - better hw support

      - better ISV support

      - the uncertain future of Solaris (after all, Sun got bought because they were bleeding red ink left and right, will the Solaris devs escape the inevitable layoffs and Oracle continue pumping money into Solaris development just to try to keep up with Linux?)

      - Lack of tier-1 commercial support for *BSD.

      - Much larger community

      - Better availability of qualified Linux sysadmins

    4. Re:So, what is the status of btrfs? by asaul · · Score: 4, Informative

      For hardware support it really depends what segment of the market you are arguing about. If you are talking white box, low end, mostly self-supported stuff then no doubt, Linux wins hands down. But as a sysadmin I find Linux to be one of the most painful platforms to work on compared to Solaris or AIX - predominantly because of the lack of standardised, stable and properly supported management interfaces.

      Fibre channel support is a joke. Sure, for the most part you can dynamically bring stuff in and out, and udev goes a short way to bringing some consistency. The problem is when something goes wrong you are left with pretty much just rebooting - messages tell you nothing - is the device there or not? Usable details are buried away in /proc and /sys and typically are only useful for developers. Solaris and AIX had cfgadm/cfgmgr and lsdev and friends to tell you what state things are in or what has happened. There are useful and informative error messages (typically). So far on RHEL 3/4/5 all I ever see is odd octal dumps from drivers when errors occur, and weird hangs and IO errors when devices get broken. It gets worse as you change fibre drivers and versions. Options which exist in one disappear in others. Vendor drivers add customisations which cause other issues.

      The lack of stability in terms of being able to do things between versions gets me as well. On AIX/Solaris you write a script for Solaris 8, and it just works going forwards to other versions. Solaris 10 changes things a bit, but for the most part you can still poke around the same places or the same way to get info back. In short they tend not to break things that work.

      Linux goes the other way - a change is made, and that's that; it seems to be up to you to either track it or figure it out. You find yourself having to customise things for many, many variations of platform - not just major versions, but minor versions as well. Changes to config file locations, the ways those files are defined, etc.

      Don't get me wrong, I got into UNIX on Linux and I won't dispute its strength in drivers or community, but that community is not "Enterprise" focused. It's why I use it for my PVR and not my file server. The rapid changes in Linux are why the DVB-T cards I got became supported so quickly after the hardware changed. I get the differences, but it's not one size fits all.

      --
      "If everybody is thinking alike, somebody isn't thinking" - Gen. George S. Patton
    5. Re:So, what is the status of btrfs? by ion.simon.c · · Score: 2, Interesting

      Yeah, right.

      I want to see you back out a series of patches on Linux and revert to the previous configuration because the updates broke something.

      # echo =package-cat/package-offending-version >> /etc/portage/package.mask
      # emerge -C =package-cat/package-offending-version && emerge package-cat/package

      Rinse and repeat for any other packages which may be borked.

    6. Re:So, what is the status of btrfs? by ion.simon.c · · Score: 3, Informative

      Fail.

      1. Not available on the majority of Linux installations

      Something similar seems to be available in APT:
      http://www.debian.org/doc/manuals/apt-howto/ch-apt-get.en.html
      Check section 3.10.
      And here's the rough equivalent for RPM:
      http://www.linuxjournal.com/article/7034

      So, what distro is no longer covered?

      2. Removing a package is not the same as reverting to an earlier version of the same package.

      I guess that you missed the latter half of the last command that I posted:

      # emerge -C =package-cat/package-offending-version && emerge package-cat/package

      An English translation of that command is

      Remove the offending package and install the latest available that's not masked if the removal was successful.

      I could have written that command as:

      # emerge package-cat/package

      and -as I had previously masked the offending package version- Portage would have done the right thing.

      So, in summary:

      No, you're a towel.

      :D

  5. Re:Linus', not Linus's. by WillKemp · · Score: 2, Interesting

    There don't seem to be any hard and fast rules about anything in British English! ;-)

    In Fowler's Modern English Usage, which is generally considered to be the bible of English usage by UK journalists and writers, there's an article called "Possessive Puzzles". In that, he says it was "formerly customary" to drop the last 's', but not any more.

    If it was formerly customary in Fowler's day, I reckon it must be well and truly archaic now.

  6. Re:oh wee sun's sloppy seconds. by mysidia · · Score: 2, Insightful

    That argument isn't actually based on the technical merits, and thus doesn't make any sense..

    Just because a Real OS features a Real FS backed up by a real company, doesn't necessarily mean the FS or OS are any good on technical merits compared to a REAL project licensed under a REAL free software license backed up by a REAL community and supported by a REAL foundation.

  7. Oh great by teslatug · · Score: 5, Funny

    As if fsck wasn't bad enough to use in business talks, now I have to get prepared for btrfsck

    1. Re:Oh great by toby · · Score: 4, Interesting

      I'd rephrase that. It eliminates the common cases where you'd need fsck on a conventional filesystem.

      ZFS' design makes consistency failure extremely unlikely. I understand why they claim it doesn't need fsck ("always consistent on disk"). There is controversy over whether there should be a scavenging tool. Some people want one for peace of mind.

      But again, most cases of ZFS pool loss where some believe a scavenger may have saved them, may actually have been solved by more aggressive rollback (I believe work is being done on this).

      Anyone interested in this issue should follow the ZFS mailing list.

      --
      you had me at #!
  8. Re:oh wee sun's sloppy seconds. by mrsteveman1 · · Score: 2, Funny

    Maybe someday you'll be a Real Boy

  9. Re:total gay by Runaway1956 · · Score: 2, Insightful

    I wish the parent hadn't been modded down. He makes a point that should be addressed.

    I've lost data on every file system that I've ever used, including NTFS, and the highly touted ReiserFS. Nothing guarantees the security of your data. The nearest you can come to data security, is to backup, backup, and backup again. Those people and organizations that keep regular backups seldom lose data. However, even those people can lose data in the event of a physical disaster (fire, flood, theft, being hit by a humongous meteorite) which is why off-site backups are important.

    That said - IMHO, a journaling file system is an important first step to data security. NTFS and Ext3 are about equal, in my experience. Turning off caching features is an important second step. A power outage before data is written to disk, and/or while data is being written results in corruption in all current file systems. The important thing is, if data is mission critical, you want it written IMMEDIATELY, not floating around in RAM.

    And, finally, you NEED redundant backups. Anyone who fails to make backups WILL LOSE data, eventually.

    --
    "Windows is like the faint smell of piss in a subway: it's there, and there's nothing you can do about it." - Charlie Br
  10. Meh by Dachannien · · Score: 5, Funny

    Who cares? In a few years' time, this will be obsoleted by its successor, icantbelieveitsnotbtrfs.

  11. Re:Yet another "modern" FS without undelete... by ultranova · · Score: 2, Insightful

    We all know that the data is not zeroed on deletion, so why can't we have a File System that (preferably after fs umount) can scan the blocks and retrieve any file whose data blocks have not been overwritten yet, even if it takes a lengthy whole disk surface scan.

    Why would you use such shenanigans? Simply make the filesystem mark deleted files as "hide from directory listing, and really delete only if you need the space". Then add a couple of syscalls to examine these "recyclable" files and restore them to normal status.

    Now, there are a number of corner cases that need to be thought out - such as what happens if you delete a file/directory and then create a new one with the same name - but the principle is simple enough: don't really delete files, merely mark them as deletable/recyclable/harvestable/condemned/dying.
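
    A rough sketch of that principle, as a toy in-memory model rather than a real filesystem. Everything here is hypothetical: the RecyclingFS class, its method names (standing in for the proposed syscalls), and the choice to handle the reused-name corner case by refusing the restore.

```python
class RecyclingFS:
    """Toy filesystem: delete only hides a file; space is reclaimed lazily."""

    def __init__(self, capacity):
        self.capacity = capacity  # total bytes available
        self.live = {}            # name -> data, visible in directory listings
        self.recyclable = []      # (name, data) pairs, oldest deletion first

    def used(self):
        live_bytes = sum(len(d) for d in self.live.values())
        hidden_bytes = sum(len(d) for _, d in self.recyclable)
        return live_bytes + hidden_bytes

    def create(self, name, data):
        if name in self.live:
            raise FileExistsError(name)
        # Really delete recyclable files only when the space is needed.
        while self.recyclable and self.used() + len(data) > self.capacity:
            self.recyclable.pop(0)
        if self.used() + len(data) > self.capacity:
            raise OSError("no space left")
        self.live[name] = bytes(data)

    def delete(self, name):
        # Hide the file from listings but keep its data around.
        self.recyclable.append((name, self.live.pop(name)))

    def listdir(self):
        return sorted(self.live)

    def list_recyclable(self):
        return [name for name, _ in self.recyclable]

    def restore(self, name):
        if name in self.live:
            # Corner case: the name was reused after deletion. Refuse here.
            raise FileExistsError(name)
        # Restore the most recently deleted file with this name.
        for i in range(len(self.recyclable) - 1, -1, -1):
            if self.recyclable[i][0] == name:
                _, data = self.recyclable.pop(i)
                self.live[name] = data
                return
        raise FileNotFoundError(name)


fs = RecyclingFS(capacity=10)
fs.create("a", b"aaaa")
fs.create("b", b"bbbb")
fs.delete("a")
listing_after_delete = fs.listdir()
recyclable_after_delete = fs.list_recyclable()
fs.restore("a")
listing_after_restore = fs.listdir()

fs.delete("a")
fs.create("c", b"cccccc")  # does not fit unless the hidden "a" is reclaimed
recyclable_after_pressure = fs.list_recyclable()
```

    In this sketch a deleted file survives exactly as long as nobody needs its space, which is the trade-off the proposal implies.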

    --

    Forget magic. Any technology distinguishable from divine power is insufficiently advanced.

  12. Re:Linus', not Linus's. by siride · · Score: 2, Insightful

    In spoken English, you generally pronounce the second 's' (unless you are a pedant of some sort), so it would stand to reason that the second 's' should remain. There is another motivation: the "'s" is actually a clitic that attaches to phrases (usually noun phrases) and is thus a separate word, not a part of the word it is attached to. As such, it should always be spelled out (as it is always pronounced).

  13. Re:oh wee sun's sloppy seconds. by mysidia · · Score: 2, Informative

    No. I was alluding to GP's failure to make a good argument for supporting ZFS, using sound reasoning. I'm saying the truth of the premises doesn't imply the truth of the consequent.

    ZFS on supported hardware is actually superior to ext4 on certain technical merits; primarily data integrity (checksumming), random write performance, and read performance (when massive amounts of RAM are available), and more advanced features (snapshots).

    On the other hand, ext4 works on 32-bit processors (ZFS is only recommended to be used on 64-bit procs) with small amounts of RAM available, less than a GB; the minimum amount of RAM one should use ZFS with is 2 GB, and 4 GB or more is strongly recommended, above and beyond any RAM required by apps running on the machine.

    But that has little to do with the OS being produced by a large corporation.

  14. Re:Yet another "modern" FS without undelete... by OrangeTide · · Score: 2, Interesting

    I undelete stuff all the time on Linux. you just open the trash and pull the stuff out. Once you empty the trash it is gone though. If you're using a command-line and 'rm' stuff though, that's entirely your fault for using such a low-level power-user interface for file management.

    There are serious performance consequences and fragmentation consequences of supporting undelete at the filesystem level. But supporting snapshots is something high performance filesystems do, and snapshots are way more useful than undelete. Especially if snapshots are cheap enough to make them automated. Imagine having 24 revisions of your filesystem of the last 24 hours. This is done all the time on real Filers. I love it when my home directories at work are snapshot this way, makes it super easy to recover screwed up source code due to my inability to check in source before I make huge changes to it.
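
    The "24 revisions of the last 24 hours" idea can be sketched as a toy model. This is not how any real filer or filesystem implements snapshots: SnapshotFS and its methods are hypothetical, and the cheapness comes from Python's immutable bytes objects, so a snapshot copies only the name table while the file contents are shared.

```python
from collections import deque


class SnapshotFS:
    """Toy filesystem with cheap, automatically rotated snapshots."""

    def __init__(self, keep=24):
        self.files = {}                      # name -> immutable bytes
        self.snapshots = deque(maxlen=keep)  # oldest snapshot dropped when full

    def write(self, name, data):
        # Rebinding the name never modifies the old bytes object, so any
        # snapshot still referring to the old contents is unaffected.
        self.files[name] = bytes(data)

    def snapshot(self, label):
        # Shallow copy: only the name table is duplicated, not the contents.
        self.snapshots.append((label, dict(self.files)))

    def read_from_snapshot(self, label, name):
        for snap_label, files in self.snapshots:
            if snap_label == label:
                return files[name]
        raise KeyError(label)


fs = SnapshotFS(keep=24)
fs.write("main.c", b"working code")
fs.snapshot("hour-0")
fs.write("main.c", b"huge broken change")

# Recover the version from before the unchecked-in change.
recovered = fs.read_from_snapshot("hour-0", "main.c")

# Keep taking hourly snapshots; only the most recent 24 are retained.
for hour in range(1, 30):
    fs.snapshot(f"hour-{hour}")
retained = len(fs.snapshots)
oldest_label = fs.snapshots[0][0]
```

    Because each snapshot shares unchanged contents with the live filesystem, taking one every hour costs little, which is what makes automating them practical.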

    I think we should demand that Linux get snapshot support that is generally available (like default on RHEL, SuSE, Ubuntu, etc.). It's a feature that has been missing from Linux, while things like FreeBSD have had it as standard for many years now.

    --
    “Common sense is not so common.” — Voltaire
  15. Re:oh wee sun's sloppy seconds. by Runaway1956 · · Score: 2, Funny

    Ahhh, but, it seems that you have assumed AC to be human. ;-)

    --
    "Windows is like the faint smell of piss in a subway: it's there, and there's nothing you can do about it." - Charlie Br
  16. Re:Duh... by caseih · · Score: 5, Informative

    Wow. FUD flies fast and hard on slashdot. Zealots? Are you serious? Rather than mod your post as +1 Funny, I think I'll blow some karma and respond, just to set the record straight.

    Laying aside misconceptions about the GPL, the main reason BtrFS is GPL is because it's part of the Linux kernel which is also GPL! How hard is it to grasp that? If Apple or anyone else wants to license Oracle's BtrFS code, they are welcome to negotiate and get the code under a different license than the GPL. It's that simple. BtrFS is an implementation of an idea, a specification. If Apple wants to write their own BtrFS driver, they are welcome to do that. Or Microsoft.

    Why are developers who don't want their code to be ripped off by companies (used without payment and incorporated into a closed product) labeled zealots? How is this different than software companies requiring code to be licensed by third parties? So is a company that creates some really cool technology, which it licenses for a fee to others for use in products, a zealot too? There really is no difference.

    While I haven't written any software of note, I also use the GPLv2 (evaluating v3) since I want my software to be able to be freely used by those that want to use it, but if my code is that valuable to a company, I want to get paid for my trouble. If no one is willing to pay me, then that's fine. They are welcome to use my software without restriction, but if they redistribute it, they must do so under the terms of the GPL. Guess that makes me a zealot.

  17. Re:Duh... by evilviper · · Score: 2, Interesting

    Why are developers who don't want their code to be ripped off by companies (used without payment and incorporated into a closed product) labeled zealots?

    Perhaps because they are writing software which is by FAR most useful when it is used as far and wide as possible, while using a license which makes that goal extremely difficult to achieve, unnecessarily.

    Honestly, the only reason anyone cares about Btrfs is because the license on ZFS is too restrictive for inclusion in Linux, and NOBODY has opted to write their own implementation under a GPL or other, freer, license.

    --
    Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant