Btrfs Is Getting There, But Not Quite Ready For Production
An anonymous reader writes "Btrfs is the next-gen filesystem for Linux, likely to replace ext3 and ext4 in coming years. Btrfs offers many compelling new features and development proceeds apace, but many users still aren't sure whether it's 'ready enough' to entrust their data to. Anchor, a webhosting company, reports on trying it out, with mixed feelings. Their opinion: worth a look-in for most systems, but too risky for frontline production servers. The writeup includes a few nasty caveats that will bite you on serious deployments."
It says "experimental." They appreciate you helping them test their file system out. I appreciate it too, so please do. But remember that you are testing an experimental filesystem. When it eats your data, make sure you report it and have backups.
I've been happily using the XFS file system since the early-to-mid-2000s and have never had a problem. It is rock solid and much faster than ext3/ext4 in my experience, tested a lot longer than Btrfs, and handles the millions and millions of small files on redditmirror.cc very effectively.
Slashdot Valentines Beta Massacre: iT WORKED! The boycotts killed Beta!!
I think we need to talk about the oracle in the woodpile - ie, Oracle. BTRFS is an Oracle project. What happens when it goes the way of MySQL? Will Monty Wideanus appear on a white steed to save us?
Do you even lift?
These aren't the 'roids you're looking for.
Lots of production servers use Ext filesystems. If btrfs is all it should be, it will certainly replace these file systems one day soon as the safe choice.
Sure people use other filesystems on production Linux servers, but those are not the norm. The safe "Enterprise" (Not necessarily a good thing) choice is still Ext based filesystems.
Meanwhile ZFS announced that it was ready for production last month.
http://zfsonlinux.org/
Isn't that a common trait with experimental systems?
Ugh, I'm really sorry about this post, Slashdot. I really didn't think it was going to be a "First post." What I really meant to post was
ZFS is outside the kernel tree. That is not an ideological issue, but a practical one. It means updates will not come from the normal channels, it means kernel updates from normal channels could break it, and it is not getting the attention from the kernel devs that an fs should get.
ZFS on Linux has probably had less testing than Btrfs at this point. It has nearly no real world testing. Just because the Solaris ZFS is great, and the BSD one is coming along, means nothing for the stability and correctness of the Linux port.
If you want to use a different OS then this entire discussion is worthless. You might as well suggest switching everything to OSX and using HFS+.
Those distros such as SuSE Linux Enterprise Server, that claim it was production ready and have it in the install, should be shunned. Don't entrust your data to them
Yeah, Btrfs has one major bug:
if you fill the hard drive up you lose access to the system. You can't log in or even get access to the filesystem, and the system locks up.
With ext, things may act a bit erratic, but you can log in and delete/move things off to make room and be OK. With Btrfs you can't; if it fills up, you lose,
unless you take the hard drive out, move it to another box, mount it, and delete crap that way, but that's a pain in the arse.
Ext3 is still chugging along and doing what you want. A filesystem that sacrifices everything for stability.
Not everyone has the same wants and needs. Lots of competing filesystems is a good thing; it leads to a market of ideas. Your "let's pick one and force everyone to suffer with our choice" approach just leads to stagnation and even worse results.
So what is this 'Full' limit? In the ZFS world it's accepted to keep the pool (volume) under 80% usage to prevent issues. Is this something that should be applied as a 'best practice' to btrfs?
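For anyone who wants to watch for this, here is a minimal sketch of the 80% rule of thumb. The threshold is only the ZFS convention quoted above, not a documented btrfs limit, and the function names are made up for illustration. Also note that the generic numbers from `shutil.disk_usage` can be misleading on btrfs, so treat the result as a rough warning rather than an exact figure.

```python
import shutil

# Assumption: 0.80 mirrors the ZFS rule of thumb mentioned above.
# It is not a documented btrfs limit.
FULL_THRESHOLD = 0.80


def usage_fraction(total_bytes, used_bytes):
    """Return used/total as a float."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return used_bytes / total_bytes


def is_too_full(total_bytes, used_bytes, threshold=FULL_THRESHOLD):
    """True when usage has reached the chosen threshold."""
    return usage_fraction(total_bytes, used_bytes) >= threshold


def check_mount(path="/"):
    """Apply the threshold to a mounted filesystem (hypothetical helper)."""
    usage = shutil.disk_usage(path)
    return is_too_full(usage.total, usage.used)
```

Running `check_mount("/")` from cron and alerting on `True` would at least give some warning before the filesystem fills completely.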
Is part of the open source karma. "Shiny and New" is much more important than "stable, bug-free and usable"
The problem with "XFS" eating data wasn't with XFS - it was with the Linux devmapper ignoring filesystem barrier requests.
Gotta love this code:
Martin Steigerwald wrote:
> Hello!
>
> Are write barriers over device mapper supported or not?
Nope.
see dm_request():
    /*
     * There is no use in forwarding any barrier request since we can't
     * guarantee it is (or can be) handled by the targets correctly.
     */
    if (unlikely(bio_barrier(bio))) {
        bio_endio(bio, -EOPNOTSUPP);
        return 0;
    }
Who's the clown who thought THAT was acceptable? WHAT. THE. FUCK?!?!?!
And it wasn't just devmapper that had such a childish attitude towards file system barriers:
Andrew Morton's response tells a lot about why this default is set the way it is:
Last time this came up lots of workloads slowed down by 30% so I dropped the patches in horror. I just don't think we can quietly go and slow everyone's machines down by this much...
There are no happy solutions here, and I'm inclined to let this dog remain asleep and continue to leave it up to distributors to decide what their default should be.
So barriers are disabled by default because they have a serious impact on performance. And, beyond that, the fact is that people get away with running their filesystems without using barriers. Reports of ext3 filesystem corruption are few and far between.
It turns out that the "getting away with it" factor is not just luck. Ted Ts'o explains what's going on: the journal on ext3/ext4 filesystems is normally contiguous on the physical media. The filesystem code tries to create it that way, and, since the journal is normally created at the same time as the filesystem itself, contiguous space is easy to come by. Keeping the journal together will be good for performance, but it also helps to prevent reordering. In normal usage, the commit record will land on the block just after the rest of the journal data, so there is no reason for the drive to reorder things. The commit record will naturally be written just after all of the other journal log data has made it to the media.
I love that italicized part. "OMG! Data integrity causes a performance hit! Screw data integrity! We won't be able to brag that we're faster than Solaris!"
See also http://www.redhat.com/archives/rhl-devel-list/2008-June/msg00560.html
There's a lot more out there if you care to look.
Toss in other things like the way Linux handles NFSv2 group membership (More than 16? Let's just silently drop some!) and lots of fanbois wonder why I view Linux as little better than Windows. Hell, Microsoft may fuck things up six ways from Sunday, but they're not CHILDISH when it comes to things like data integrity.
The primary reason for the existence of ext4 is Lustre. By far the best option for a general purpose non-clustered Linux file system is XFS by some considerable distance. The crying shame is that Red Hat did not make a grab for CXFS out of the ruins of SGI but persisted with GFS2 and then purchased Gluster.
Want more than 16TB on your server? Unless ext4 has very recently grown that support then using an ext based file system is not viable. Remember a RAID5 in 4D+P using 4TB disks will be super close to that 16TB limit. Better hope that you don't want to scale the file system up in the future.
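To put numbers on "super close", here is a quick back-of-the-envelope sketch. It assumes the 16TB limit is the usual one from 32-bit block numbers with 4 KiB blocks, and that the disks are sold in decimal terabytes; both are assumptions, not something stated above.

```python
TB = 10**12   # decimal terabyte, the way drive vendors count
TiB = 2**40   # binary tebibyte, the way filesystem limits are usually stated


def raid5_usable_bytes(data_disks, disk_bytes):
    """Usable space of a RAID5 set; the parity disk is not counted in data_disks."""
    return data_disks * disk_bytes


def ext_32bit_limit_bytes(block_size=4096):
    """Assumed limit: 2**32 addressable blocks of block_size bytes each."""
    return 2**32 * block_size


usable = raid5_usable_bytes(4, 4 * TB)   # the 4D+P layout with 4TB disks
limit = ext_32bit_limit_bytes()

print(f"usable: {usable / TiB:.2f} TiB, limit: {limit / TiB:.0f} TiB")
print(f"headroom: {(limit - usable) / TiB:.2f} TiB")
```

Under these assumptions the array fits with a little over 1 TiB to spare, but adding a fifth data disk would already go past the limit.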
Friends don't let friends use RAID5.
As far as I know there are no 4TB SAS drives available yet.
This is another reason why people want btrfs soon. Right now it is not yet an issue, for most use cases. Since you can have many 16TB volumes.
Actually I'm being serious. This is why I come to /.
Python: 'And then suddenly you have a language which says "we're all stuck with whatever the whiniest coder wants".'
My unraid system still sticks with reiserfs. Been using it for years with no obvious issues seen, the unraid application author still seems to prefer it over any other FS.
They fix that write barrier issue yet?
Don't tell me it is a Linux bug, that is a cop out. Either it will lose my data or it will not, I don't care why.
Linux DOES have a good file system. As xkcd says, it has 15 of 'em. This is one of the highlights of Linux: you can use any fs available. The bad side is simply that many of them are old, a few are too new, and most don't have ALL the features enterprises need. This is a good thing though. When Apple or Microsoft declare a critical system infrastructure feature complete, and it's not for you, what is your recourse? At least with Linux if something new is really needed it can be added. The possibility is there.
FYI, ext4 can be larger than 16 TB but you need a newer version of the e2fsprogs than is included in a typical enterprise distribution. It's not the kernel filesystem drivers with the limitation, but the user-level utility for formatting a new filesystem.
I still prefer XFS ;-).
And on the Eighth Day, Man created God.
Ext3 is still chugging along and doing what you want. A filesystem that sacrifices everything for stability.
EXT3 is actually fairly good, and the performance isn't bad _EXCEPT_ for one issue: fsync(), which causes a massive IO barrier against all the other operations in the filesystem. fsync() should only be assuring the named file is consistent, and yet it basically stalls the entire FS to assure that one file. It's a problem with lack of proper IO tagging and actually is a fundamental problem with the block layer in linux. A recent LSML posting about SYNCHRONIZE CACHE hints at the problem too (complete device flush when only a small portion of the IO needs to be flushed).
99% of software doesn't need to call fsync() on a sanely designed filesystem. The most likely problem is software which calls fsync() regularly to work around ext4 retardeness then being run on ext3, or apps which use libraries like sqlite that call fsync() multiple times when updating the database.
Certainly when I manually sync on my CentOS machine it takes several seconds to complete the writes to disk, so clearly the software I run there isn't calling fsync() much.
Installed Xubuntu 12.10 last October(ish) on USB2 stick (jetflash 32G) with Btrfs (only /boot had EXT2 partition, no swap)
Reason: 24/7 machine. It's a notebook, and an always-spinning hard drive is a drag: it spins up the cooling fan, so I went solid state for the primary OS drive. Needed a filesystem that spreads wear and does checksums, hence Btrfs.
Usage - downloading stuff (to the stick itself, not the harddrive) plus some NASing. Data volume: wrapped around those 32gigs few times already.
Observations so far: no problems at all.
Other details: Had to play with I/O scheduler (I think settled on CFQ. Interestingly, NOOP sucked). Had to install hdidle (I think) otherwise couldn't force sda to go to sleep (bug (?)).
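For anyone repeating the scheduler experiment, here is a small sketch for checking which scheduler is active. It assumes the usual Linux sysfs layout, where `/sys/block/<device>/queue/scheduler` lists the available schedulers with the active one in square brackets; the helper names are hypothetical.

```python
import re


def active_scheduler(sysfs_text):
    """Return the bracketed (active) scheduler name, or None if none is marked."""
    match = re.search(r"\[([^\]]+)\]", sysfs_text)
    return match.group(1) if match else None


def available_schedulers(sysfs_text):
    """Return every scheduler name listed, with the brackets stripped."""
    return [name.strip("[]") for name in sysfs_text.split()]


def read_scheduler(device="sda"):
    """Read the active scheduler for a block device (assumes Linux sysfs)."""
    with open(f"/sys/block/{device}/queue/scheduler") as f:
        return active_scheduler(f.read())
```

Writing a scheduler name back to the same file (as root) is how the setting is normally switched at runtime, which makes it easy to compare CFQ and NOOP on the same workload.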
'Licensing crap' is legal, not ideological.
I've just lost (an hour ago) the entire FS tree in an archlinux installation (I forgot to prepare myself with the btrfsprogs). I thought I was safe to use btrfs. I wasn't even able to boot ubuntu-live (13.04) because [the screwed] btrfs partitions made the kernel (btrfs module from the Ubuntu live) crash at boot :-)
- destroy the offending partition
- restart from scratch but NEVER-EVER use btrfs again!
I have my lesson: don't touch anything you do NOT know very well. :-)
Therefore, it is a pleasure for me to re-install archlinux ( using ext4 this time!!! ) again
Long live archlinux :-)
hahahaha
Well, if linux is a toy (your basic argument) then why are all the subsystem maintainers paid by large companies a salary same as the developers at Microsoft, or any other OS company?
My point doesn't preclude people showing up and writing the next great filesystem. It's simply a question of why everyone thinks it's a good idea for a guy PAID to maintain a filesystem to drop it and go write another one. If you worked for a _BIG_ company and were paid to maintain their application, and you decided one day that maintaining their application was a PITA because it was old, crufty and not sexy anymore, and instead refused to fix problems in it, rather spending the next 4 years writing a replacement (complete with another set of bugs), how long do you think your job would last?
Of course, this stuff happens in a lot of software projects: a new developer shows up and writes a buggy new system because they think they are smarter than the last guy. It frankly speaks of immaturity and an "artist" mentality rather than an engineering process. Sure, software isn't all engineering, but linux is an OS; it's a fundamental part of a computing platform and one that is expected to provide some basic level of service to applications (you know, the things actually doing the work). When it fails at that, because it's a patchwork of art, then you have to question why.
How about Dell or HP or someone like that?
Neat to see they are coming along though. Not too bad to get 16TiB for $3500. Since you need 8 of them to get a RAID10 that size.
99% of software doesn't need to call fsync() on a sanely designed filesystem.
Really? please, show me the part of POSIX which says the data you wrote has now been flushed to the medium and you can respond with 100% certainty, to the user, or API making a request that if power fails this transaction will be safe.
I don't see how that's any different: "Show stoppers" means "things that are unacceptable in our scenario".
Please mod parent informative.
One of the retarded things about btrfs is that you can not see how much disk space is being used by each subvolume. How the hell can you have a filesystem and not know how much space is in use or free ??
The design of ZFS is much more holistic. That is, when we take a step back and look at both the micro and macro we see that we are really trying to solve 3 problems:
* Volume Management
* File System
* Data Integrity
ZFS solves all of these by leveraging knowledge from ALL the layers as one cohesive whole.
https://blogs.oracle.com/bonwick/en_US/entry/rampant_layering_violation
Why RAID is fundamentally broken
https://blogs.oracle.com/bonwick/entry/raid_z
Another interesting doc
http://www.scribd.com/doc/43973847/5/ZFS-Design-Principles
I tried btrfs as my main laptop filesystem:
nice features, speed ok, but i happened to unplug by mistake the power supply, without a battery. bad crash... I tried using btrfsck, and other debug tools, even in the "dangerdon'teveruse" git branch, they just segfaulted. at the end my filesystem was unrecoverable, I used btrfs-restore, only to find out that 90% of my files had been truncated to 0... even files i didn't use for months....
Now, maybe it was the compress=lzo option, or maybe I played a little too much with the repair tools (possible), but until btrfs can sustain power drops without problems, and the repair tools at least do not segfault, I won't use it for my main filesystem...
btrfs is supposed to save a consistent state every 30 seconds, so I don't understand how I messed up that bad.... maybe the superblock was gone and the btrfsck --repair borked everything, I don't know.... luckily for me: backups :)
"I was gratified to be able to answer promptly, and I did. I said I didn't know." -- Mark Twain
1. One size doesn't fit all though.
Most filesystems aside from ZFS sacrifice correctness for the sake of performance.
* For enterprise, correctness is more important than performance.
* For home use, performance is more important than correctness.
2. You seem to be ignoring history.
As we've gone from 32-bit to 64-bit CPUs, filesystems have likewise gone from 32-bit to 64-bit to 128-bit.
Remember software (and hardware) is about engineering tradeoffs between 2 extremes:
Correct but Slow < - - - and - - - > Fast but Unstable
--
Only Cowards use Censorship.
To be honest, there are many projects that are just this - a rewrite of working code just because the license doesn't match what you want. BDB => GDBM for some reason pops in the mind first. Usually it's mostly a waste of resources as it takes time to build up the feature set of the copied code and avoid the bugs that were revisited because they ignored the design of the copied code. I'm still waiting for my FSF Skype clone.
My guess is that humans want to be architects, not maintainers. It's fun to be bold and create "new" things with the partial safety of it following a known framework than go and try to fix that annoying bug in someone else's code that only shows up on Toshiba hardware with the 2976G chipset and NOT the 2976F chipset and when Obama wears a red tie. This is not of course all of it, there are some legit license reasons for some forks, but underneath methinks this is always a secondary reason.
ext4 usually has better performance, and in recent versions of the Linux kernel I believe the ext4 code is used for ext3 and ext2 as well.
New things are always on the horizon
The problem is exactly that. We have 15 different file systems but none of them is really complete and free from obvious bugs. It's like having 15 different cars to choose from, but none of them having all the wheels, the engine, transmission, chassis and all of that working properly at the same time.
Religion: The greatest weapon of mass destruction of all time
Too much development on adding features, too little focus on stability.
We'd have more stability if they focused on that instead, but it would take ages to add all the btrfs/zfs-like features, such as RAID, which are not in other file systems. Because things would need to be changed and then stabilized, maybe it would even need a new disk format.
It might be better to have less stability for a while until most features are part of the code base, and then work on stabilizing.
I bet the power never goes out at CERN.
When all you have is a hammer, every problem starts to look like a thumb.
That's absurd. There are several ways of seeing usage of both the BTRFS pool and of individual disks. You can't check individual subvolumes because it's a worthless number when you have a COW filesystem; you'd end up with 4 subvolumes at 3GB each, but you're only using 5GB total for your pool. Since BTRFS "snapshots" have no distinction from subvolumes, it becomes really hard to break this down.
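A toy model of why the per-subvolume number is slippery, using the 4-subvolume example above. Extents are just labels in a set, each assumed to be 1GB; none of this reflects how btrfs actually stores extents, it only shows how shared blocks make the sums disagree.

```python
EXTENT_GB = 1  # hypothetical: every extent in this toy model is 1GB


def referenced_gb(extents):
    """Space a subvolume appears to use when looked at on its own."""
    return len(extents) * EXTENT_GB


def pool_usage_gb(subvolumes):
    """Space actually consumed: each shared extent is counted once."""
    unique = set().union(*subvolumes.values())
    return len(unique) * EXTENT_GB


def exclusive_gb(subvolumes, name):
    """Space that would be freed by deleting just this subvolume."""
    others = set().union(*(e for n, e in subvolumes.items() if n != name))
    return len(subvolumes[name] - others) * EXTENT_GB


# Four subvolumes (a root and three snapshots) sharing most extents.
subvolumes = {
    "root": {"a", "b", "c"},
    "snap1": {"a", "b", "d"},
    "snap2": {"a", "b", "e"},
    "snap3": {"a", "b", "c"},
}

for name, extents in subvolumes.items():
    print(name, referenced_gb(extents), "GB referenced,",
          exclusive_gb(subvolumes, name), "GB exclusive")
print("pool:", pool_usage_gb(subvolumes), "GB")
```

Each subvolume references 3GB, the pool holds 5GB, and the exclusive figures add up to only 2GB, so no single per-subvolume number answers "how much space does this use?".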
Other than your first (Bad) argument, you make no mention for why ZFS is better than BTRFS. I will agree, ZFS on Solaris is better than BTRFS on Linux .... right now, but I find the design of BTRFS to be superior to ZFS, and when it's feature complete, I will rather use BTRFS. However I'm not an OS zealot, and will use the best tool for the job, instead of making up reasons to stick with my cult ... I mean OS / FS.
if linux is a toy (your basic argument)
Wrong. I admin linux servers professionally. I am a database developer who makes extensive use of open source software in a production environment.
On the contrary, my "basic argument" was that calling for other people to unite and consolidate is pointless. If they wanted to unite and consolidate, they would have already done so. Naturally, the people working on open source software have already considered everything you said, and they have already said "no".
FYI, I have been a linux geek since '97, and people have been saying the exact same things as you, continually, for the entire 15 years. The reality is that consolidation will happen if and when it makes sense to the people doing the actual work -- not when it makes sense to you.
I'm still waiting for my FSF Skype clone.
Eh? Free SIP clients have been around for much longer than Skype. SIP hardware has been around for much longer than Skype hardware. I have one sitting on my desk right now. Just because Skype became popular, does that mean the entire rest of the industry should abandon their existing IP communications protocol?
If you are "trusting your data" to *ANY* file-system, you are likely to be disappointed.
I have run btrfs off and on for maybe 3 or 4 years because I don't *HAVE* to trust my data to it. I have good backups that run daily. If btrfs screws the pooch, I'm not really out that much.
Note though, my backup servers run ZFS. :-)
Honestly, it seems to me that btrfs has gotten worse over the last few years rather than better. 4 years or so ago when I first started using it, it actually worked pretty well and I was fairly happy with it, including taking automatic snapshots, and I never had a data loss. ISTR that I switched away from it because I upgraded to a new distro and had to reformat, for various reasons. Newer versions I've tried have been barely usable and I've had btrfs wedge itself a few times. Some of the issues were distro integration issues I think, like 12.04 seemed to *ALWAYS* run a full fsck on boot, and I think it took a snapshot when I tried to do an upgrade to 12.10, which somehow caused it to think that it had space available when it didn't and it ran out of disc space during the upgrade...
I really want btrfs to get production ready, but I'm half thinking that by the time it is HAMMER2 will be out and I'll be infatuated with it. Note that btrfs and HAMMER started around the same time, maybe HAMMER had a 6 month lead. HAMMER has been "production stable" and has been the default Dragonfly BSD filesystem for several years. Dillon seems to know how to build a file-system...
I'm guessing parent refers to the trick where you write to a different file then move the data in place when done. Which should be OK, for values of "sanely designed" equal or greater than "won't write metadata until data is safely on disk".
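For reference, a minimal sketch of that write-then-rename trick. This is the conservative variant that does call fsync(), since POSIX by itself does not promise that the data reaches the disk before the rename does; on a filesystem that orders data before metadata the fsync calls are the part the grandparent argues you should not need. The function name is made up, and fsyncing the directory assumes a POSIX system.

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace path with data so readers see either the old or the new content."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live on the same filesystem for the rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())      # push the file data to the device
        os.replace(tmp_path, path)      # atomic rename over the old file
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)         # do not leave a stray temp file behind
        raise
    # Flush the directory entry so the rename itself survives a power cut.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Dropping the two fsync calls gives the "cheap" version; whether that is safe is exactly the ext3 vs ext4 argument in this thread.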
zfsonlinux has less testing than Btrfs? Really?
I think you mean *THE LINUX SHIM* has less testing. However, there's this *HUGE* portion of the code, as a wild ass guess I'd say 80%, which is the internal algorithms, data structures, and other internal parts of the file-system that are shared by the Linux and Solaris versions and those have been quite seriously tested for ZFS.
My experience with ZFS under Linux via FUSE was that there were some bugs in the integration layer, but they tended to be fairly shallow and never led to data loss. This is over around 3 years of ZFS+FUSE on Linux serious use (~30TB of backup storage, home storage server). I tested the heck out of ZFS+FUSE before we deployed it, found some issues, worked with the developers (who were amazing!), and eventually got to a point where the stress test I was running on it was more stable than it was under our OpenSolaris systems a few years prior (and the reason I built the stress test).
Based on my experience with ZFS, ZFS+FUSE, and btrfs, I'd personally trust ZFSonLinux over btrfs. My experimentation with btrfs the last few years has been that it still needs a lot of work.
We have these called links. Reading. Try it sometime.
This is a PDF mirror of the scribd link
ZFS: THE LAST WORD IN FILE SYSTEMS by Bill Moore
http://www.cs.utexas.edu/users/dahlin/Classes/GradOS/papers/zfs_lc_preso.pdf
Hi, Is this the reason the system locks up when deleting large numbers of files (near the default max inode for a directory, debian squeeze)? Writing large amounts of data (larger than the files combined) is non-blocking.
Since you are too damn lazy to google: btrfs vs zfs ...
http://www.seedsofgenius.net/uncategorized/zfs-vs-btrfs-a-reference
I tried to use a SIP client for Skype-style usage once. Maybe I just had a terrible one recommended to me, but I did not know it was possible to make a messenger app that confusing. Does anyone really expect average Skype users to convert to that?
All I know is that "Btrfs" makes me think of a drunk trying to say the word "barf" in a sentence. Maybe they should come up with a better name, like "Anchor" (lifted right off the story).
The only meaningful difference is that the Skype application comes pre-configured for their network. SIP softphones have to be manually configured by their users.
4TB SAS drives are available, shipping, and being used.
http://www.newegg.com/Product/Product.aspx?Item=N82E16822178306
"Please don't sigh like that, maam"
I've lost data with ext3.
... and backups.
I've had data corruptions with reiserfs.
I've lost data with ext4 (which happened to be the most frustrating, tedious, and complete failure of all).
Most recently, I had some HD failures on a fully RAID-1'd server running entirely on XFS, and had to re-install the OS from scratch and restore from backups. The new install was onto btrfs.
I've had partitions running on btrfs for a little over a year, and have not yet lost data on these, but it's just a matter of time; I will lose data. I used to blame it on cheap drives, but I've seen SMART failures on young Seagates so I'm now convinced there's no such thing as a high quality, high density drive. At the moment, I find btrfs easy to use (intuitive and simple), and full-featured, so it's what I'm currently using. But I suffer from no illusions; at some point, I will have FS corruptions and have to restore from backups, and I can only hope that any FS corruptions won't go undetected and be propagated to my backups for very long before that happens. Failures are inevitable no matter what I use, so now I value simplicity, convenience, speed
Right now, btrfs beats the alternatives for convenience and features. I put my trust in backups, not file systems, and value is in features and convenience, not some false perception of safety or reliability.
This post just emphasises again that we need engineering standards and measures for software.
Well, if linux is a toy (your basic argument)
How the hell can you derive that being his basic argument?
I know there are only a few diehard holdout BeOS geeks still out there, and I know we have a terrible secret the world has never uncovered: BeFS. This file system, coded and deployed (production) in 1992, is 64-bit, multi-threaded, and fully journaled: attributes taken for granted today, but only futuristic buzzwords for other OSes of the day. Hard drives deployed on R4, an Intel x86 or PPC OS, were typically 6GB IDE drives. BeFS can handle single files of up to 18,000 petabytes; all of recorded human history at the time was only ~100 petabytes. BeFS is built on an OODB. It's tough, reliable, and well documented (there are even three venerated O'Reilly books on the subject, two dedicated to *just* the filesystem). It's what zfs and btrfs want to be when they grow up. And today, it's discarded. While Linux, OS X, BSD and other OSes could be compiled with kernel support, they aren't. Running it essentially means putting a virtual FS in a file. Tragic: another example of reinventing the wheel.
"Adventure? Excitement? A Jedi craves not these things."
Last time I looked, which is admittedly about 18 months ago, even the latest e2fsprogs did not carry support for ext4 greater than 16TB. Then again, would you really trust more than 16TB to a file system that has had support for such a short period of time? If you look into it, the performance of ext4 with filesystems that size sucks anyway.
Let's face it, ext4 exists for one reason and one reason only: Lustre.
Given a choice of production ready according to (my tests) or production ready according to (my tests + Red Hat tests), I'd take the latter every time.
Yes, sooner or later Red Hat WILL miss something. Sooner or later I WILL miss something too. I trust Red Hat to do a breadth of testing which I don't do, and then I do a depth of testing for my specific workload as best as I can model it (and real life has this really annoying habit of finding inventive ways not to conform to my models.)
1. One size doesn't fit all though. Most filesystems aside from ZFS sacrifice correctness for the sake of performance. * For enterprise, correctness is more important than performance. * For home use, performance is more important than correctness.
There's another issue mixed in there: only 3 of those systems support clustering - and that's counting OCFS and OCFS2 as different filesystems. So you can add single machine performance vs. distributed performance into the mix. Then there's small file vs. large file performance: if you're only ever storing virtual machine disk images and you could get a 1% I/O boost by using an optimized FS then you'd probably take it. Suddenly the number of filesystems which need supporting starts to look reasonable.
It's hard to calculate free space on a per volume level because unlike ZFS, BTRFS allows for per object RAID levels. You can't really calculate space used without walking the entire tree, which is prohibitively expensive.
Say a user has a volume with a logical size of 1GB, then creates a file of 1MB, but then sets the file to be replicated 8 times. Do you report 1023MB free or 1016MB free? Now assume tens of thousands of files, each with a different configuration. One may be RAID5, one may be RAID6, one may be RAID10.
It's a useful feature.
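The 1023MB vs 1016MB question above, worked through as a tiny sketch. It takes 1GB to mean 1024MB, as the numbers in the example imply, and models each file as a hypothetical (size, copies) pair; real per-object RAID accounting would be far messier.

```python
def free_logical_mb(volume_mb, files):
    """Free space if each file is charged once, however many copies exist."""
    return volume_mb - sum(size for size, _copies in files)


def free_raw_mb(volume_mb, files):
    """Free space if every replica is charged against the volume."""
    return volume_mb - sum(size * copies for size, copies in files)


volume_mb = 1024            # the 1GB volume from the example
files = [(1, 8)]            # one 1MB file, replicated 8 times

print(free_logical_mb(volume_mb, files))
print(free_raw_mb(volume_mb, files))
```

Both answers are defensible, which is the point: the filesystem has to pick one convention, and either one surprises somebody.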
If you get a client with your network, it comes preconfigured.. so that's not a difference.
There *are* a lot of sucky sip clients. There are also some excellent ones.