Btrfs Is Getting There, But Not Quite Ready For Production

Read their website by Anonymous Coward · 2013-04-26 02:11 · Score: 5, Informative

It says "experimental." They appreciate you helping them test their file system out. I appreciate it too, so please do. But remember that you are testing an experimental filesystem. When it eats your data, make sure you report it and have backups.

Re:Read their website by pipatron · 2013-04-26 02:41 · Score: 5, Informative

Every file system is/should be labeled "experimental" in a way. The long answer from the btrfs FAQ is pretty good, and makes some sense:

Long answer: Nobody is going to magically stick a label on the btrfs code and say "yes, this is now stable and bug-free". Different people have different concepts of stability: a home user who wants to keep their ripped CDs on it will have a different requirement for stability than a large financial institution running their trading system on it. If you are concerned about stability in commercial production use, you should test btrfs on a testbed system under production workloads to see if it will do what you want of it. In any case, you should join the mailing list (and hang out in IRC) and read through problem reports and follow them to their conclusion to give yourself a good idea of the types of issues that come up, and the degree to which they can be dealt with. Whatever you do, we recommend keeping good, tested, off-system (and off-site) backups.

--
c++; /* this makes c bigger but returns the old value */
Re:Read their website by Tarlus · 2013-04-26 02:44 · Score: 2

And make sure those backups aren't also on a btrfs volume.

--
/* No Comment */
Re:Read their website by Tarlus · 2013-04-26 03:02 · Score: 1

Maybe if you wrote the image of a btrfs volume to a tape?

--
/* No Comment */
Re:Read their website by isopropanol · 2013-04-26 03:20 · Score: 3, Insightful

Also, read the article. The authors were experimenting and came across some bugs in some pretty hairy edge cases (hundreds of simultaneous snapshots, large disk array suddenly becoming full, etc) that did not cause data loss. They eventually decided not to use BTRFS on one type of system but are using it on others.
To me, the article was a good thing... But I would have preferred if it was worded as here are some edge case bugs that need fixing before BTRFS is used in our scenario, rather than that these were show stoppers... Because these are not likely show stoppers to anyone who's not implementing the exact same scenario.
Also, it sounds like they should jitter the start time of the backups...
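One way to picture the jitter suggestion: instead of every machine starting its backup at exactly the same minute, each host adds a stable offset inside a window. This is only a sketch under assumptions; the function name, the host names, and the 60-minute window are hypothetical and do not come from the article.

```python
import hashlib


def jittered_start(base_minute: int, host: str, window_minutes: int = 60) -> int:
    """Return a start time (minutes after midnight) shifted by a per-host offset."""
    # Hash the host name so each machine gets the same offset every night,
    # while different machines land at different points in the window.
    digest = hashlib.sha256(host.encode("utf-8")).digest()
    offset = int.from_bytes(digest[:4], "big") % window_minutes
    # Wrap around midnight so the result is always a valid minute of the day.
    return (base_minute + offset) % (24 * 60)


if __name__ == "__main__":
    for host in ("backup-a", "backup-b", "backup-c"):  # hypothetical host names
        start = jittered_start(base_minute=2 * 60, host=host)
        print(f"{host}: start at {start // 60:02d}:{start % 60:02d}")
```

A hash-based offset is used here rather than a random one so that a given host's schedule stays predictable from night to night; a random delay would spread the load equally well.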
Re:Read their website by e70838 · 2013-04-26 03:24 · Score: 1

The Tape Archiver is filesystem agnostic ;-)
Re:Read their website by Bengie · 2013-04-26 03:38 · Score: 4, Informative

My cousin said when he had to go "FS shopping" for his research data center, they had some requirements, most notably, being used by several enterprises that all store at least 1PB of data on the FS and have not had any critical issues in 5 years.

He said the only FS that fit the bill was ZFS. His team could not find an enterprise company that stored at least 1PB of data on ZFS and had a non-user-caused critical problem within the past 5 years. That was many years ago and he has not had a single issue with his multi-PB storage that is being used by hundreds of departments.

ZFS is not perfect, but it sets a very high bar.
Re:Read their website by Tough+Love · 2013-04-26 03:41 · Score: 1, Insightful

Bugs are like roaches. If you see one, you can be sure there are many others hiding in the cracks. There is no room for any bugs at all in a filesystem to which you will trust your essential data.

--
When all you have is a hammer, every problem starts to look like a thumb.
Re:Read their website by AvitarX · 2013-04-26 03:52 · Score: 2

Being unfixable when full is a pretty big show stopper IMO.

--
Wow, sent an e-mail as suggested when clicking on "use classic" banner, and got a fast response that addressed my msg
Re:Read their website by Zero__Kelvin · 2013-04-26 04:08 · Score: 5, Informative

Did your cousin also find out what exact hardware and exact code was used? If my friend has had no problems with filesystem $FS and then I use it with different hardware and code implementing it, then there is still a significant chance that I will have trouble that he did not. Filesystems all work perfectly, because they are conceptual. It is the implementation that may or may not be stable.

--
Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
Re:Read their website by Zero__Kelvin · 2013-04-26 04:10 · Score: 1

Somebody really should come up with a new technology for backing things up! It is ridiculous that the only way to back things up is to tape, I say.

--
Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
Re:Read their website by UnknownSoldier · 2013-04-26 04:30 · Score: 2

> There is no room for any bugs at all in a filesystem to which you will trust your essential data.
Your ideology is admired except it is not practical :-(
* So you are able to guarantee you are able to write 100% bug free code?
* AND it can deal with hardware failures such as bad memory?
I have a bridge to sell you :-)
Re:Read their website by Bigby · 2013-04-26 04:36 · Score: 1

btrfs solves that if you use mirrored volumes and snapshots. But again, if you trust btrfs...
Re:Read their website by jones_supa · 2013-04-26 04:43 · Score: 1

Why the heck would anyone do that?
Re:Read their website by Harik · 2013-04-26 04:46 · Score: 4, Insightful

It's an issue with any CoW filesystem being full - in order to delete a file, you need to make a new copy of the metadata that has the file removed, then a copy of the entire tree leading up to that node then finally copy the root - and once the root is committed, you can free up the no-longer in-use blocks. At least, as long as they're not still referenced by another snapshot.
The alternative is to rewrite the metadata in place and just cross your fingers and hope you don't suffer a power loss at the wrong time, in which case you end up with massive data corruption.
I've filled up large (for home use) BTRFS filesystems before - 6-10 TB. The code does a fairly good job of refusing to create new files that would fill the last remaining bit, so it leaves room for metadata CoW to delete. The problem may come from having a particularly large tree that requires more nodes to be allocated on a change than were reserved - in which case the reservation can be tuned.
BTRFS isn't considered 'done' by any means. It was only in the 3.9 kernel that the new raid5/6 code landed, and other major features (such as dedup) are still pending. It's actually very encouraging that a work-in-progress filesystem is as solid as it is already.
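To illustrate why a delete needs free space on a copy-on-write filesystem, here is a toy sketch of path copying. It assumes a plain binary search tree rather than the B-trees btrfs actually uses, and every name in it (`Node`, `stats`, `delete_leaf`) is hypothetical. The point is only that removing one key writes a new copy of every node from the root down to that key, while the old root stays valid, as a snapshot would.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    """Immutable tree node: an existing node is never modified in place."""
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


# Counts how many new nodes have been written; stands in for block allocation.
stats = {"allocated": 0}


def new_node(key: int, left: Optional[Node], right: Optional[Node]) -> Node:
    stats["allocated"] += 1
    return Node(key, left, right)


def insert(root: Optional[Node], key: int) -> Node:
    """Insert a key, copying every node on the path from the root."""
    if root is None:
        return new_node(key, None, None)
    if key < root.key:
        return new_node(root.key, insert(root.left, key), root.right)
    if key > root.key:
        return new_node(root.key, root.left, insert(root.right, key))
    return root


def delete_leaf(root: Optional[Node], key: int) -> Optional[Node]:
    """Delete a leaf key by writing new copies of all its ancestors."""
    if root is None:
        return None
    if key < root.key:
        return new_node(root.key, delete_leaf(root.left, key), root.right)
    if key > root.key:
        return new_node(root.key, root.left, delete_leaf(root.right, key))
    if root.left is not None or root.right is not None:
        raise ValueError("this sketch only deletes leaf keys")
    return None


if __name__ == "__main__":
    old_root = None
    for k in (50, 30, 70, 20, 40):
        old_root = insert(old_root, k)
    stats["allocated"] = 0
    new_root = delete_leaf(old_root, 20)
    print("nodes allocated by the delete:", stats["allocated"])
    print("old root still sees key 20:", old_root.left.left is not None)
```

In this toy, the nodes reachable only from the old root could be freed once the new root is committed and no snapshot references them, which mirrors the ordering described in the comment above.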
Re:Read their website by Bigby · 2013-04-26 04:47 · Score: 2

Does btrfs support the removal of a striped volume yet? I want to issue a "remove" command, let it re-balance, and then remove that drive. I know the other disk can take on the space. Then I want to add another larger volume, which I know it supports.
Re:Read their website by Anonymous Coward · 2013-04-26 04:47 · Score: 2, Informative

Mirrors and snapshots are not backups. They can be used to create backups, but are not backups in themselves.
Re:Read their website by g1zmo · 2013-04-26 04:49 · Score: 2

Netgear's consumer-level NAS products are now using btrfs. This being the Internet and all, folks are complaining in forums and Facebook about...well if not about this then I guess it would be something else.

--
I have found there are just two ways to go.
It all comes down to livin' fast or dyin' slow. -REK, Jr.
Re:Read their website by TheDarkMaster · 2013-04-26 05:03 · Score: 1

It may not be practical or realistic, but nothing should stop you from at least trying to make your system bug-free

--
Religion: The greatest weapon of mass destruction of all time
Re:Read their website by Tarlus · 2013-04-26 05:45 · Score: 1

I have no idea.

--
/* No Comment */
Re:Read their website by wagnerrp · 2013-04-26 06:02 · Score: 3

Mirrors are not backups. You are correct about that. They are merely redundancy. Snapshots ARE backups. You can do whatever you want to the original copy, the snapshot will remain undisturbed. Snapshots are simply not physical backups, however they can be if you export them to a backup server.
Re:Read their website by lgw · 2013-04-26 06:05 · Score: 1

> Mirrors and snapshots are not backups. They can be used to create backups, but are not backups in themselves.
Actually, the combination is a backup. Mirroring/RAID isn't backup, because user/admin mistakes will be mirrored. But a read-only copy of your data on another drive is a backup by any reasonable definition. Even if both drives are in the same case - something like 85% of restores are because someone said "oops" vs 15% from hardware failure.
I do wish async mirroring across a WAN was easier to come by in the consumer space, however.

--
Socialism: a lie told by totalitarians and believed by fools.
Re:Read their website by KiloByte · 2013-04-26 06:44 · Score: 2

When it comes to data safety, btrfs has been production ready for a few years already. There are issues with latency -- largely fixed -- and dealing with asinine abuse of fsync(). That's also mostly dealt with, although there's no real full fix other than fixing problematic software in the first place. There's no real way to have efficient cow/etc and fast fsync together, but you don't need the latter if the filesystem can do transactions for you.
So we have a filesystem with a number of safety features but relatively new code vs one with code/design that's 40 years old but has hardly any safety features at all. I'd say, it's ext4 that's not production ready: a no-op backup can take half an hour (a big spinning disk that holds a bunch of vservers).

--
The creatures outside looked from Alt-Right to Antifa; but already it was impossible to say which was which.
Re:Read their website by hawkinspeter · 2013-04-26 06:57 · Score: 1

Possibly if it's encrypted or you're backing up a virtual machine (backup a snapshot of its disk device).

--
You're a temporary arrangement of matter sliding towards oblivion in a cold, uncaring universe
Re:Read their website by Tough+Love · 2013-04-26 07:06 · Score: 2

I won't buy your bridge or move my systems away from Ext4 for the time being. BTW, E2fsck does a great job of repairing filesystems that have been corrupted (sometimes massively) by hardware failure of various kinds. This is an essential trick that ZFS and Btrfs have yet to learn.

--
When all you have is a hammer, every problem starts to look like a thumb.
Re:Read their website by AvitarX · 2013-04-26 07:59 · Score: 1

Perhaps this is absurd (due to wasted space), but couldn't space be reserved to match the size of all of the metadata, thus assuring this case is never hit? I would think that with some workloads this wouldn't waste too much space.

--
Wow, sent an e-mail as suggested when clicking on "use classic" banner, and got a fast response that addressed my msg
Re:Read their website by Anonymous Coward · 2013-04-26 08:07 · Score: 1

Elliptics has a few successful 1 PB deployments... It is truly clustered, unlike ZFS.
Re:Read their website by Anonymous Coward · 2013-04-26 10:05 · Score: 2, Informative

I'm sorry, but I call BS on this. I love ZFS and it is a great, solid file system. But your friend couldn't have gone looking for case studies of five-year-plus ZFS usage "many years ago". ZFS has only been around for about eight years, and for the first few of those years it saw very limited usage (i.e. OpenSolaris). Yes, ZFS is a great file system, but let's stick to factual reasons why it is good, no need to make up stories.
Re:Read their website by TangoMargarine · 2013-04-26 11:25 · Score: 1

ext4 was released (added to the Linux kernel) earlier in the same year as btrfs came out! This seems rather at odds with your "100% trustworthy" viewpoint.

--
Unity? Screw that: XFCE. Slashdot Beta? Screw that: SoylentNews. Australis? Screw that: Pale Moon. UX developers DIAF
Re:Read their website by cas2000 · 2013-04-26 12:58 · Score: 1

> I do wish async mirroring across a WAN was easier to come by in the consumer space, however.

it is. here's a very simple example:
snapname=$(date +%Y%m%d)
zfs snapshot "filesystem@$snapname"
zfs send "filesystem@$snapname" | ssh remote-system zfs receive filesystem
'zfs send' also has options for sending only incremental differences between snapshots.
The example $snapname variable definition only supports one snapshot per day. if you need more, include hours, minutes, and/or seconds. you can use any arbitrary string as the name of a snapshot...date/time is merely convenient, not required.
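As a sketch of the two points above (finer-grained snapshot names and incremental sends), the helper below only builds the command lines and does not run them. The dataset name `tank/data`, the host `remote-system`, and the helper names are assumptions for illustration; `zfs send -i` is the incremental form of the send command.

```python
import shlex
from datetime import datetime
from typing import List, Optional


def snapshot_name(now: datetime) -> str:
    """Date plus time, so more than one snapshot per day gets a unique name."""
    return now.strftime("%Y%m%d-%H%M%S")


def replication_commands(dataset: str, remote: str, now: datetime,
                         previous: Optional[str] = None) -> List[str]:
    """Build (but do not run) the snapshot and send/receive command lines."""
    snap = shlex.quote(f"{dataset}@{snapshot_name(now)}")
    if previous is None:
        # First run: send the whole snapshot.
        send = f"zfs send {snap}"
    else:
        # Later runs: send only the changes since the previous snapshot.
        base = shlex.quote(f"{dataset}@{previous}")
        send = f"zfs send -i {base} {snap}"
    target = shlex.quote(dataset)
    return [
        f"zfs snapshot {snap}",
        f"{send} | ssh {shlex.quote(remote)} zfs receive {target}",
    ]


if __name__ == "__main__":
    when = datetime(2013, 4, 26, 12, 58, 0)
    for line in replication_commands("tank/data", "remote-system", when,
                                     previous="20130425-125800"):
        print(line)
```

An incremental send only works if the receiving side still holds the previous snapshot, so a real script would have to track which snapshot was last transferred successfully.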
Re:Read their website by Tough+Love · 2013-04-26 16:45 · Score: 1

Ext4 has a bigger, more experienced team behind it and was derived from a mature design (Ext3)

--
When all you have is a hammer, every problem starts to look like a thumb.
Re:Read their website by Tacticus.v1 · 2013-04-26 21:37 · Score: 1

Yes it does.
A standard remove will remove and rebalance; I was testing that in 3.8 last night.
Re:Read their website by smash · 2013-04-26 22:40 · Score: 1

Can I burn it with fire? No? Same storage? Well then it's NOT a backup.

--
I run: Windows, OS X, Linux, FreeBSD. Just because you have a hammer, doesn't mean everything is a nail.
Re:Read their website by ssam · 2013-04-27 01:45 · Score: 1

yes, but you have no idea if the data in the files has become corrupted by the same hardware error. to be robust a filesystem needs data checksums.
there have been reports on the btrfs mailing list of bugs, that turn out to be due to hardware issues. the drives were corrupting the data and btrfs notices pretty quickly. the drives were also corrupting data when they had other filesystems on them, but it had not been noticed.
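A minimal sketch of what data checksums buy you, assuming 4 KiB blocks and using `zlib.crc32` as a stand-in (btrfs's default checksum is CRC32C, which is a different polynomial). The function names are hypothetical. A checksum is stored with each block at write time and recomputed at read time, so silent corruption on the drive shows up as a mismatch instead of being returned as good data.

```python
import zlib
from typing import List, Tuple

BLOCK_SIZE = 4096  # assumed block size for this illustration


def write_blocks(data: bytes) -> List[Tuple[bytes, int]]:
    """Split data into blocks and store a checksum next to each one."""
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    return [(block, zlib.crc32(block)) for block in blocks]


def find_corrupt_blocks(stored: List[Tuple[bytes, int]]) -> List[int]:
    """Return the indexes of blocks whose contents no longer match their checksum."""
    return [i for i, (block, checksum) in enumerate(stored)
            if zlib.crc32(block) != checksum]


if __name__ == "__main__":
    stored = write_blocks(bytes(3 * BLOCK_SIZE))
    # Simulate a drive silently flipping one bit in the second block.
    block, checksum = stored[1]
    damaged = bytes([block[0] ^ 0x01]) + block[1:]
    stored[1] = (damaged, checksum)
    print("corrupt blocks:", find_corrupt_blocks(stored))
```

Without the stored checksum, the damaged block would be read back as if nothing had happened, which is the situation described above for the other filesystems on the same drives.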
Re:Read their website by wagnerrp · 2013-04-27 02:03 · Score: 1

So you're saying if I'm editing some file, but don't want to lose the previous version, I can "make a backup" by creating a copy of that file, but that's not technically a backup? The duplicate copy is protected from user error, as well as machine error on the part of the editing program, so how is that not a backup? It's not protected against theft or physical damage, but it is protected from editing. What if I make a backup disk, put it in a vault in a bank down the street, and an asteroid hits and wipes out the whole county? One incident took out both copies. Does that mean the two duplicates were too close together to be considered a backup?
Re:Read their website by Anonymous Coward · 2013-04-27 06:15 · Score: 1

If you read the zfs-discuss list, I don't think ZFS meets the bar of "1PByte and no critical issues in 5 years."
* It has no fsck tool and is full of lazy assertions that ask you to restore your entire pool from backup, and sites using iSCSI between pool and storage often hit these assertions.
* It often encounters failure modes where it grinds to a crawl: your data is still there, but some resource has crossed a threshold and caused 10000x slowdown, somewhere inside the kernel. A step in bootup or a single command will hang with no feedback to the user other than disk activity. Users report successful outcomes after waiting 1 day. Other users wait 1 day and then give up and destroy the pool. If your data is still there, but will take 1 year to mount, is it really there? No. You need to restore from backup again. Some of these cases are precarious because you can accumulate the debt without seeing a performance drop while it's accumulating, so there's no way to monitor for it. Then, you reboot, or scrub, or delete something, and the debt manifests.
- dedup table exceeds size of RAM or L2ARC
- filesystem fills up (btrfs also)
- lots of filesystems or snapshots (>1000) makes boot take >1hr
- 'zpool scrub' performance depends on fragmentation and number of files, and while 'zpool scrub' is supposed to be "online", is it really adequately online if (1) you cannot complete a scrub before your reliability schedule says the next scrub is due to start, (2) scrub decimates the performance of the database running above your pool? For some configurations and storage patterns, scrub is basically an offline activity.
- 'zfs destroy' performance is equivalent to 'rm -rf', and there's no other way to remove a ZFS. so, you think it's like LVM, but it's not. The only thing that can be deleted with the same performance bound as an LV is an entire pool.
* It certainly cannot scale to 1PByte. Just because the data structures support that doesn't mean it scales. See the performance issue above.
- Also there's something more fundamental: all ZFS reads and writes must go through a single kernel. Ceph, Lustre, Google's GFS, samfs, don't have this limitation. "Scales to 1PByte" implies requirement "can run a mapreduce over the entire 1PByte in some small multiple of the time one can run the same algorithm over a full single disk" which those other filesystems meet (at least, they meet it if you store big files) and ZFS does not.
- ZFS hangs when a device dies, and Sun stubbornly refused to put performance measuring code above the device layer because of software-engineer pedantry about being "well-factored" or where it "belongs", never mind what actually fucking works. This decision makes the overall system's availability dependent on the number of disks you're using, regardless of how those disks are arranged into redundancy groups: the filesystem will freeze for [3 minutes] - [until reboot] depending on the disk driver whenever a disk goes bad, even if redundancy ensures the data's still there---you can never get to 1PByte with this limitation.
you need to do more than "read the website" to know this stuff about ZFS, unfortunately. I think they should have been more honest when promoting it. Then again, I've heard GlusterFS's promotion isn't too honest, either.
Re: Read their website by Anonymous Coward · 2013-04-27 12:18 · Score: 1

Compared to anything available for Linux, ZFS is a close approximation of perfect. Everything can be done online, including bad disk replacement, expansion, and extension. On one set of data ZFS compression gets us 24x.
I tested btrfs on the same data. It wouldn't automagically mount at boot-time, and after an hour of digging I found that I had to not only add it to fstab, but i actually had to specify every component device in the entry there for it to be recognized as a valid filesystem. Then on the same data I got .5x compression. Yes, it *doubled* the disk used.
For btrfs to be regarded as non-vaporware it needs to
1) get serious about missing features
2) get existing features to work
3) get more than one (part-time) dev, who clearly isn't serious about the project
4) be trustworthy for root file systems without any BIOS bullshit
5) Have an up-to-date web site without promises from two years ago for stuff RSN that still isn't ready.
None of these show any signs of happening soon. I really don't understand how "enterprises" can be all about linux when the storage software is so pitiful. Ext4 only just recently got the ability to be >16 TB. There's no comprehensive and functional disk management tool, i can't even use kickstart to make a non-MSDOS partition table. WTF? How are businesses making do? Do they all have in-house kernel devs to write custom stuff?
Until such a time that btrfs is viable, or the ZFS and Linux license people grow up and allow a usable ZoL, I'm stuck with Solaris 10 for ZFS or abandonware for-pay XFS on unpredictable HBA RAID volumes. We would pay real money for a commercial implementation if only one were available.
Re:Read their website by crutchy · 2013-04-28 19:27 · Score: 1

your cousin was obviously holding it wrong
Re: Read their website by crutchy · 2013-04-28 19:29 · Score: 1

maybe when he does a google search he thinks he must download all pages in the results just in case they are all relevant
either that or he has the world's biggest porn collection
Re:Read their website by crutchy · 2013-04-28 19:31 · Score: 1

that's the most honest response i think i've ever read on /.
mod -1 insufficient bullshit
Re: Read their website by Aaden42 · 2013-04-29 03:41 · Score: 1

ZoL is useable. Not kernel-integrated, but as an installable module, the licensing isn't an issue. Been using it on home systems for over a year after switching from BSD (moved the BSD pools into ZoL as-is). No dataloss or corruption issues, though I've hit some pathologically bad performance limitations at times on admittedly under-sized hardware.
Also, have hit the "wait for days and hope" issue with ZFS on BSD about two years ago. No data loss, but three very tense days waiting for a dedupe-enabled pool to import on a system that had way too little RAM to properly support dedupe.

Happy with XFS by zidium · 2013-04-26 02:13 · Score: 3, Informative

I've been happily using the XFS file system since the early-to-mid-2000s and have never had a problem. It is rock solid and much faster than ext3/ext4 in my experience, tested a lot longer than Btrfs, and handles the millions and millions of small files on redditmirror.cc very effectively.

--
Slashdot Valentines Beta Massacre: iT WORKED! The boycotts killed Beta!!

Re:Happy with XFS by h4rr4r · 2013-04-26 02:18 · Score: 3, Insightful

It also has none of the features that make Btrfs exciting and modern.
XFS is fine, so is Ext3/Ext4, but Linux needs a modern file system.
Re:Happy with XFS by bored · 2013-04-26 02:22 · Score: 3, Informative

You're happy with XFS because your machine has never lost power or crashed. If either of those things happened with the older versions of XFS it was nearly a 100% guarantee you would lose data. Now I'm told it's more reliable.
So, if you told me you have been running it for the last year and it was reliable I would have given you more credit than claiming you have been running it for a decade and it's been reliable. Because it's had some pretty serious issues, and if you didn't hit them, that means you're not a good test case.
I'm still skeptical, because AFAIK, XFS still doesn't have an ordered data mode.
Re:Happy with XFS by nametaken · 2013-04-26 02:30 · Score: 1, Redundant

This is why we can't have nice things.
Re:Happy with XFS by h4rr4r · 2013-04-26 02:32 · Score: 2

No, I am suggesting datacenter linux needs something like ZFS. Proper snapshotting, block level dedupe, and all that jazz.
Btrfs is not yet ready, but in the next decade it will take on this role.
Re:Happy with XFS by iggymanz · 2013-04-26 02:33 · Score: 1

XFS on linux doesn't have the "modern" features (which mature OS have had for decades), such as shared filesystem clustering
Re:Happy with XFS by Hatta · 2013-04-26 02:35 · Score: 1

XFS doesn't checksum, support copy-on-write, etc.

--
Give me Classic Slashdot or give me death!
Re:Happy with XFS by MBGMorden · 2013-04-26 02:36 · Score: 5, Informative

> You're happy with XFS because your machine has never lost power or crashed. If either of those things happened with the older versions of XFS it was nearly a 100% guarantee you would lose data. Now I'm told it's more reliable.
I don't know about being more reliable. I use XFS on my RAID array (mdadm) at home. I'm running the latest version of Linux Mint (Nadia), and if I ever lose power and don't unmount that file system cleanly it loses all recent changes to the drive (and "recent" sometimes stretches to hours ago). The drive mounts fine and nothing appears corrupted (so I guess it's not complete data loss), but any file changes (edits, additions, or deletions) to the file system are simply gone.
It's gotten to the point where if I've just put a lot of stuff on the drive I unmount it and then remount it just to make sure everything gets flushed to disk. If I ever get a chance to rebuild that array it most certainly will be using something different.

--
"People who think they know everything are very annoying to those of us who do."-Mark Twain
Re:Happy with XFS by Booker · 2013-04-26 02:40 · Score: 4, Informative

No, that's FUD and/or misunderstanding on your part.
"data=ordered" is ext3/4's name for "don't expose stale data on a crash," something which XFS has never done, with or without a mount option. ext3/4 also have "data=writeback" which means "DO expose stale data on a crash." XFS does not need feature parity for ill-advised options.
Any filesystem will lose buffered and unsynced file data on a crash (http://lwn.net/Articles/457667/). XFS has made filesystem integrity and data persistence job one since before ext3 existed. Like any filesystem, it has had bugs, but implying that it was unsafe for use until recently is incorrect.
I say this as someone who's been working on ext3, ext4 and xfs code for over a decade, combined.
Re:Happy with XFS by jabuzz · 2013-04-26 03:06 · Score: 2

On the other hand the code was first released as production nearly 20 years ago. Of all the current Linux file systems XFS has the best performance, the best scalability and the best stability.
Want to put 100TB of data on btrfs be my guest.
Re:Happy with XFS by bored · 2013-04-26 03:10 · Score: 5, Insightful

> No, that's FUD and/or misunderstanding on your part.
> "data=ordered" is ext3/4's name for "don't expose stale data on a crash," something which XFS has never done,
Actually, I think you're the one that doesn't understand how a journaling file system works. The problem with XFS has been that it only journals metadata, and the data portions associated with the metadata are not synchronized with the metadata updates (delayed allocation and all that). This means the metadata portions (filename, sizes, etc.) will be correct based on the last journal update flushed to media, but the data referenced by that metadata may not be.
A filesystem that is either ordering its metadata/data updates against a disk with proper barriers, or journaling the data alongside the metadata, doesn't have this problem. The filesystem _AND_ its data remain in a consistent state.
So, until you understand this basic idea, don't go claiming you know _ANYTHING_ about filesystems.
Re:Happy with XFS by Urban+Garlic · 2013-04-26 03:13 · Score: 1

I've been using it for a long time, too, it's a perfectly respectable choice, and if I had to use it for ten more years, that would be OK.
However, particularly for back-up systems, I am ready for snapshots and block-level deduplication. I tried to deploy something like this with XFS over LVM a few years ago, but discovered that the write performance of LVM snapshots degrades rapidly when there are a lot of them, and it helps a lot if you can guess the size in advance, which is hard. There's also a hard limit of 255 snapshots, but in our environment, performance became unacceptable before we got anywhere near that.
You're right that XFS "ain't broke", but I for one am ready for more features.

--
2*3*3*3*3*11*251
Re:Happy with XFS by Anonymous Coward · 2013-04-26 03:20 · Score: 2, Informative

there's CXFS which _is_ a clustered filesystem. Not as popular as GFS or OCFS2, but it's there, and uses the same block format as 'regular' XFS.
Not sure what you are referring to by "mature OS", but note that ZFS is _not_ a cluster filesystem by any stretch of the definition.
Re:Happy with XFS by Anonymous Coward · 2013-04-26 03:29 · Score: 1

> Or perhaps you are confusing volume/RAID management with the file system?
That would be like mixing chocolate and peanut butter. Seriously, there is no reason the "volume/RAID" abstraction and "file system" abstraction should not be merged. Separating the two was a solution to a problem that no longer exists.
Re:Happy with XFS by jedidiah · 2013-04-26 03:36 · Score: 2

That would be the same "mature" operating systems that have generally needed to employ products from 3rd party vendors in order to have interesting filesystems.

--
A Pirate and a Puritan look the same on a balance sheet.
Re:Happy with XFS by Kz · 2013-04-26 03:37 · Score: 4, Interesting

> You're happy with XFS because your machine has never lost power or crashed. If either of those things happened with the older versions of XFS it was nearly a 100% guarantee you would lose data. Now I'm told it's more reliable.
It _is_ quite reliable, even in the face of hardware failure.
Several years ago, I hit the 8TB limit of ext3 and had to migrate to a bigger filesystem. ext4 wasn't ready back then (and still today it's not easy to use on big volumes). Already had bad experiences with reiserfs (which was standard on SuSE), and the "you'll lose data" warnings on XFS docs made me nervous. It was obviously designed to work on very high-end hardware, which I couldn't afford.
so, I did extensive torture testing. hundreds of pull-the-plug situations, on the host, storage box and SAN switch, with tens of processes writing thousands of files on million-files directories. it was a bloodbath.
when the dust settled, ext3 was the best by far, managing to never lose more than 10 small files in the worst case, over 70% of the cases recovered cleanly. XFS was slightly worse, never more than 16 lost files and roughly 50% clean recoveries. ReiserFS was really bad, always losing more than 50-70 files and sometimes killing the volume. JFS didn't lose the volume, but lost files count never went below 130, sometimes several hundred.
needless to say, i switched to XFS, and haven't lost a single byte yet. and yes, there have been a few hardware failures that triggered scary rebuilding tasks, but they completed cleanly.

--
-Kz-
Re:Happy with XFS by Bengie · 2013-04-26 03:41 · Score: 2

It is impossible to compete with a FS+VolumeManager+RAID hybrid. There is just some stuff that is impossible to do without coupling those layers, and those impossible things are becoming requirements.
Re:Happy with XFS by Tough+Love · 2013-04-26 03:44 · Score: 2

> there is no reason the "volume/RAID" abstraction and "file system" abstraction should not be merged. Separating the two was a solution to a problem that no longer exists
Oh, so true. Indeed, problems like modularity, maintainability and shared functionality stopped existing long ago as we all know.

--
When all you have is a hammer, every problem starts to look like a thumb.
Re:Happy with XFS by Blackknight · 2013-04-26 04:19 · Score: 1, Redundant

XFS is not a clustered file system. For something like that you want Lustre, GPFS, GFS, etc.
Re:Happy with XFS by Anonymous Coward · 2013-04-26 04:29 · Score: 1

> Seriously, there is no reason the "volume/RAID" abstraction and "file system" abstraction should not be merged.
Agreed. Hell, what's with all this "Virtual File System" abstraction shit, too? Why can't I just open() a directory inode and then read() the contents instead of mucking around with this readdir() nonsense? ABSTRACTIONS ARE BAD DAMNIT!
Re:Happy with XFS by gmack · 2013-04-26 04:29 · Score: 2

XFS is mostly reliable but, as I found out with several PCs, if it gets shut off at the wrong time it will need a disk repair and then you are in for some fun because their repair utility doesn't work at all on a mounted FS (even if it is read only) meaning to repair a damaged XFS volume you will now need to use a boot disk.
Re:Happy with XFS by loufoque · 2013-04-26 04:45 · Score: 3, Informative

Ever heard of the sync command?
Re:Happy with XFS by Harik · 2013-04-26 04:49 · Score: 2

> Oh, so true. Indeed, problems like modularity, maintainability and shared functionality stopped existing long ago as we all know.
It's almost like people have discovered that you can have modularity and shared functionality in a different way than artificially separating storage layers and throwing away important data at each layer boundary.
Re:Happy with XFS by Booker · 2013-04-26 05:42 · Score: 4, Informative

> So, until you understand this basic idea, don't go claiming you know _ANYTHING_ about filesystems.
Without sounding like too much of a jerk, I have hundreds of commits in the linux-2.6 fs/* tree. This is what I do for a living.
I actually do have a pretty decent grasp of how Linux journaling filesystems behave. :)
Test your assumptions on ext4 with default mount options. Create a new file and write some buffered data to it, wait 5-10 seconds, punch the power button, and see what you get. (You'll get a 0 length file) Or write a pattern to a file, sync it, overwrite with a new pattern, and punch power. (You'll get the old pattern). Or write data to a file, sync it, extend it, and punch power. (You'll get the pre-extension size). Wait until the kernel pushes data out of the page cache to disk, *then* punch power, and you'll get everything you wrote, obviously.
XFS and ext4 behave identically in all these scenarios. Maybe you can show me a testcase where XFS misbehaves in your opinion? (bonus points for demonstrating where XFS actually fails any posix guarantee).
Yes, ext3/4 have data=journal - but it's not default, and with ext4, that option disables delalloc and O_DIRECT capabilities. 99% of the world doesn't run that way; it's slower for almost all workloads and TBH, is only lightly tested.
Yes, ext3's data=ordered pushes out tons of file data on every journal commit. That has serious performance implications, but it does shorten the window for buffered data loss to the journal commit time.
You want data persistence with a posix filesystem? Use the proper data integrity syscalls, that's all there is to it.
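A minimal sketch of what "use the proper data integrity syscalls" can look like from an application, assuming a POSIX system and Python's standard os module. The helper name durable_write is made up for illustration. It writes to a temporary file, calls fsync() on it, renames it over the target, and then calls fsync() on the directory so the rename itself is pushed out. Whether the data truly reaches the platters still depends on the drive honoring flushes.

```python
import os
import tempfile


def durable_write(path, data):
    """Replace path with data, asking the kernel to flush before returning.

    Hypothetical helper for illustration; assumes a POSIX system where a
    directory can be opened read-only and passed to fsync().
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Temp file lives in the same directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()             # userspace buffer -> kernel page cache
            os.fsync(f.fileno())  # page cache -> stable storage (file data)
        os.replace(tmp_path, path)  # atomic rename over the old name
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # Flush the directory entry so the rename survives a power cut too.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

With this pattern a power cut should leave either the old contents or the new contents under the target name, which matches the "old pattern or new pattern" behaviour described in the test cases above.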
Re:Happy with XFS by lgw · 2013-04-26 06:13 · Score: 2

Dammit, I agree with h4rr4r - what is this world coming to?
One problem with snapshotting is that it's pretty useless without a standard way to quiesce apps. It was a huge deal on the Microsoft side when shadow copy happened. You really want to be able to take a snapshot of a DB store or mail store and be sure you're getting something coherent, which will require the cooperation of the software involved.
Multiple volume snaps also require a similar framework if you want coherent sets of snaps. (Almost all complex software makes assumptions about write order for performance reasons and to avoid locking - but those assumptions are easily violated by snapping multiple volumes.)

--
Socialism: a lie told by totalitarians and believed by fools.
Re:Happy with XFS by Tough+Love · 2013-04-26 07:02 · Score: 1

What modularity?

--
When all you have is a hammer, every problem starts to look like a thumb.
Re:Happy with XFS by Sloppy · 2013-04-26 07:19 · Score: 1

Of all the current Linux file systems XFS has the best performance, the best scalability and the best stability.
My JFS anecdotes are way happier than my XFS anecdotes. I realize that's not worth much, but a person can't just ignore their experiences.

--
As copyright owner of this comment, I authorize everyone to defeat any technological measure which limits access to it.
Re:Happy with XFS by bored · 2013-04-26 07:20 · Score: 3, Interesting

Without sounding like too much of a jerk, I have hundreds of commits in the linux-2.6 fs/* tree. This is what I do for a living.
Well, then your part of the problem. Your idea that you have to be correct or fast is sadly sort of wrong. It's possible to be correct without completely destroying performance. I have a few commits in the kernel as well, mostly to fix completely broken behavior (my day job in the past was working on an enterprise unix). So, I do understand filesystems too. Lately, my job has been to replace all that garbage, from the scsi midlayer up, so that a small industry specific "application" can both make guarantees about the data being written to disk while still maintaining many GB/sec of IO. The result actually makes the whole stack look really bad.
So, I'm sure you're aware that on linux, if you use proper posix semantics (fsync() and friends) the performance is abysmal compared to the alternatives. This is mostly because of the "broken" fencing behavior (which has recently gotten better but still is far from perfect) in the block layer. Our changes depend on 8-10 year old features available in SCSI to make the guarantees that aren't available everywhere. But it penalizes devices which don't support modern tagging, ordering and fencing semantics rather than ones that do.
Generally in linux, application developers are stuck either dealing with orders of magnitude performance loss, or they have to play games in an attempt to second guess the filesystem. Neither is a good compromise and it's sort of shameful.
Maybe it's time to admit linux needs a filesystem that doesn't force people to choose either abysmal performance, or no guarantees about integrity.
Re:Happy with XFS by MBGMorden · 2013-04-26 07:42 · Score: 1

No, actually I hadn't. Though I have now looked it up and it does look like it'll simplify the process. Thanks.

--
"People who think they know everything are very annoying to those of us who do."-Mark Twain
Re:Happy with XFS by Dogers · 2013-04-26 08:26 · Score: 1

So root FS on ext3/4 and mount your large data volumes elsewhere as XFS. Job done?

--
I am a viral sig. Please copy me and help me spread. Thank you.
Re:Happy with XFS by Booker · 2013-04-26 09:12 · Score: 1

Metadata checksums are under active development on XFS as we speak.
Re:Happy with XFS by operagost · 2013-04-26 09:38 · Score: 2

I've been using ReiserFS since 2006. It's killer.

--

Gamingmuseum.com: Give your 3D accelerator a rest.
Re:Happy with XFS by phoenix_rizzen · 2013-04-26 11:25 · Score: 1

Why haven't you moved to ZFS, then? ;) Has all the features you want for a backup server (which is where we currently use it):
- near instantaneous snapshots
- near unlimited snapshots
- no need to worry about sizes
- data and metadata checksums
- multiple levels of parity (single, double, triple)
- n-way mirroring
- compression (lzjb, gzip, lz4)
- online dedupe (although you need gobs of RAM)
- and more!
If you don't like Oracle, run it on Nexenta/Illumos-based distros. If you don't like Solaris, then run it on FreeBSD. If you really don't like OSes that work, you can even run it on Linux. :)
We're using it for backups (just under 200 remote systems rsync'd to a ZFS server; zfs send/recv to an off-site replica) with compression and dedupe (just over 5:1 disk savings). Separate filesystems for each remote site (toying with idea of separate filesystems for each server), snapshots made every morning, going back almost 2 years already.
Using 3x 24-drive SuperMicro chassis for the "local" backups, and a 45-drive JBOD plugged into a head unit for the off-site replica.
Re:Happy with XFS by TangoMargarine · 2013-04-26 11:28 · Score: 1

-1 Not Sure If Sarcasm
I like Reese's but you appear to be claiming this is a bad thing...

--
Unity? Screw that: XFCE. Slashdot Beta? Screw that: SoylentNews. Australis? Screw that: Pale Moon. UX developers DIAF
Re:Happy with XFS by cas2000 · 2013-04-26 13:37 · Score: 1

i've also been a very happy user of XFS for a very long time - it was (still is, mostly) my default file-system on linux machines.
but there really are compelling reasons why btrfs (and zfs) are steps in the right direction. the most important is error detection and correction on the data - older filesystems (like ext* and XFS) just don't and can't do this...and no, raid verification doesn't do it either. raid-arrays and even individual disks are getting so big that it's a statistical certainty that there WILL be errors in the data stored on disk. using a filesystem that can't detect those errors means you have silent corruption of your data (i.e. that you don't know about), and errors that can't be detected are also errors that can't be corrected.
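to make the detection-versus-correction point concrete, here is a toy sketch in Python. it is not how btrfs or ZFS lay out data on disk: CRC-32 from zlib is only a stand-in checksum, and the names store, find_corrupt_blocks and read_block, along with the 4096-byte block size, are assumptions made up for this example. the idea is just that a checksum kept next to each block lets a read notice a flipped bit, and a redundant copy lets it return good data instead.

```python
import zlib

BLOCK_SIZE = 4096  # hypothetical block size, chosen only for this sketch


def store(data):
    """Split data into blocks and keep a CRC-32 next to each block."""
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    return [(block, zlib.crc32(block)) for block in blocks]


def find_corrupt_blocks(stored):
    """Return the indexes of blocks that no longer match their checksum."""
    return [i for i, (block, crc) in enumerate(stored) if zlib.crc32(block) != crc]


def read_block(copies, crc):
    """Return the first redundant copy that still matches the checksum."""
    for block in copies:
        if zlib.crc32(block) == crc:
            return block
    raise IOError("every copy failed its checksum")


stored = store(b"x" * 10000)              # three blocks: 4096 + 4096 + 1808 bytes
good, crc = stored[1]
bad = bytes([good[0] ^ 0x01]) + good[1:]  # flip one bit, as a failing disk might
stored[1] = (bad, crc)

corrupt = find_corrupt_blocks(stored)     # the damaged block is flagged
repaired = read_block([bad, good], crc)   # the intact mirror copy is returned
```

without the stored checksum, nothing distinguishes the flipped-bit block from the good one, so a plain mirror has no way to know which copy to trust.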
the copy-on-write (COW) nature of both btrfs and ZFS is another good reason - there's *never* a time when old data is just over-written, it is copied and updated at the same time. this means that when data is written, a crash/power-failure results in either the old data OR the new data, but never a mix between the two.
another reason is that both btrfs and zfs combine the features of mdadm software raid and LVM volume management and a filesystem (but not all of the limitations/annoyances - e.g. a zfs filesystem quota or btrfs subvolume sizes are soft quotas, not volume/partition limits as they are with LVM)....and the sysadmin tools to create and manage the combination are *far* simpler to understand and use than mdadm or lvm tools. this removes a significant barrier to entry for good practices like raid and volume management.
being able to use an SSD or partition of an SSD for ZFS L2ARC (caching) and for the ZIL ("ZFS Intent Log" - a write cache for synchronous writes) is also very nice.
there are more reasons, but if you're interested, I suggest you start by reading the wikipedia pages on ZFS and btrfs.
http://en.wikipedia.org/wiki/ZFS
http://en.wikipedia.org/wiki/Btrfs
anyway, I still use XFS - on some systems as the only filesystem. On other systems, as the root fs only but with my bulk data storage on ZFS (i haven't bothered setting up root on zfs because i haven't needed it so far and converting would require significant downtime/disruption - but i expect that future systems I build will likely be entirely ZFS).
I initially tried btrfs a few years ago but found it buggy and unreliable, and i lost my btrfs filesystems to corruption one too many times (fortunately, only a /backup mount for rsync backups so i didn't lose any irreplaceable data). By all accounts the btrfs bugs (and many more) that I encountered at the time have been fixed, but my solution was to switch to zfsonlinux instead. i'm glad i did - i was skeptical before i tried it but it turns out that it's pretty much everything i ever wanted from raid and volume management and filesystems all rolled into one.
So, yeah, Linux does need a modern filesystem. Fortunately, we have two to choose from: one included with the mainline kernel (btrfs) and one (ZFS) easily installed with dkms modules and packages for most distros.
Re:Happy with XFS by Kz · 2013-04-26 15:32 · Score: 1

So root FS on ext3/4 and mount your large data volumes elsewhere as XFS. Job done?
yes

--
-Kz-
Re:Happy with XFS by iggymanz · 2013-04-27 03:33 · Score: 1

I'm not referring to anything even remotely that recently invented
Re:Happy with XFS by jabuzz · 2013-04-27 10:31 · Score: 1

The problem with JFS is that it is dead-end code. IBM frankly are pushing GPFS for anything other than a boot disk. While it might be open source there is very little development going on with JFS. Compare that to XFS.
Re:Happy with XFS by akanouras · 2013-04-27 13:40 · Score: 1

Sigh, still haven't managed to kick this habit...
Re:Happy with XFS by UltraZelda64 · 2013-04-28 19:57 · Score: 1

Could a separate /boot not solve this as well? After all, at least in the past, XFS did have some trouble working as the / partition for certain boot loaders if I remember right...
Re:Happy with XFS by Tony+Hoyle · 2013-05-04 07:14 · Score: 1

zfs doesn't support volume reshaping, which is fine for large datacentres with huge budgets but no use for smaller setups.. even at work if I were to say 'to add an extra TB to the array we're going to have to spend $5000 to buy a duplicate one and recreate it' I'd be laughed at. So it's a nonstarter.
btrfs does, but it's not ready yet.. so like Urban Garlic, I'm still waiting, since things like dedup will make a huge difference.
Re:Happy with XFS by zidium · 2013-05-06 14:58 · Score: 1

It's not "your part of the problem", it's "You're part of the problem."
I refuse to listen to someone whose grammar is so messed up.

--
Slashdot Valentines Beta Massacre: iT WORKED! The boycotts killed Beta!!
Re:Happy with XFS by zidium · 2013-05-06 15:02 · Score: 1

I had to look this up...
KDE lost its entire Git repository and mirrors because of a long-term, highly corrupted EXT4 file system that did not show any warnings until the server was rebooted.
http://www.permamarks.net/grabbed_urls/OQhBYg/www.phoronix.com_331.htmlz

--
Slashdot Valentines Beta Massacre: iT WORKED! The boycotts killed Beta!!
Re:Happy with XFS by gullevek · 2013-05-06 17:05 · Score: 1

I have used XFS on several systems for about 9 to 10 years. I had some issues with it on one box, I had to run xfs_repair on it from time to time, but that was in the range of once every 3 years. The other systems are fine. But those are server boxes with battery backup hardware raid and UPS that will shutdown the box gracefully in case something goes bad with the power. XFS does not like "home" systems and if you just reboot your box you are likely to have data loss.
Currently I am deploying ext4 on my new systems; XFS no longer has enough of a speed advantage over ext4 to justify keeping it in production.

--
"Freiheit ist immer auch die Freiheit des Andersdenkenden" - Rosa Luxemburg, 1871 - 1919

The oracle in the woodpile by larry+bagina · 2013-04-26 02:16 · Score: 2

I think we need to talk about the oracle in the woodpile - ie, Oracle. BTRFS is an Oracle project. What happens when it goes the way of MySQL? Will Monty Wideanus appear on a white steed to save us?

--
Do you even lift?

These aren't the 'roids you're looking for.

Re:The oracle in the woodpile by h4rr4r · 2013-04-26 02:27 · Score: 1

Because ZFS can never be distributed with Linux. It has to be bolted on after the fact, because SUN made a short sighted decision to try to keep Solaris alive.
Re:The oracle in the woodpile by larry+bagina · 2013-04-26 02:38 · Score: 5, Interesting

Oracle now owns ZFS. They could relicense it if they wanted to. BTRFS was started before the Sun acquisition but it seems strange* to develop BTRFS as a GPL file system with ZFS-like features while ZFS is mature and reliable today.
* Yes, they're a large corporation and right hand doesn't know what left hand does... but isn't this more like the index finger not knowing what the middle finger is doing?

--
Do you even lift?
These aren't the 'roids you're looking for.
Re:The oracle in the woodpile by h4rr4r · 2013-04-26 02:45 · Score: 1

Oracle does not want to do that. I am sure btrfs exists only because the DB boys want a new good filesystem and the Solaris org chart heads will not let ZFS slip from their grasp.
ZFS is mature and reliable today on Solaris and BSD, the Linux port is far newer.
Re:The oracle in the woodpile by h4rr4r · 2013-04-26 02:56 · Score: 1

No, I cannot.
I have many servers that run commercial software only supported on RHEL.
Re:The oracle in the woodpile by bill_mcgonigle · 2013-04-26 03:13 · Score: 2

It's Oracle. They'll re-license ZFS just as soon as it's no longer profitable for them not to.

--
My God, it's Full of Source!
OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
Re:The oracle in the woodpile by fsterman · 2013-04-26 03:23 · Score: 1

Yes, they're a large corporation and right hand doesn't know what left hand does... but isn't this more like the index finger not knowing what the middle finger is doing?
I am quite sure that Larry Ellison knows *exactly* what his middle finger is doing.

“But Steve, there’s one thing I don’t understand,” he said. “If we don’t buy the company, how can we make any money?” It was a reminder of how different their desires were. Jobs put his hand on Ellison’s left shoulder, pulled him so close that their noses almost touched, and said, “Larry, this is why it’s really important that I’m your friend. You don’t need any more money.”
Ellison recalled that his own answer was almost a whine: “Well, I may not need the money, but why should some fund manager at Fidelity get the money? Why should someone else get it? Why shouldn’t it be us?”

--
Is there anything better than clicking through Microsoft ads on Slashdot?
Re:The oracle in the woodpile by phoenix_rizzen · 2013-04-26 04:59 · Score: 1

A lot of Btrfs development comes from RedHat. And there are other sources of patches as well. Plus, the code is part of the Linux kernel, meaning Oracle can't close-source it.
IOW, there's nothing to worry about here.
Re:The oracle in the woodpile by Lennie · 2013-04-26 05:28 · Score: 1

Chris Mason and a fellow btrfs-developer both work at Fusion-io since somewhere around June last year.

--
New things are always on the horizon

Re:replace ext3 and ext4? really? by h4rr4r · 2013-04-26 02:21 · Score: 3, Informative

Lots of production servers use Ext filesystems. If btrfs is all it should be it will certainly replace these file systems one day soon as the safe choice.

Sure people use other filesystems on production Linux servers, but those are not the norm. The safe "Enterprise" (Not necessarily a good thing) choice is still Ext based filesystems.

ZFS by 0100010001010011 · 2013-04-26 02:21 · Score: 5, Informative

Meanwhile ZFS announced that it was ready for production last month.

http://zfsonlinux.org/

Re:ZFS by h4rr4r · 2013-04-26 02:26 · Score: 4, Insightful

It will be ready for production when it can be distributed with the kernel.
Do you really want to depend on an out of tree FS?
Re:ZFS by Bill_the_Engineer · 2013-04-26 02:30 · Score: 3, Interesting

Incompatible license prevents ZFS inclusion with the kernel. This is why Btrfs exists and explains Oracle's involvement with both.

--
These comments are my own and do not necessarily reflect the views or opinions of my employer or colleagues...
Re:ZFS by h4rr4r · 2013-04-26 02:33 · Score: 4, Insightful

Correct sir.
My point still stands though. Even though the limitation keeping it from being seriously considered for production is caused by a legal issue not a technical one.
Re:ZFS by Oceanplexian · 2013-04-26 02:35 · Score: 2

It will be ready for production when it can be distributed with the kernel.
ZFS is not included in the Linux kernel because it is not GPL compatible.
Licensing has nothing to do with how production-ready a product is. ZFS is significantly more mature than btrfs.
Re:ZFS by Anonymous Coward · 2013-04-26 02:39 · Score: 2, Insightful

It will be ready for production when it can be distributed with the kernel.
Do you really want to depend on an out of tree FS?
That's why the fileserver runs FreeBSD. Has other benefits, too.
Re:ZFS by h4rr4r · 2013-04-26 02:40 · Score: 2

Yes, but the statement is still true.
It means you will not get updates via normal channels, or normal channel updates might break it. That simply is not something most datacenters want to deal with. ZFS is more mature on Solaris and BSD, on Linux today it might be ahead of btrfs, but neither is production ready in the sense that datacenters mean it.
Re:ZFS by Chris+Mattern · 2013-04-26 02:44 · Score: 3, Interesting

Mixing licenses does not somehow make things "not production ready".
No, using a file system that doesn't ship with the kernel makes things "not production ready." Licensing is the reason why it doesn't ship with the kernel, but it's not shipping with the kernel that keeps it out of critical production use.
Re:ZFS by h4rr4r · 2013-04-26 02:46 · Score: 2

This.
The reason they can't ship together and be updated via normal RHEL/SUSE/Debian updates is licensing, but the technical problem keeping it from being seriously considered for production is that they can't be updated and shipped together.
Re:ZFS by Chris+Mattern · 2013-04-26 02:47 · Score: 1

The reason *why* ZFS doesn't ship with the kernel is mostly irrelevant. The fact remains--in order to use ZFS in Linux, you have to roll your own custom system. This is not a good thing for production.
Re:ZFS by Guspaz · 2013-04-26 03:02 · Score: 1

People keep saying stuff like this, but it's just FUD. zfsonlinux exists as a kernel module, this isn't zfs-fuse anymore. Installing it on a common distro like Debian that doesn't include it via the package management system requires two commands (add repo, install package). Some distributions like Gentoo already include zfsonlinux as part of the distro, and this will undoubtedly increase as time goes on.
There are no more legal or technical problems with zfsonlinux than something like the nVidia drivers. Less, in fact, since zfsonlinux *is* distributed under a free software license.
Re:ZFS by Guspaz · 2013-04-26 03:05 · Score: 1

No, the statement is false. There are no licensing issues to including the zfsonlinux kernel module with distros. The precedent on kernel module licensing has been long set by things like nVidia drivers, and zfs uses a free software license that enables distribution. Some distros like Gentoo already do include zfsonlinux, and I imagine more will in the future. On these distros, you WILL get updates via normal channels.
If you define "distributed with the kernel" to say "this distribution includes both the kernel and zfsonlinux", then yes, zfsonlinux can and IS already distributed with the kernel.
Re:ZFS by h4rr4r · 2013-04-26 03:14 · Score: 1

That first command is your problem.
Not in the normal repos not getting installed.
No one installs the closed nVidia drivers on production machines.
Re:ZFS by h4rr4r · 2013-04-26 03:15 · Score: 1

Are any of these Enterprise distros?
I don't know of any of those that distribute any of the kernel modules we are speaking of.
Gentoo is linux for ricers.
Re:ZFS by Cid+Highwind · 2013-04-26 03:38 · Score: 1

You missed two steps, the full process is:
1: Add non-standard repo
2: Kiss distro maintainer support goodbye.
3: Install package
4: Kiss kernel developer support goodbye (kernel tainted: disabling lock debugging)

--
0 1 - just my two bits
Re:ZFS by ultranova · 2013-04-26 03:41 · Score: 1

It means you will not get updates via normal channels, or normal channel updates might break it. That simply is not something most datacenters want to deal with.

But it's something datacenters will have to deal with anyway. After all, there's no guarantee that any update won't break something, so they'll need an internal update server that only gets vetted and tested updates - and at that point it's not much of a bother to include out-of-tree patches, assuming of course that they give a significant advantage.

--
Forget magic. Any technology distinguishable from divine power is insufficiently advanced.
Re:ZFS by Guspaz · 2013-04-26 03:45 · Score: 1

It's in the normal Gentoo repos. I recall another distro it was in, but I don't remember the name (started with an S?) As it continues to mature, I find it likely that we'll see it included in more distros.
Re:ZFS by Guspaz · 2013-04-26 03:49 · Score: 1

Gentoo today, who knows what else tomorrow. I'm not a fan of Gentoo, but the fact ZFS is being included in any distros shows that claims that licensing prevents distro inclusion are FUD. You can probably make a bunch of legitimate arguments about being out of tree, or kernel taint when you load it, or who knows what else, but ability to distribute isn't one of the problems.
Re:ZFS by h4rr4r · 2013-04-26 04:04 · Score: 1

Or that Gentoo is making a big mistake.
I would be happy to see it in Debian, that would get rid of any doubt I had.
Re:ZFS by h4rr4r · 2013-04-26 04:06 · Score: 1

Sure, but the odds of breakage are lower and you don't lose support in that case anyway.
Tell RH support you are using a non-supported FS and watch them hang up on you.
Re:ZFS by GameboyRMH · 2013-04-26 04:13 · Score: 1

May have to settle for this, I really need a modern filesystem that supports deduplication and my experiments with btrfs in early 2012 didn't go so well:
http://slashdot.org/journal/285321/my-btrfs-dedupe-script

--
"When information is power, privacy is freedom" - Jah-Wren Ryel
Re:ZFS by 101percent · 2013-04-26 04:35 · Score: 1

All good news, but ZFS is soon going to hit a ceiling. Oracle has tightened up the license for their ZFS, and who knows if the open source version is ever going to have those features.
Re:ZFS by h4rr4r · 2013-04-26 04:48 · Score: 1

No one said that.
No matter how automated, you install an outside FS and your Redhat support is dead as a doornail.
Re:ZFS by Guspaz · 2013-04-26 04:49 · Score: 1

I agree. I've got Ubuntu Server on my box at home (I like Debian, but I also like a fixed release/support schedule with LTS releases), and I'd be happy to see Debian or Ubuntu get ZFS in one of the official distros.
Debian can be... particular. The CDDL, however, does seem to be considered to be DFSG compatible, and there seems to be CDDL licensed code in the main Debian repo. It looks like all the debate on the subject happened in 2005-2006, and then nothing, so the fact that there seems to be CDDL license notices in the main repo indicates to me that the matter was settled in favour of the CDDL being compliant.
Re:ZFS by DamnStupidElf · 2013-04-26 04:53 · Score: 1

I'm not aware of anything preventing a clean-room reimplementation of ZFS licensed under GPLv2. It worked for nouveau.
Re:ZFS by Bill_the_Engineer · 2013-04-26 05:04 · Score: 1

Having it included in the kernel would make it production ready on a wider range of distributions but it doesn't prevent it from being "Production Ready". It can be "Production Ready" for RHEL 6 and derivatives. It just requires a little more work from the administrator. This is not as big an issue as you make it out to be since other things outside of the kernel must be considered anytime an update needs to be made within a production environment.

--
These comments are my own and do not necessarily reflect the views or opinions of my employer or colleagues...
Re:ZFS by Bill_the_Engineer · 2013-04-26 05:07 · Score: 1

I would like to add that having the file system included in the kernel does give it more weight when deciding which file system to use.

--
These comments are my own and do not necessarily reflect the views or opinions of my employer or colleagues...
Re:ZFS by wagnerrp · 2013-04-26 06:25 · Score: 3, Insightful

Anyone using nVidia GPUs for compute cards in a data center is using the closed nVidia drivers. Anyone not using them for that purpose likely doesn't even have any nVidia hardware in the first place.
Re:ZFS by wagnerrp · 2013-04-26 06:27 · Score: 2

All that means is ZFS gets forked, and FreeBSD, OpenIndiana, Nexenta, or one of the other Solaris clones takes primary ownership.
Re:ZFS by Solandri · 2013-04-26 06:45 · Score: 1

Deduplication is a neat concept in theory, but I would strongly recommend anyone hoping to use it to run some benchmarks first. In my tests of ZFS in Linux and FreeBSD, filesystem write performance dropped by 75%-90% when I turned deduplication on. I was using hefty hardware too (3.1 GHz i5 quad core, 8 GB of RAM).

With that sort of performance penalty, I can think of very few cases where you'd actually want to use deduplication. In most cases you'd be better off just buying more hard drives to increase storage space. Or maybe the Solaris version of ZFS performs much better.
Re:ZFS by KiloByte · 2013-04-26 06:47 · Score: 1

A filesystem that takes most of the machine's memory might be usable on a NAS, but not on a system whose purpose is something else than being a dedicated file server.

--
The creatures outside looked from Alt-Right to Antifa; but already it was impossible to say which was which.
Re:ZFS by GameboyRMH · 2013-04-26 06:51 · Score: 1

Is that file-level or block-level that caused that performance drop? I'm looking at using file-level with copy-on-write links.

--
"When information is power, privacy is freedom" - Jah-Wren Ryel
Re:ZFS by jafo · 2013-04-26 06:59 · Score: 1

Please explain it to me, because I really don't see any reason not to rely on an "out of tree FS". My system won't boot without tons of stuff that is outside of the kernel tree, including things like init but also things like graphics drivers on my desktop.
It seems to me that the ZFS license issue is only with the kernel, and can be solved by distributors. Distributions deal with wrapping up things under multiple licenses *ALL THE TIME*. And Ubuntu seems to be pretty close to having this integration done, based on what a friend reported with his experiments with zfsonlinux as a root device.
With all due respect to those involved, I think the pronouncements that it must be in the kernel, and that it is a "rampant layering violation", have set Linux back a long ways. FreeBSD, DragonFly BSD, OpenSolaris, have all had "advanced filesystems" for years now. Linux is basically stuck with a feature-set from Berkeley FFS and isn't really showing that that is going to change for several years... It's kind of a shame, especially since at the time of the "layering violation" comment it was clear to me that the violation came with significant compelling reasons for it, and now btrfs seems to be realizing that and implementing the same features...
Hindsight and all that, but it's a damn shame. ZFS is insanely awesome, I have a number of systems running it under FUSE and it has proven very reliable over the years.
Re:ZFS by idunham · 2013-04-26 07:09 · Score: 1

> Are any of these Enterprise distros?
> I don't know of any of those that distribute any of the kernel modules we are speaking of.
http://ftp.scientificlinux.org/linux/scientific/6rolling/x86_64/addons/zfs/
Now you do.
Re:ZFS by anyanka · 2013-04-26 07:09 · Score: 2

No one installs the closed nVidia drivers on production machines.
Depends on what you're producing. But yeah, I'd avoid it if not strictly necessary.
Re:ZFS by cstdenis · 2013-04-26 08:26 · Score: 1

ZFS is perfectly capable of being distributed with the kernel, and in fact is distributed with the Solaris and FreeBSD kernels.
The only thing stopping it being distributed with the Linux kernel is the Linux kernel's licence choice which is not the fault of the ZFS team.

--
1984 was not supposed to be an instruction manual.
Re:ZFS by washu_k · 2013-04-26 09:18 · Score: 1

You didn't have enough RAM. To use deduplication on ZFS without a massive performance hit requires assloads of RAM. 8 GB is nothing to ZFS with dedup on unless your disks are tiny. While Oracle claims less, the FreeBSD guys have found you need at least 5 GB per TB of disk just for dedup, plus more for cache and the rest of the OS. Do the math and any reasonably big storage pool will need tonnes of RAM.
Re:ZFS by TyFoN · 2013-04-26 10:09 · Score: 1

Works fine here.
I have zfs root + refind on my laptop and grub2 + zfs root on my desktop with ssd cache drives. Works like a charm. It eats a bit of ram though :)
Just added a couple repositories to arch and built a new usb stick with zfs included.
Instructions here:
https://wiki.archlinux.org/index.php/Installing_Arch_Linux_on_ZFS
The only drawback is that I have to wait for the zfs repository to update whenever arch releases a new kernel. Usually it takes 1-2 days so it's not too bad.
Also, keep in mind that you can not offload the ZIL to another (ssd) drive for the root zpool so make sure you make a separate pool for the root and one with all the rest /usr, /home, /var etc.
You can add cache ssd drives to the root pool though.
Offloading ZIL (log) speeds up the writes and cache drives speed up the reads.
Once you go zfs you don't go back as the possibilities are endless ;)
Re:ZFS by phoenix_rizzen · 2013-04-26 11:31 · Score: 1

Maybe not on servers. But we install the nVidia binary drivers on approx 3000 Debian-based diskless clients in our district. Have been for almost a decade now.
I love how everytime someone "solves" your "not ready for production" issue you dream up a new one.
Look, just because you, specifically, don't want to install ZFS on your particular Linux systems doesn't mean it's not "production ready" for the rest of us.
Re:ZFS by phoenix_rizzen · 2013-04-26 11:36 · Score: 1

It's nowhere near "5 GB of ARC per 1 TB of disk". That meme really needs to stop.
We have 48 GB of RAM, 40 GB reserved for ARC, no L2ARC, running with dedupe enabled on a server with just under 30 TB of raw storage, 22 TB in use. Runs fine. It does rsync backups every night of 60-odd remote servers. Starts at 5pm, ends around 2am. Then does a ZFS send to an off-site system in just over an hour. Only has 15 disks in 3 raidz2 vdevs.
The other backup boxes have 64 GB of RAM, but just under 50 TB of disk.
The off-site box has only 128 GB of RAM for just shy of 100 TB of disk.
The correct math is approx 1 GB of RAM per unique TB of data. The more duplicated data you have, the less ARC space you need to hold the dedup tables.
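To put numbers on the two rules of thumb argued over in this thread, here is a rough sketch in Python. It is only arithmetic: the ratios (5 GB per TB of disk, 1 GB per unique TB) are the claims made above, not measurements, and the amount of unique data is not stated, so the 22 TB in use is taken as a worst case.

```python
# Rule-of-thumb arithmetic only; the per-TB ratios are the ones argued
# about in this thread, not measured values.

def dedup_ram_gb(terabytes: float, gb_per_tb: float) -> float:
    """RAM estimate for the dedup table under a flat GB-per-TB rule."""
    return terabytes * gb_per_tb

raw_tb = 30    # raw storage on the backup server (just under 30 TB)
used_tb = 22   # data in use; unique data can only be this much or less
arc_gb = 40    # RAM reserved for the ARC

by_disk = dedup_ram_gb(raw_tb, 5)     # "5 GB per TB of disk"
by_unique = dedup_ram_gb(used_tb, 1)  # "1 GB per unique TB", worst case

print(f"5 GB/TB of disk:   {by_disk:.0f} GB, fits in ARC: {by_disk <= arc_gb}")
print(f"1 GB/TB of unique: {by_unique:.0f} GB, fits in ARC: {by_unique <= arc_gb}")
```

Under the first rule this box would need far more RAM than it has; under the second it fits with room to spare, which matches what we see in practice.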
Re:ZFS by phoenix_rizzen · 2013-04-26 11:39 · Score: 1

ZFS was forked years ago with ZFSv28.
Oracle ZFS is ZFSv33 or thereabouts.
Open-source ZFS is mainly developed by Illumos, with contributions from FreeBSD, Linux, Joyent, Nexenta, and many others.
Many new features have hit OSS ZFS that aren't in Oracle ZFS:
- LZ4 compression
- feature flags
- delayed deletion of snapshots
- many many many bug fixes
- many many many optimisations
- bunch of other stuff I can't recall
If you use OSS ZFS with feature flags enabled, the version number shows as 5000.
Re:ZFS by cas2000 · 2013-04-26 14:04 · Score: 1

debian already distributes zfs-fuse in the main archive.
there's no legal impediment to debian also distributing zfsonlinux as zfs-dkms and spl-dkms kernel module packages (i.e. compiles and installs the .ko modules when you install it) and zfs tools packages.
in fact, that's how the zfsonlinux project distributes for debian - as an apt-gettable repository, so installing it is as easy as adding another repo and running apt-get
http://zfsonlinux.org/debian.html
(BTW, that repo works on Wheezy and on Sid)
In both cases, zfs-fuse and zfsonlinux, they're not distributing a derived work because it is the USER who installs it on their own system who is combining the GPL code and the CDDL code. As long as they don't distribute the result themselves, the GPL is fine with that.
You can do whatever you like (including link incompatibly-licensed, even proprietary, code) with GPLed code on your own system. The GPL's restrictions only come into effect when you want to distribute the combined work (or, in the case of GPLv3, if you want to offer it as SaaS to third-parties)
Re:ZFS by Fweeky · 2013-04-27 09:46 · Score: 1

8GB isn't hefty by any stretch of the imagination, especially not when you're messing with dedup. For decent performance the recommendation is somewhere along the lines of 20-30GB per TB, though you can mitigate that somewhat by using an SSD for L2ARC.
Re:ZFS by Chris+Mattern · 2013-04-28 02:42 · Score: 1

Maybe not on servers. But we install the nVidia binary drivers on approx 3000 Debian-based diskless clients in our district. Have been for almost a decade now.
Apples and oranges. You can do things on workstations you can't possibly do on production servers. When a workstation goes down, one person can't work. When a production server goes down, it's possible that *hundreds* of people can't work.
Re:ZFS by phoenix_rizzen · 2013-04-29 06:12 · Score: 1

ZFS uses as much (or as little) RAM as you decide to let it use. The ARC is fully tunable.
By default, it uses "all but 1 MB" of RAM, but releases RAM as needed when the OS tells it to. However, you can set a max as low as you want (although you can't disable it).
Have 64 GB of RAM, but need 32 GB for the DB? Then set the arc_max tunable to 32 GB (or even less).
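A small sketch of what that looks like, assuming the usual tunable names: `vfs.zfs.arc_max` in /boot/loader.conf on FreeBSD and the `zfs_arc_max` module parameter on ZFS on Linux. The value is given in bytes, so the only real work is the conversion. Check the names against the docs for your version before copying anything.

```python
GIB = 1024 ** 3  # bytes in one GiB

def arc_max_bytes(gib: int) -> int:
    """Convert an ARC cap in GiB to the byte value the tunable expects."""
    return gib * GIB

limit = arc_max_bytes(32)

# FreeBSD, in /boot/loader.conf (tunable name assumed as described above)
print(f'vfs.zfs.arc_max="{limit}"')

# ZFS on Linux, in /etc/modprobe.d/zfs.conf (parameter name assumed as above)
print(f"options zfs zfs_arc_max={limit}")
```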
Re:ZFS by KiloByte · 2013-04-29 08:50 · Score: 1

Quoting the ZFS Guide:

To use ZFS, at least 1 GB of memory is recommended (for all architectures) but more is helpful as ZFS needs *lots* of memory. Depending on your workload, it may be possible to use ZFS on systems with less memory, but it requires careful tuning to avoid panics from memory exhaustion in the kernel.
Yeah... 1GB memory just to run it. I'll pass. Somehow ext4 or btrfs have no problems running efficiently on a phone.

--
The creatures outside looked from Alt-Right to Antifa; but already it was impossible to say which was which.
Re:ZFS by phoenix_rizzen · 2013-04-29 10:07 · Score: 1

Recommended, not required. There are plenty of folks running FreeBSD + ZFS on systems with 1 GB of RAM total, as well as plenty of folk running it on laptops/netbooks (slow disk, not a lot of RAM, 32-bit CPUs, etc).
Have a look in top output sometime. See all that RAM being used as "cache" and "buffer", that's your filesystem using up system RAM. I've seen it get into the multiple GBs, yet no one complains about that.
Yet, that is the equivalent of the ARC in ZFS. And, when the OS needs RAM for application usage, data in "cache" and "buffer" is flushed and used for the apps. Same thing on ZFS-using systems: if the OS needs RAM for apps, the ARC releases it.
Not really sure why this is an issue for people. At least ZFS lets you tune things, whereas you can't tune the amount of RAM used for "cache" or "buffer" in a 'normal' setup.
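If you'd rather see the numbers than squint at top, here is a minimal Python sketch that pulls the same "Buffers" and "Cached" figures out of /proc/meminfo-style text. The sample values are made up for illustration; on a real Linux box you would feed it the contents of /proc/meminfo instead.

```python
def parse_meminfo(text: str) -> dict:
    """Parse 'Name:   12345 kB' lines into a {name: value} dict."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[name.strip()] = int(fields[0])
    return values

# Hypothetical sample, not taken from any real machine.
SAMPLE = """\
MemTotal:        8000000 kB
MemFree:          500000 kB
Buffers:          300000 kB
Cached:          4200000 kB
"""

info = parse_meminfo(SAMPLE)
fs_cache_kb = info["Buffers"] + info["Cached"]
share = fs_cache_kb / info["MemTotal"]
print(f"buffers + cached: {fs_cache_kb} kB ({share:.0%} of RAM)")
```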
Re:ZFS by KiloByte · 2013-04-29 10:27 · Score: 1

The page cache is directly usable, ARC is not. There's no "buffer" in a 'normal' setup -- all of memory serves as a LRU-ish cache of recently accessed pages.

--
The creatures outside looked from Alt-Right to Antifa; but already it was impossible to say which was which.
Re:ZFS by phoenix_rizzen · 2013-05-03 06:41 · Score: 1

If the system needs RAM, it's released from the ARC and made available to the OS/apps.
If the system needs RAM, it's released from the page cache and made available to the OS/apps.
How's it any different?
There's no "buffer" in a 'normal' setup
Oh? Then that "buffer" stat for RAM in top output must be magical and not really exist on every Linux system out there?
Re:ZFS by KiloByte · 2013-05-03 08:29 · Score: 1

A page in the page cache is directly usable by any program; it might be at most not currently mapped by any running process.

--
The creatures outside looked from Alt-Right to Antifa; but already it was impossible to say which was which.
Re:ZFS by Tony+Hoyle · 2013-05-04 07:25 · Score: 1

You don't install outside the supported distro you paid support for. Anything outside that isn't production ready. Unless zfs ships as a supported addon with RHEL it's not production ready in any meaningful sense of the word - stability has nothing to do with it.

Re:hype hype hype by amiga3D · 2013-04-26 02:27 · Score: 1

Isn't that a common trait with experimental systems?

Sorry Slashdot. by Anonymous Coward · 2013-04-26 02:31 · Score: 5, Funny

Ugh, I'm really sorry about this post, Slashdot. I really didn't think it was going to be a "First post." What I really meant to post was

OMFG fr1st psot!!!! APK!! crazy host file conspiracy! /etc/mod_me_down

Re:Sorry Slashdot. by greg1104 · 2013-04-27 08:05 · Score: 1

I couldn't find /etc/mod_me_down on my machine. Is that an Ubuntu thing?
It used to be, but Canonical has replaced it with something that just reports all the moderation you do locally back to them.
Re:Sorry Slashdot. by crutchy · 2013-04-28 19:24 · Score: 1

sudo apt-get install slashdot-first-post
on another note... who named it btrfs? it kinda looks like ButtFS. aren't we supposed to be in the age of coolness? why not call it SlickFS or OMFGFS, or given that it's for linux... RTFMFS

Re:Why? by h4rr4r · 2013-04-26 02:37 · Score: 5, Insightful

ZFS is outside the kernel tree. That is not an ideological issue, but a practical one. It means updates will not come from the normal channels, it means kernel updates from normal channels could break it and it is not getting the attention from the kernel devs an fs should get.

ZFS on linux has probably less testing than Btrfs at this point. It has near no real world testing. Just because the Solaris ZFS is great, and the BSD one is coming along means nothing for the stability and correctness of the Linux port.

If you want to use a different OS then this entire discussion is worthless. You might as well suggest switching everything to OSX and using HFS+.

take good note of distros that treat it as stable by iggymanz · 2013-04-26 02:39 · Score: 1

Distros such as SuSE Linux Enterprise Server that claim it is production ready and ship it in the installer should be shunned. Don't entrust your data to them.

Re:A Few Nasty Caveats? by Cito · 2013-04-26 02:45 · Score: 2

yea Btrfs has one major bug

if you fill the hard drive up you lose access to the system, you can't log in or even get access to the filesystem and the system locks up

with ext things may act a bit erratic but you could log in and delete/move things off to make room and be ok. but with Btrfs you can't: if it fills up, you lose

unless you take the hard drive out move it to another box and mount it then delete crap that way, but that's a pain in arse.

Re:Yawn, yet another filesystem... by h4rr4r · 2013-04-26 02:50 · Score: 5, Insightful

Ext3 is still chugging along and doing what you want. A filesystem that sacrifices everything for stability.

Not everyone has the same wants and needs. Lots of competing filesystems is a good thing, it leads to a market of ideas. Your "let's pick one and force everyone to suffer with our choice" approach just leads to stagnation and even worse results.

Full limit by amginenigma · 2013-04-26 02:54 · Score: 1

So what is this 'Full' limit? In the ZFS world it's accepted to keep the pool (volume) under 80% usage to prevent issues. Is this something that should be applied as a 'best practice' to btrfs?

Re:Full limit by fsterman · 2013-04-26 03:28 · Score: 1

The last I checked (which was a very long time ago) Linux/ext required a logical swap partition for paging. Why not just preallocate some space in btrfs and be done with it?

--
Is there anything better than clicking through Microsoft ads on Slashdot?

Re:Yawn, yet another filesystem... by Anonymous Coward · 2013-04-26 02:56 · Score: 1

It's part of the open source karma. "Shiny and New" is much more important than "stable, bug-free and usable"

You don't understand the problem by Anonymous Coward · 2013-04-26 03:08 · Score: 2, Interesting

The problem with "XFS" eating data wasn't with XFS - it was with the Linux devmapper ignoring filesystem barrier requests.

Gotta love this code:

Martin Steigerwald wrote:
> Hello!
>
> Are write barriers over device mapper supported or not?

Nope.

see dm_request():

    /*
     * There is no use in forwarding any barrier request since we can't
     * guarantee it is (or can be) handled by the targets correctly.
     */
    if (unlikely(bio_barrier(bio))) {
        bio_endio(bio, -EOPNOTSUPP);
        return 0;
    }

Who's the clown who thought THAT was acceptable? WHAT. THE. FUCK?!?!?!

And it wasn't just devmapper that had such a childish attitude towards file system barriers:

Andrew Morton's response tells a lot about why this default is set the way it is:

Last time this came up lots of workloads slowed down by 30% so I dropped the patches in horror. I just don't think we can quietly go and slow everyone's machines down by this much...

There are no happy solutions here, and I'm inclined to let this dog remain asleep and continue to leave it up to distributors to decide what their default should be.

So barriers are disabled by default because they have a serious impact on performance. And, beyond that, the fact is that people get away with running their filesystems without using barriers. Reports of ext3 filesystem corruption are few and far between.

It turns out that the "getting away with it" factor is not just luck. Ted Ts'o explains what's going on: the journal on ext3/ext4 filesystems is normally contiguous on the physical media. The filesystem code tries to create it that way, and, since the journal is normally created at the same time as the filesystem itself, contiguous space is easy to come by. Keeping the journal together will be good for performance, but it also helps to prevent reordering. In normal usage, the commit record will land on the block just after the rest of the journal data, so there is no reason for the drive to reorder things. The commit record will naturally be written just after all of the other journal log data has made it to the media.

I love that italicized part. "OMG! Data integrity causes a performance hit! Screw data integrity! We won't be able to brag that we're faster than Solaris!"

There's a lot more out there if you care to look.

Toss in other things like the way Linux handles NFSv2 group membership (More than 16? Let's just silently drop some!) and lots of fanbois wonder why I view Linux as little better than Windows. Hell, Microsoft may fuck things up six ways from Sunday, but they're not CHILDISH when it comes to things like data integrity.

Re:You don't understand the problem by h4rr4r · 2013-04-26 03:25 · Score: 1

Data integrity is fine, if you are not running XFS.
Why should everyone suffer a 30% performance hit, to make the couple oddballs running XFS happy?
Re:You don't understand the problem by TheDarkMaster · 2013-04-26 05:20 · Score: 1

Maybe you should pay more attention to the passage where the guy talks about the issue of data integrity and how badly the kernel handles the situation ("we do the fast way and hope for the best").

--
Religion: The greatest weapon of mass destruction of all time
Re:You don't understand the problem by Anonymous Coward · 2013-04-26 07:57 · Score: 1

Check your facts, that should be the worst case scenario of creating thousands of small files. In real life, especially after XFS got the performance improvements around 3.4(?), ext4 is the slow one due to JBD2 doing stupid things every few seconds and slowing things down all the time and that hurts both performance and laptop battery more than waiting 20s to unpack a kernel tarball in case of XFS. And don't even try to lie your way out, JBD2 sucking is a well-known fact and is either being removed in 3.9 or being improved a lot but I would wait till benchmarks before claiming it got massively faster.

Re:Yawn, yet another filesystem... by jabuzz · 2013-04-26 03:09 · Score: 1

The primary reason for the existence of ext4 is Lustre. By far the best option for a general purpose non-clustered Linux file system is XFS by some considerable distance. The crying shame is that RedHat did not make a grab for CXFS out of the ruins of SGI but persisted with GFS2 and then purchased Gluster.

Re:replace ext3 and ext4? really? by jabuzz · 2013-04-26 03:12 · Score: 2

Want more than 16TB on your server? Unless ext4 has very recently grown that support then using an ext based file system is not viable. Remember a RAID5 in 4D+P using 4TB disks will be super close to that 16TB limit. Better hope that you don't want to scale the file system up in the future.
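For anyone wondering where the number comes from, here is the arithmetic as a short Python sketch. It assumes the common case of 32-bit block numbers with 4 KiB blocks, which is my assumption about the setup rather than anything measured, and it counts disk sizes in decimal TB the way vendors do.

```python
TIB = 2 ** 40   # bytes in one TiB (binary)
TB = 10 ** 12   # bytes in one TB (decimal, as printed on the drive label)

# Assumed configuration: 32-bit block numbers, 4 KiB blocks.
block_size = 4096
ext4_limit_bytes = (2 ** 32) * block_size

# RAID5 in 4D+P: five disks, four of them worth of usable data.
data_disks = 4
disk_bytes = 4 * TB
raid5_usable_bytes = data_disks * disk_bytes

print(f"ext4 limit:   {ext4_limit_bytes / TIB:.2f} TiB")
print(f"RAID5 usable: {raid5_usable_bytes / TIB:.2f} TiB")
print(f"headroom:     {(ext4_limit_bytes - raid5_usable_bytes) / TIB:.2f} TiB")
```

So the array comes to roughly 14.55 TiB against a 16 TiB ceiling, and adding a single extra 4 TB data disk would push it over.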

Re:replace ext3 and ext4? really? by h4rr4r · 2013-04-26 03:20 · Score: 1

Friends don't let friends use RAID5.

As far as I know there are no 4TB SAS drives available yet.

This is another reason why people want btrfs soon. Right now it is not yet an issue, for most use cases. Since you can have many 16TB volumes.

Best Tech Thread Evar!! by interval1066 · 2013-04-26 03:21 · Score: 1

Actually I'm being serious. This is why I come to /.

--
Python: 'And then suddenly you have a language which says "we're all stuck with whatever the whiniest coder wants".'

Re:Still use reiserfs by rilles · 2013-04-26 03:21 · Score: 1

My unraid system still sticks with reiserfs. Been using it for years with no obvious issues seen, the unraid application author still seems to prefer it over any other FS.

Re:Yawn, yet another filesystem... by h4rr4r · 2013-04-26 03:22 · Score: 1

They fix that write barrier issue yet?

Don't tell me it is a Linux bug, that is a cop out. Either it will lose my data or it will not, I don't care why.

Re:Yawn, yet another filesystem... by interval1066 · 2013-04-26 03:25 · Score: 1

Linux DOES have a good file system. As xkcd says; it has 15 of 'em. This is one of the highlights of Linux, you can use any fs available. The bad side is simply many of them are old, a few are too new, and most don't have ALL the features enterprises need. This is a good thing though. When Apple or Microsoft declare a critical system infrastructure feature complete, and it's not for you, what is your recourse? At least with Linux if something new is really needed it can be added. The possibility is there.

--
Python: 'And then suddenly you have a language which says "we're all stuck with whatever the whiniest coder wants".'

Re:replace ext3 and ext4? really? by Anonymous Coward · 2013-04-26 03:31 · Score: 2, Interesting

FYI, ext4 can be larger than 16 TB but you need a newer version of the e2fsprogs than is included in a typical enterprise distribution. It's not the kernel filesystem drivers with the limitation, but the user-level utility for formatting a new filesystem.

I still prefer XFS by Damouze · 2013-04-26 03:36 · Score: 1

I still prefer XFS ;-).

--
And on the Eighth Day, Man created God.

Re:Yawn, yet another filesystem... by bored · 2013-04-26 03:41 · Score: 2

Ext3 is still chugging along and doing what you want. A filesystem that sacrifices everything for stability.

EXT3 is actually fairly good, and the performance isn't bad _EXCEPT_ for one issue: fsync(), which causes a massive IO barrier against all the other operations in the filesystem. fsync() should only be assuring the named file is consistent, and yet it basically stalls the entire FS to assure that one file. It's a problem with lack of proper IO tagging and actually is a fundamental problem with the block layer in linux. A recent LSML posting about SYNCHRONIZE CACHE hints at the problem too (complete device flush when only a small portion of the IO needs to be flushed).

Re:Yawn, yet another filesystem... by 0123456 · 2013-04-26 03:48 · Score: 1

99% of software doesn't need to call fsync() on a sanely designed filesystem. The most likely problem is software which calls fsync() regularly to work around ext4 retardeness then being run on ext3, or apps which use libraries like sqlite that call fsync() multiple times when updating the database.

Certainly when I manually sync on my CentOS machine it takes several seconds to complete the writes to disk, so clearly the software I run there isn't calling fsync() much.

My experiance, for what it worth... by sshir · 2013-04-26 03:48 · Score: 2

Installed Xubuntu 12.10 last October(ish) on USB2 stick (jetflash 32G) with Btrfs (only /boot had EXT2 partition, no swap)

Reason: 24/7 machine. It's a notebook - always spinning harddrive is a drag: spins up cooling fan; so I went solid state for primary OS drive. Needed filesystem that spreads wear and does checksums - hence Btrfs.

Usage - downloading stuff (to the stick itself, not the harddrive) plus some NASing. Data volume: wrapped around those 32gigs a few times already.

Observations so far: no problems at all.

Other details: Had to play with I/O scheduler (I think settled on CFQ. Interestingly, NOOP sucked). Had to install hdidle (I think) otherwise couldn't force sda to go to sleep (bug (?)).
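For reference, the kernel lists the available schedulers in /sys/block/<dev>/queue/scheduler and marks the active one with square brackets. A minimal Python sketch for picking out the active one; the sample string is hypothetical, not copied from my machine.

```python
import re

def active_scheduler(text: str) -> str:
    """Return the scheduler shown in [brackets] in the sysfs listing."""
    match = re.search(r"\[([^\]]+)\]", text)
    if match is None:
        raise ValueError("no active scheduler marked in: " + repr(text))
    return match.group(1)

# Hypothetical contents of /sys/block/sda/queue/scheduler
sample = "noop deadline [cfq]\n"
print(active_scheduler(sample))
```

Switching is done by writing one of the listed names back to the same file as root.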

Re:My experiance, for what it worth... by gottabeme · 2013-04-26 04:35 · Score: 1

Curious, why did you have to play with the I/O scheduler?

--
"Those who consume the bulk of goods are those who make them. We must never forget this secret of our prosperity."
Re:My experiance, for what it worth... by sshir · 2013-04-26 05:05 · Score: 1

System would start to stutter when a lot of I/O is happening. Larger "transaction sizes" are needed to mitigate that. Usually an indication of a high latency environment.

But I think in my case that's mostly due to USB and not btrfs.
Re:My experiance, for what it worth... by gottabeme · 2013-05-03 07:58 · Score: 1

That's interesting. There are lots of people complaining about USB I/O on Linux (e.g. https://bugzilla.kernel.org/show_bug.cgi?id=12309 ), and I've experienced it myself. And many also say that CFQ is quite poor at I/O, especially when it comes to latency, so I'm really surprised that you found it to be the best. Did you try deadline?
I sure hope BFQ becomes the default someday. Blows everything else out of the water as far as latency goes, and sometimes throughput too.

--
"Those who consume the bulk of goods are those who make them. We must never forget this secret of our prosperity."
Re:My experiance, for what it worth... by sshir · 2013-05-05 11:42 · Score: 1

I did try deadline. Subjectively it was worse. But there are too many things conflated - number (and kind) of disk operations depends on the filesystem and its parameters, flash drives are often (more as a rule) funky, usage profiles are different and god knows what else...

Re:It's completely ideological. by 0123456 · 2013-04-26 03:51 · Score: 1

'Licensing crap' is legal, not ideological.

bad joke... by Anonymous Coward · 2013-04-26 03:55 · Score: 1

I've just lost ( an hour ago ) the entire FS tree in archlinux installation ( I forgot to prepare meself with the btrfsprogs ) - I thought I was safe to use btrfs - I wasn't even able to boot ubuntu-live (13.04) because [the screwed] btrfs partitions made the kernel (btrfs module from the Ubuntu live) crash at boot :-)

- destroy offending partition
- restart from scratch but NEVER-EVER use btrfs again!

I have my lesson - don't touch anythings you do NOT know very well.
Therefore, it is a pleasure for me to re-install archlinux ( using ext4 this time!!! ) again :-)

Long live archlinux :-)
hahahaha

Re:bad joke... by game+kid · 2013-04-26 04:20 · Score: 1

I use Arch, and decided to stick with ext4 for now--I don't want a filesystem that bleeding-edge, btrfs does not yet directly support swap files, and though my laptop has enough memory to fit a container ship or two (16 gibs, good god...) I'd still rather have one, both for resume and general swappy stuff if ever needed.
Personally I'd like file birth time support (and not just for btrfs) to jog my memory about the times I've made old stuff, but it seems I'd have to move to e.g. FreeBSD for that and I prefer the more-up-to-date-iness of Arch.

--
You can hold down the "B" button for continuous firing.
Re:bad joke... by Aviancer · 2013-04-26 04:52 · Score: 1

I have my lesson - don't touch anythings you do NOT know very well.
Because this is a fantastic way to get to be an expert on new stuff.
Re:bad joke... by Harik · 2013-04-26 05:27 · Score: 1

File birth time is a fairly difficult concept, and only really useful on say a database file that's edited in-place. Any text file/source code you've written will have btime=ctime, since it was 'created' as a temporary file, then renamed over the original. That's one reason why people think ctime means 'creation' time, since for the types of files people hand-edit it really is.
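A quick way to see the rename-over-original effect for yourself, as a Python sketch. The file names are made up, and it uses the inode number as a stand-in for "is this the same file object", since birth time is not exposed everywhere.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "notes.txt")

    # Original file, as first saved.
    with open(path, "w") as f:
        f.write("first draft\n")
    inode_before = os.stat(path).st_ino

    # What many editors do on save: write a temporary file next to the
    # original, then rename it over the original.
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        f.write("second draft\n")
    os.replace(tmp, path)

    inode_after = os.stat(path).st_ino
    with open(path) as f:
        content = f.read()

print("same inode after save:", inode_before == inode_after)
```

The name stays the same but the file behind it is a brand new one, so any "birth" timestamp would be reset on every save.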
Re:bad joke... by Lennie · 2013-04-26 05:35 · Score: 1

What is wrong with swap-partitions ?

--
New things are always on the horizon

Re:Sigh by bored · 2013-04-26 03:57 · Score: 1

Well, if linux is a toy (your basic argument) then why are all the subsystem maintainers paid a salary by large companies, same as the developers at Microsoft, or any other OS company?

My point doesn't preclude people showing up and writing the next great filesystem. It's simply a question of why everyone thinks it's a good idea for a guy PAID to maintain a filesystem to drop it and go write another one. If you worked for _BIG_ company and were paid to maintain their application, and you decided one day that maintaining their application was a PITA cause it was old crufty and not sexy anymore and instead refused to fix problems in it, rather spending the next 4 years writing a replacement (complete with another set of bugs) how long do you think your job would last?

Of course, this stuff happens in a lot of software projects, new developer shows up, and writes buggy new system cause they think they are smarter than the last guy. It frankly speaks of immaturity and an "artist" mentality rather than an engineering process. Sure, software isn't all engineering but linux is an OS, it's a fundamental part of a computing platform and one that is expected to provide some basic level of service to applications (you know the things actually doing the work). When it fails at that, because it's a patchwork of art, then you have to question why.

Re:replace ext3 and ext4? really? by h4rr4r · 2013-04-26 03:58 · Score: 1

How about Dell or HP or someone like that?

Neat to see they are coming along though. Not too bad to get 16TiB for $3500. Since you need 8 of them to get a RAID10 that size.

Re:Yawn, yet another filesystem... by bored · 2013-04-26 04:05 · Score: 1

99% of software doesn't need to call fsync() on a sanely designed filesystem.

Really? please, show me the part of POSIX which says the data you wrote has now been flushed to the medium and you can respond with 100% certainty, to the user, or API making a request that if power fails this transaction will be safe.
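For the record, this is roughly what "the transaction is safe" takes in practice, as a Python sketch. The function name and file names are mine, and it assumes a POSIX-style system where a directory can be opened and fsync()ed. Note that every durability step here is an explicit fsync() call.

```python
import os
import tempfile

def durable_write(path: str, data: bytes) -> None:
    """Replace `path` with `data` and only return once it has been synced."""
    directory = os.path.dirname(os.path.abspath(path))
    tmp = path + ".tmp"

    with open(tmp, "wb") as f:
        f.write(data)
        f.flush()              # push Python's buffer down to the kernel
        os.fsync(f.fileno())   # ask the kernel to push it to the device

    os.replace(tmp, path)      # atomic rename over the old file

    # The rename itself lives in the directory, so sync that as well.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)

with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "ledger.dat")
    durable_write(target, b"balance=42\n")
    with open(target, "rb") as f:
        stored = f.read()

print(stored)
```

Drop the fsync() calls and the write still "works", it just gives you no promise about what survives a power cut.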

po-tay-to vs. po-tah-to by DragonWriter · 2013-04-26 04:31 · Score: 1

But I would have preferred if it was worded as here are some edge case bugs that need fixing before BTRFS is used in our scenario, rather than that these were show stoppers...

I don't see how that's any different: "Show stoppers" means "things that are unacceptable in our scenario".

Re:It's completely ideological. by UnknownSoldier · 2013-04-26 04:47 · Score: 4, Interesting

Please mod parent informative.

One of the retarded things about btrfs is that you can not see how much disk space is being used by each subvolume. How the hell can you have a filesystem and not know how much space is in use or free ??

The design of ZFS is much more holistic. That is, when we take a step back and look at both the micro and macro we see that we are really trying to solve 3 problems:

* Volume Management
* File System
* Data Integrity

ZFS solves all of these by leveraging knowledge from ALL the layers as one cohesive whole.
https://blogs.oracle.com/bonwick/en_US/entry/rampant_layering_violation

Why RAID is fundamentally broken
https://blogs.oracle.com/bonwick/entry/raid_z

Another interesting doc
http://www.scribd.com/doc/43973847/5/ZFS-Design-Principles

tried it as main laptop filesystem by Luke_22 · 2013-04-26 04:47 · Score: 3, Interesting

I tried btrfs as my main laptop filesystem:

nice features, speed ok, but i happened to unplug by mistake the power supply, without a battery. bad crash... I tried using btrfsck, and other debug tools, even in the "dangerdon'teveruse" git branch, they just segfaulted. at the end my filesystem was unrecoverable, I used btrfs-restore, only to find out that 90% of my files had been truncated to 0... even files i didn't use for months....

now, maybe it was the compress=lzo option, or maybe I played a little too much with the repair tools (possible), but until btrfs can sustain power drops without problems, and the repair tools at least do not segfault, I won't use it for my main filesystem...

btrfs is supposed to save a consistent state every 30 seconds, so I don't understand how I messed up that bad.... maybe the superblock was gone and the btrfsck --repair borked everything, I don't know.... luckily for me: backups :)

--
"I was gratified to be able to answer promptly, and I did. I said I didn't know." -- Mark Twain

Re:Yawn, yet another filesystem... by UnknownSoldier · 2013-04-26 05:07 · Score: 1

1. One size doesn't fit all though.
Most filesystems aside from ZFS sacrifice correctness for the sake of performance.
* For enterprise correctness is more important than performance.
* For home use performance is more important than correctness.

2. You seem to be ignoring history.
As we've gone from 32-bit to 64-bit CPUs, filesystems have likewise gone from 32-bit to 64-bit and 128-bit.

Remember software (and hardware) is about engineering tradeoffs between 2 extremes:

Correct but Slow < - - - and - - - > Fast but Unstable

--
Only Cowards use Censorship.

Re: Ooooh Flamey by cant_get_a_good_nick · 2013-04-26 05:25 · Score: 1

it seems strange* to develop BTRFS as a GPL file system with ZFS-like features while ZFS is mature and reliable

To be honest, there are many projects that are just this - a rewrite of working code just because the license doesn't match what you want. BDB => GDBM for some reason pops in the mind first. Usually it's mostly a waste of resources as it takes time to build up the feature set of the copied code and avoid the bugs that were revisited because they ignored the design of the copied code. I'm still waiting for my FSF Skype clone.

My guess is that humans want to be architects, not maintainers. It's fun to be bold and create "new" things with the partial safety of it following a known framework than go and try to fix that annoying bug in someone else's code that only shows up on Toshiba hardware with the 2976G chipset and NOT the 2976F chipset and when Obama wears a red tie. This is not of course all of it, there are some legit license reasons for some forks, but underneath methinks this is always a secondary reason.

Re:Yawn, yet another filesystem... by Lennie · 2013-04-26 05:33 · Score: 1

ext4 usually has better performance; in recent versions of the Linux kernel I believe the ext4 code is used for ext3 and ext2 as well.

--
New things are always on the horizon

Re:Yawn, yet another filesystem... by TheDarkMaster · 2013-04-26 05:36 · Score: 1

The problem is exactly that. We have 15 different file systems but none of them is really complete and free from obvious bugs. It's like having 15 different cars to choose from, but none of them having all the wheels, the engine, transmission, chassis and all of that working properly at the same time.

--
Religion: The greatest weapon of mass destruction of all time

Re:what's taking so long? by Lennie · 2013-04-26 05:40 · Score: 1

Too much development on adding features, too little focus on stability.

We'd have more stability if they focused on that instead, but it would take ages to add all the btrfs/zfs-like features which are not in other file systems like RAID. Because things would need to be changed and then stabilized, maybe it would even need a new disk-format.

It might be better to have less stability for a while until most features are part of the code base and then work on stabilizing.

--
New things are always on the horizon

Re:Ignorance stated confidently is still ignorance by Tough+Love · 2013-04-26 05:55 · Score: 1

I bet the power never goes out at CERN.

--
When all you have is a hammer, every problem starts to look like a thumb.

Re:It's completely ideological. by Anonymous Coward · 2013-04-26 05:57 · Score: 1

That's absurd. There are several ways of seeing usage of both the BTRFS pool and of individual disks. You can't check individual subvolumes because it's a worthless number when you have a COW filesystem; you'd end up with 4 subvolumes at 3GB each, but you're only using 5GB total for your pool. Since BTRFS "snapshots" have no distinction from subvolumes, it becomes really hard to break this down.
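A toy model of why the per-subvolume number misleads, as a Python sketch. This is not how btrfs stores anything; each subvolume is just a set of 1 GB extent IDs, and the names and layout are invented to reproduce the 4 x 3GB versus 5GB example.

```python
# Toy model: a subvolume is a set of extent IDs, each extent is 1 GB.
# Snapshots share extents with their source until something is rewritten.
EXTENT_GB = 1

subvolumes = {
    "root":       {0, 1, 2},
    "snap-mon":   {0, 1, 3},   # one extent rewritten since the snapshot
    "snap-tue":   {0, 1, 4},   # a different extent rewritten
    "snap-clone": {0, 1, 2},   # untouched snapshot of root
}

# What a naive per-subvolume "du" would add up to.
apparent_gb = sum(len(extents) * EXTENT_GB for extents in subvolumes.values())

# What the pool actually stores: each extent counted once.
actual_gb = len(set().union(*subvolumes.values())) * EXTENT_GB

print(f"sum of subvolume sizes: {apparent_gb} GB")
print(f"space used in the pool: {actual_gb} GB")
```

Deleting "snap-clone" here would free nothing at all, which is exactly why a single "size" per subvolume tells you so little.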

Other than your first (Bad) argument, you make no mention for why ZFS is better than BTRFS. I will agree, ZFS on Solaris is better than BTRFS on Linux .... right now, but I find the design of BTRFS to be superior to ZFS, and when it's feature complete, I will rather use BTRFS. However I'm not an OS zealot, and will use the best tool for the job, instead of making up reasons to stick with my cult ... I mean OS / FS.

Stop right there by Anonymous Coward · 2013-04-26 06:18 · Score: 1

if linux is a toy (your basic argument)

Wrong. I admin linux servers professionally. I am a database developer who makes extensive use of open source software in a production environment.

On the contrary, my "basic argument" was that calling for other people to unite and consolidate is pointless. If they wanted to unite and consolidate, they would have already done so. Naturally, the people working on open source software have already considered everything you said, and they have already said "no".

FYI, I have been a linux geek since '97, and people have been saying the exact same things as you, continually, for the entire 15 years. The reality is that consolidation will happen if and when it makes sense to the people doing the actual work -- not when it makes sense to you.

Re: Ooooh Flamey by wagnerrp · 2013-04-26 06:19 · Score: 1

I'm still waiting for my FSF Skype clone.

Eh? Free SIP clients have been around for much longer than Skype. SIP hardware has been around for much longer than Skype hardware. I have one sitting on my desk right now. Just because Skype became popular, does that mean the entire rest of the industry should abandon their existing IP communications protocol?

Worth trusting your data to btrfs?!? by jafo · 2013-04-26 06:43 · Score: 1

If you are "trusting your data" to *ANY* file-system, you are likely to be disappointed.

I have run btrfs off and on for maybe 3 or 4 years because I don't *HAVE* to trust my data to it. I have good backups that run daily. If btrfs screws the pooch, I'm not really out that much.

Note though, my backup servers run ZFS. :-)

Honestly, it seems to me that btrfs has gotten worse over the last few years rather than better. 4 years or so ago when I first started using it, it actually worked pretty well and I was fairly happy with it, including taking automatic snapshots, and I never had any data loss. ISTR that I switched away from it because I upgraded to a new distro and had to reformat, for various reasons. Newer versions I've tried have been barely usable and I've had btrfs wedge itself a few times. Some of the issues were distro integration issues I think, like 12.04 seemed to *ALWAYS* run a full fsck on boot, and I think it took a snapshot when I tried to do an upgrade to 12.10, which somehow caused it to think that it had space available when it didn't and it ran out of disc space during the upgrade...

I really want btrfs to get production ready, but I'm half thinking that by the time it is HAMMER2 will be out and I'll be infatuated with it. Note that btrfs and HAMMER started around the same time, maybe HAMMER had a 6 month lead. HAMMER has been "production stable" and has been the default Dragonfly BSD filesystem for several years. Dillon seems to know how to build a file-system...

Re:Yawn, yet another filesystem... by anyanka · 2013-04-26 06:51 · Score: 1

I'm guessing parent refers to the trick where you write to a different file, then move the data into place when done. Which should be OK, for values of "sanely designed" equal to or greater than "won't write metadata until data is safely on disk".
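For anyone who hasn't seen the trick, a minimal sketch in Python, assuming a POSIX-style filesystem where a rename within one directory is atomic. The function name `atomic_write` is made up for illustration. The explicit `fsync` is there so the data is on disk before the rename, rather than relying on the filesystem to order things sanely.

```python
import os
import tempfile

def atomic_write(path, data):
    """Write data to a temp file next to path, then rename it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    # Same directory, so the rename never has to cross filesystems.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # force the data out before the rename
        os.replace(tmp_path, path)  # readers see the old file or the new one
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

This sketch does not fsync the containing directory afterwards, so the rename itself may still be lost in a crash; the old contents should survive in that case.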

Re:Why? by jafo · 2013-04-26 07:10 · Score: 2

zfsonlinux has less testing than Btrfs? Really?

I think you mean *THE LINUX SHIM* has less testing. However, there's this *HUGE* portion of the code, as a wild ass guess I'd say 80%, which is the internal algorithms, data structures, and other internal parts of the file-system that are shared by the Linux and Solaris versions and those have been quite seriously tested for ZFS.

My experience with ZFS under Linux via FUSE was that there were some bugs in the integration layer, but they tended to be fairly shallow and never led to data loss. This is over around 3 years of ZFS+FUSE on Linux serious use (~30TB of backup storage, home storage server). I tested the heck out of ZFS+FUSE before we deployed it, found some issues, worked with the developers (who were amazing!), and eventually got to a point where the stress test I was running on it was more stable than it was under our OpenSolaris systems a few years prior (and the reason I built the stress test).

Based on my experience with ZFS, ZFS+FUSE, and btrfs, I'd personally trust ZFSonLinux over btrfs. My experimentation with btrfs the last few years has been that it still needs a lot of work.

Re:It's completely ideological. by UnknownSoldier · 2013-04-26 07:20 · Score: 1

We have these things called links. Reading. Try it sometime.

Re:It's completely ideological. by UnknownSoldier · 2013-04-26 07:27 · Score: 1

This is a PDF mirror of the scribd link

ZFS: THE LAST WORD IN FILE SYSTEMS by Bill Moore
http://www.cs.utexas.edu/users/dahlin/Classes/GradOS/papers/zfs_lc_preso.pdf

Re:Yawn, yet another filesystem... by maestroX · 2013-04-26 07:41 · Score: 1

Hi, Is this the reason the system locks up when deleting large numbers of files (near the default max inode for a directory, debian squeeze)? Writing large amounts of data (larger than the files combined) is non-blocking.

Re:It's completely ideological. by UnknownSoldier · 2013-04-26 07:44 · Score: 1

Since you are too damn lazy to google: btrfs vs zfs ...

http://www.seedsofgenius.net/uncategorized/zfs-vs-btrfs-a-reference

Re: Ooooh Flamey by Agent+ME · 2013-04-26 08:15 · Score: 1

I tried to use a SIP client for Skype-style usage once. Maybe I just had a terrible one recommended to me, but I did not know it was possible to make a messenger app that confusing. Does anyone really expect average Skype users to convert to that?

BARF by jj00 · 2013-04-26 08:29 · Score: 1

All I know is that "Btrfs" makes me think of a drunk trying to say the word "barf" in a sentence. Maybe they should come up with a better name, like "Anchor" (lifted right off the story).

Re: Ooooh Flamey by wagnerrp · 2013-04-26 08:38 · Score: 1

The only meaningful difference is that the Skype application comes pre-configured for their network. SIP softphones have to be manually configured by their users.

Re:replace ext3 and ext4? really? by empath · 2013-04-26 09:36 · Score: 1

4TB SAS drives are available, shipping, and being used.

http://www.newegg.com/Product/Product.aspx?Item=N82E16822178306

--
"Please don't sigh like that, maam"

Reliability is overrated by srussell · 2013-04-26 12:57 · Score: 1

I've lost data with ext3.

I've had data corruptions with reiserfs.

I've lost data with ext4 (which happened to be the most frustrating, tedious, and complete failure of all).

Most recently, I had some HD failures on a fully RAID-1'd server running entirely on XFS, and had to re-install the OS from scratch and restore from backups. The new install was onto btrfs.

I've had partitions running on btrfs for a little over a year, and have not yet lost data on these, but it's just a matter of time; I will lose data. I used to blame it on cheap drives, but I've seen SMART failures on young Seagates so I'm now convinced there's no such thing as a high quality, high density drive. At the moment, I find btrfs easy to use (intuitive and simple), and full-featured, so it's what I'm currently using. But I suffer from no illusions; at some point, I will have FS corruptions and have to restore from backups, and I can only hope that any FS corruptions won't go undetected and be propagated to my backups for very long before that happens. Failures are inevitable no matter what I use, so now I value simplicity, convenience, speed ... and backups.

Right now, btrfs beats the alternatives for convenience and features. I put my trust in backups, not file systems, and value is in features and convenience, not some false perception of safety or reliability.

We need engineering standards for software by FritzSolms · 2013-04-26 19:03 · Score: 1

This post just emphasises again that we need engineering standards and measures for software.

Re:Sigh by jones_supa · 2013-04-26 20:08 · Score: 1

Well, if linux is a toy (your basic argument)

How the hell can you derive that being his basic argument?

Why hasn't anyone simply ported BeFS? by soren42 · 2013-04-27 04:35 · Score: 1

I know there are only a few diehard holdout BeOS geeks still out there, and I know we have a terrible secret the world has never uncovered: BeFS. This file system, coded and deployed (production) in 1992, is 64-bit, multi-threaded, and fully journaled - attributes taken for granted today, but only futuristic buzzwords for other OSes of the day. Hard drives deployed on R4, an Intel x86 or PPC OS, were typically 6GB IDE drives. BeFS can handle single files of up to 18,000 petabytes - all of recorded human history at the time was only ~100 petabytes. BeFS is built on an OODB. It's tough, reliable, and well documented (there are even three venerated O'Reilly books on the subject - two dedicated to *just* the filesystem). It's what zfs and btrfs want to be when they grow up. And today, it's discarded. While Linux, OS X, BSD and other OSes could be compiled with kernel support, they aren't. Running it essentially means putting a virtual FS in a file. Tragic - another example of reinventing the wheel.

--
"Adventure? Excitement? A Jedi craves not these things."

Re:replace ext3 and ext4? really? by jabuzz · 2013-04-27 10:37 · Score: 1

Last time I looked, which is admittedly about 18 months ago, even the latest e2fsprogs did not carry support for ext4 greater than 16TB. Then again, would you really trust more than 16TB to a file system that has had support for such a short period of time? If you look into it, the performance of ext4 with filesystems that size sucks anyway.

Let's face it, ext4 exists for one reason and one reason only - Lustre.

Re:Why? by Dan+Dankleton · 2013-04-27 12:04 · Score: 1

Given a choice of production ready according to (my tests) or production ready according to (my tests + Red Hat tests), I'd take the latter every time.

Yes, sooner or later Red Hat WILL miss something. Sooner or later I WILL miss something too. I trust Red Hat to do a breadth of testing which I don't do, and then I do a depth of testing for my specific workload as best as I can model it (and real life has this really annoying habit of finding inventive ways not to conform to my models.)

Re:Yawn, yet another filesystem... by Dan+Dankleton · 2013-04-27 12:14 · Score: 1

1. One size doesn't fit all though. Most filesystems aside from ZFS sacrifice correctness for the sake of performance. * For enterprise correctness is more important than performance. * For home use performance is more important than correctness.

There's another issue mixed in there: only 3 of those systems support clustering - and that's counting OCFS and OCFS2 as different filesystems. So you can add single machine performance vs. distributed performance into the mix. Then there's small file vs. large file performance: if you're only ever storing virtual machine disk images and you could get a 1% I/O boost by using an optimized FS then you'd probably take it. Suddenly the number of filesystems which need supporting starts to look reasonable.

Re:It's completely ideological. by Bengie · 2013-04-30 04:59 · Score: 1

It's hard to calculate free space on a per volume level because unlike ZFS, BTRFS allows for per object RAID levels. You can't really calculate space used without walking the entire tree, which is prohibitively expensive.

Say a user has a volume with a logical size of 1GB, then creates a file of 1MB, but then sets the file to be replicated 8 times. Do you report 1023MB free or 1016MB free? Now assume tens of thousands of files, each with a different configuration. One may be RAID5, one may be RAID6, one may be RAID10.

It's a useful feature.
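The 1023MB versus 1016MB question can be sketched in a few lines. This is a toy calculation under stated assumptions: the `File` class and `free_space_mb` are hypothetical names, only plain N-copy replication is modelled, and parity layouts like RAID5 or RAID6 (whose overhead depends on stripe width) are left out.

```python
from dataclasses import dataclass

@dataclass
class File:
    size_mb: int  # logical size as the user sees it
    copies: int   # number of replicas actually stored (assumed simple mirroring)

def free_space_mb(volume_mb, files):
    """Return (free by logical accounting, free by raw accounting)."""
    logical_used = sum(f.size_mb for f in files)
    raw_used = sum(f.size_mb * f.copies for f in files)
    return volume_mb - logical_used, volume_mb - raw_used

# 1GB volume holding one 1MB file that is stored 8 times.
logical_free, raw_free = free_space_mb(1024, [File(size_mb=1, copies=8)])
print(logical_free, raw_free)  # 1023 1016
```

Both numbers are defensible, and getting either one means visiting every file, which is the expensive tree walk described above.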

Re: Ooooh Flamey by Tony+Hoyle · 2013-05-04 07:23 · Score: 1

If you get a client with your network, it comes preconfigured.. so that's not a difference.

There *are* a lot of sucky sip clients. There are also some excellent ones.
