OpenZFS Project Launches, Uniting ZFS Developers

Posted by Soulskill on Tuesday September 17, 2013 @12:05PM from the putting-the-band-together dept.

Damek writes "The OpenZFS project launched today, the truly open source successor to the ZFS project. ZFS is an advanced filesystem in active development for over a decade. Recent development has continued in the open, and OpenZFS is the new formal name for this community of developers, users, and companies improving, using, and building on ZFS. Founded by members of the Linux, FreeBSD, Mac OS X, and illumos communities, including Matt Ahrens, one of the two original authors of ZFS, the OpenZFS community brings together over a hundred software developers from these platforms."

297 comments

I'm addicted by MightyYar · 2013-09-17 12:09 · Score: 4, Interesting

I love ZFS, if one can love a file system. Even for home use. It requires a little bit nicer hardware than a typical NAS, but the data integrity is worth it. I'm old enough to have been burned by random disk corruption, flaky disk controllers, and bad cables.

--
W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
1. Re:I'm addicted by Anonymous Coward · 2013-09-17 12:18 · Score: 5, Funny
  
  I love ZFS too, but I'd fucking kill for an open ReiserFS...
2. Re:I'm addicted by Anonymous Coward · 2013-09-17 12:24 · Score: 0
  
  Too soon ...
3. Re:I'm addicted by Virtucon · 2013-09-17 12:42 · Score: 4, Funny
  
  I think that anything having to do with ReiserFS is a dead end.
  
  --
  Harrison's Postulate - "For every action there is an equal and opposite criticism"
4. Re:I'm addicted by Anonymous Coward · 2013-09-17 12:43 · Score: 0, Troll
  
  That's because you're using Linux, you dumb fucking retard.
5. Re:I'm addicted by TheGoodNamesWereGone · 2013-09-17 12:44 · Score: 2
  
  Well, this *is* SLASHdot (rimshot)
6. Re:I'm addicted by Anonymous Coward · 2013-09-17 12:48 · Score: 1, Informative
  
  Too stupid.
7. Re:I'm addicted by mysidia · 2013-09-17 13:12 · Score: 1
  
  I love ZFS too, but I'd fucking kill for an open ReiserFS...
  I heard that the act of using ReiserFS might be a criminal offense.
  Something about making oneself an accomplice after the fact... I don't know; it's a bit murky
8. Re:I'm addicted by MightyYar · 2013-09-17 13:20 · Score: 1
  
  FreeBSD. I'm sure that makes me more retarded. Or retardeder in your people's language.
  
  --
  W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
9. Re:I'm addicted by Anonymous Coward · 2013-09-17 13:41 · Score: 0
  
  On the other hand, the project got a captured developer who has a lot of time to do a lot of thinking for the next n years, uninterrupted.
10. Re:I'm addicted by binarylarry · 2013-09-17 14:03 · Score: 0
  
  Once ZFS adds the ability to Kill Your Wife, it will be perfect.
  Also, I am going to hell.
  
  --
  Mod me down, my New Earth Global Warmingist friends!
11. Re:I'm addicted by Anonymous Coward · 2013-09-17 14:36 · Score: 1
  
  On the other hand, the project got a captured developer who has a lot of time to do a lot of thinking for the next n years, uninterrupted.
  Yeah, I'm sure more than any other filesystem it will be devoted to maximum security...
12. Re:I'm addicted by palantir · 2013-09-17 16:14 · Score: 0
  
  Take a look at ZeiserFS then
13. Re:I'm addicted by philip.paradis · 2013-09-17 17:00 · Score: 0, Troll
  
  I spent some time testing various workloads on ScheisseFS, but in the end it was just a shitty solution.
  
  --
  Write failed: Broken pipe
14. Re:I'm addicted by philip.paradis · 2013-09-17 19:19 · Score: 2
  
  I guess nobody got the joke.
  
  --
  Write failed: Broken pipe
15. Re:I'm addicted by The Last Gunslinger · 2013-09-17 21:19 · Score: 4, Insightful
  
  I'm sure most readers here "got" it. It just wasn't funny.
16. Re:I'm addicted by drinkypoo · 2013-09-17 23:31 · Score: 3, Funny
  
  OK stop already, you guys are driving this joke into the woods.
  
  --
  "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
17. Re:I'm addicted by Anonymous Coward · 2013-09-17 23:55 · Score: 0
  
  Awww, poor kid. Here, have a penny, go buy yourself a soda my dear, it'll make you feel better.
18. Re:I'm addicted by VortexCortex · 2013-09-18 01:51 · Score: 1
  
  Well, that would imply it's "Not dead yet"...
19. Re:I'm addicted by Sloppy · 2013-09-18 02:36 · Score: 1
  
  No, I heard the project got re-homed and is being actively developed, somewhere in Russia.
  
  --
  As copyright owner of this comment, I authorize everyone to defeat any technological measure which limits access to it.
20. Re:I'm addicted by TheLink · 2013-09-18 03:29 · Score: 2
  
  Nah the real problem is vendor lock-in...
21. Re:I'm addicted by tobiasly · 2013-09-18 04:37 · Score: 1
  
  I'm sure most readers here "got" it. It just wasn't funny.
  I'm sure the first person to ever make a pun about ReiserFS was probably pretty funny. Maybe it even had a funny follow-up. At this point they're like that awkward friend you have who isn't quite clever enough to make a funny joke on his own so he just repeats others' jokes and hopes no one has heard them.
22. Re:I'm addicted by philip.paradis · 2013-09-18 20:24 · Score: 1
  
  "ScheisseFS" wasn't a disparaging reference to ReiserFS, which is actually a very solid and capable filesystem. Instead, the joke was the link between the imaginary "ScheisseFS" and the phrase "shitty solution." Apparently, you were too dense to get that, along with whatever mods docked the original comment. Have a great day!
  
  --
  Write failed: Broken pipe
23. Re:I'm addicted by Anonymous Coward · 2013-09-18 23:12 · Score: 0
  
  Explain it for me and the other guy who didn't get it.
24. Re:I'm addicted by Anonymous Coward · 2013-09-18 23:27 · Score: 0
  
  Yes, but at least we're washing the carseats afterwards.
25. Re:I'm addicted by Anonymous Coward · 2013-09-19 00:14 · Score: 0
  
  "Scheisse" means "shit" in German. That's it. That's the entire joke.
26. Re:I'm addicted by Rato Ruter · 2013-09-19 06:06 · Score: 2
  
  I spent some time testing various workloads on ScheisseFS, but in the end it was just a shitty solution.
  What a crappy wordplay!
27. Re:I'm addicted by philip.paradis · 2013-09-19 16:09 · Score: 1
  
  Since I love awful puns, I think that's funny :).
  
  --
  Write failed: Broken pipe
all i want is BP-rewrite by Anonymous Coward · 2013-09-17 12:13 · Score: 5, Informative

If this gets us BP-rewrite, the holy grail of ZFS, I'll be a happy man.
For those who don't know what it is: BP-rewrite is block pointer rewrite, a feature that has been promised for many years but has never arrived. It's a lot like cold fusion in that it's always X years away.
BP-rewrite would allow implementation of the following features:
- Defrag
- Shrinking vdevs
- Removing vdevs from pools
- Evacuating data from a vdev (say you wanted to destroy your old 10-disk vdev and add it back to the pool as a vdev with a different number of disks)
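
For anyone who has not run into the term, the gist of a block pointer rewrite can be shown with a toy sketch: relocating a block means finding and rewriting every pointer that still refers to its old address. This is only an illustration under loud assumptions. The `Pool` class, its `blocks` and `referrers` fields, and the addresses are all made up for the example and are far simpler than ZFS's real on-disk structures.

```python
from dataclasses import dataclass, field


@dataclass
class Pool:
    """Hypothetical toy pool, not the ZFS on-disk format."""
    # address -> payload stored at that address
    blocks: dict = field(default_factory=dict)
    # referrer name (live file or snapshot) -> list of block addresses it points at
    referrers: dict = field(default_factory=dict)

    def relocate(self, old_addr, new_addr):
        """Move one block, rewrite every pointer to it, return how many were rewritten."""
        if new_addr in self.blocks:
            raise ValueError("target address is already in use")
        self.blocks[new_addr] = self.blocks.pop(old_addr)
        rewritten = 0
        for pointers in self.referrers.values():
            for i, addr in enumerate(pointers):
                if addr == old_addr:
                    pointers[i] = new_addr
                    rewritten += 1
        return rewritten


# One file plus two snapshots; the block at address 50 is far from its neighbours.
pool = Pool(
    blocks={10: b"a", 11: b"b", 50: b"c"},
    referrers={"file": [10, 11, 50], "snap1": [10, 11, 50], "snap2": [10, 11]},
)
moved = pool.relocate(50, 12)  # pull the stray block next to the others
print(moved, pool.referrers)
```

In this toy model every snapshot that shares the block adds another pointer to rewrite, which hints at why the same mechanism would be needed for defrag, shrinking, and vdev removal alike.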
1. Re:all i want is BP-rewrite by Bengie · 2013-09-17 13:12 · Score: 1
  
  Re-balance vdevs, ftw! But yeah.. shrinking, defrag, blah blah blah.
2. Re:all i want is BP-rewrite by BitZtream · 2013-09-17 14:28 · Score: 1
  
  And adding those things would make it ... well pretty much perfect.
  Throw in background dedup without dedup tables hogging massive amounts of RAM too!
  
  --
  Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
3. Re:all i want is BP-rewrite by Anonymous Coward · 2013-09-17 14:29 · Score: 0
  
  Does ZFS really not support defrag?
4. Re:all i want is BP-rewrite by hedwards · 2013-09-17 15:23 · Score: 1
  
  Why would ZFS need defrag support? UFS never had defrag support, and the only time that ever became a problem was when the disk was running out of room. Which is bad for performance reasons anyways.
5. Re:all i want is BP-rewrite by saleenS281 · 2013-09-17 15:36 · Score: 5, Informative
  
  Because a COW filesystem will become fragmented over time simply by the way it works. As you delete files, you're only freeing up small segments of contiguous blocks. Over time, this leads to fragmentation because writes are sometimes forced into non-optimal disk placement due to lack of free space. Granted, if you never fill the pool beyond 50%, it won't be a problem. For everyone else, it's a matter of when, not if, it will become fragmented.
6. Re:all i want is BP-rewrite by saleenS281 · 2013-09-17 15:38 · Score: 2
  
  This will have little to no effect on the bp-rewrite situation. The only people with the skill and intimate knowledge of ZFS to do the bp-rewrite coding have stated both that it's extremely difficult, and that the companies they work for/with have no interest in implementing the feature/paying them to work on the problem. I haven't heard any of them volunteering their free time to focus on it either. This is more or less a marketing campaign IMO.
7. Re:all i want is BP-rewrite by Anonymous Coward · 2013-09-17 17:44 · Score: 0
  
  Bryan, we know it's you. Quit posting as AC.
8. Re:all i want is BP-rewrite by smash · 2013-09-17 20:36 · Score: 2
  
  So you propose that we kill array performance for a bit to de-fragment? Do you have any idea how long it takes to defragment multiple terabytes of data? On a multi-user multitasking OS access is more random anyhow, so it's not like your contiguous files are likely to be read sequentially anyhow.
  No, for a mission-critical system that actually has a workload, it's probably much easier/better to just maintain free space.
  
  --
  I run: Windows, OS X, Linux, FreeBSD. Just because you have a hammer, doesn't mean everything is a nail.
9. Re:all i want is BP-rewrite by hobarrera · 2013-09-17 20:39 · Score: 1
  
  And, of course, very importantly, the ability to add drives to a RAID-Z array after it has been created.
10. Re:all i want is BP-rewrite by TheRaven64 · 2013-09-17 21:13 · Score: 2
  
  Ideally, in something like ZFS you'd want background defragmentation. When you read a file that hadn't been modified for a while into ARC, you'd make a note. When it's about to be flushed unmodified, if there is some spare write capacity you'd write the entire file out contiguously and then update the block pointers to use the new version.
  That said, defragmentation is intrinsically incompatible with deduplication, as it is not possible to have multiple files that all refer to some of the same blocks all being contiguous on disk. It's also not a problem if you've got a decent-sized L2ARC, as the random reads on the disk are fairly rare.
  
  --
  I am TheRaven on Soylent News
11. Re:all i want is BP-rewrite by drinkypoo · 2013-09-17 23:35 · Score: 1
  
  Ideally, in something like ZFS you'd want background defragmentation.
  It is shocking that in this day and age we still cannot simply launch a defragment task on an arbitrary schedule and have the io scheduler handle keeping it from affecting the other activity on the machine. What fucking year is it?
  
  --
  "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
12. Re:all i want is BP-rewrite by TheRaven64 · 2013-09-18 00:24 · Score: 1
  
  That's not what I mean. The defragmentation should happen as part of the normal read / write process. It should not be a separate task, because there is no gain at all (and a significant penalty) from defragmenting files that you (almost) never access.
  
  --
  I am TheRaven on Soylent News
13. Re:all i want is BP-rewrite by Anonymous Coward · 2013-09-18 00:25 · Score: 0
  
  That too!
14. Re:all i want is BP-rewrite by Bengie · 2013-09-18 00:32 · Score: 1
  
  People who use multi TB/PB setups don't really care about defrag. The benefit it provides is very little and the complexity of implementing defrag in a way that allows for a transactionally atomic operation to update potentially billions of blocks at the same time is quite hard.
  
  ZFS can have up to 18,446,744,073,709,551,616 snapshots, and if you change the location of a block of data, you need to update its block pointer in all of those snapshots, assuming each one points to it.
  
  It's not a simple problem, but it can be done, and rewriting BPs (block pointers) is quite expensive. Since defrag adds almost no value, there is little reason to add it.
15. Re:all i want is BP-rewrite by Above · 2013-09-18 01:02 · Score: 3, Insightful
  
  You are correct that the disk will become fragmented, but the implication is that fragmentation is a problem, and that's simply not true. One of the prime causes of the misunderstanding is that fragmentation in Unix file systems is night-and-day different from fragmentation in a FAT file system, and most people are used to defragging Windows drives. Unix file systems use much better algorithms to control fragmentation, so there is (generally) a lot less on a per-file basis. They also automatically defragment: there are cases where, when a fragmented file is written to, the file system will defragment part of that file and rewrite it.
  The Berkeley FFS was the first to "solve" this problem, reserving 10% of the disk space primarily to avoid fragmentation. Decades of experience show that for all but the most corner of corner cases, that is enough, causing no significant amount of fragmentation, or performance degradation.
  * http://www.eecs.harvard.edu/~keith/research/tr94.html
  * http://www.cs.berkeley.edu/~brewer/cs262/FFS.pdf
  * http://www.cs.rutgers.edu/~pxk/416/notes/12-fs-studies.html
  * http://pages.cs.wisc.edu/~remzi/OSTEP/file-ffs.pdf
  The result is that for most applications fragmentation is a complete non-issue. After 25 years of playing with various file systems I've only seen it be an issue once, on an NNTP server that reached 20% fragmentation. Most user desktops and general purpose servers have under 1% fragmentation at all times. Generally, if you have a fragmentation problem it's because the storage is too full, and you need to add storage anyway (the aforementioned NNTP server was a good example). Adding the storage makes the problem go away.
  Most users of Unix file systems will never need to give fragmentation a second thought.
16. Re:all i want is BP-rewrite by saleenS281 · 2013-09-18 02:26 · Score: 1
  
  How is an I/O scheduler supposed to know if the block it's currently working on is one you're about to request? Unless you've got a very consistent and sequential workload, it's EXTREMELY difficult if not impossible to predict which block you're going to want next.
17. Re:all i want is BP-rewrite by drinkypoo · 2013-09-18 02:34 · Score: 1
  
  How is an I/O scheduler supposed to know if the block it's currently working on is one you're about to request? Unless you've got a very consistent and sequential workload, it's EXTREMELY difficult if not impossible to predict which block you're going to want next.
  It's acceptable to make first seek time a little slower if you can make many seeks faster in the future by reorganizing some blocks. And if the io scheduler were sufficiently good, then it would be able to prioritize the defrag task somewhere down in the basement, where it would only get blocks when the system looks quiet.
  I imagine that this is actually really hard, since otherwise someone would have done it by now. But it still boggles my mind.
  
  --
  "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
18. Re:all i want is BP-rewrite by Anonymous Coward · 2013-09-18 06:33 · Score: 0
  
  because the storage is too full, and you need to add storage anyway
  why?
  I have X amount of data. I do not need 2x that amount of drive, do I? That is a waste of resources. In my case, with 40TB of data, it is quite a decent amount of money too.
  That most Linux filesystems do not have native defrag built in is a waste of my money and my time.
  I have had files that end up with 10k fragments, where reading the file takes 20x what it should if it were mostly contiguous. Yet even though there is plenty of free space on the drive, I cannot defragment it, even using cp, because the drive has scattered 40k of other files in that area. So I need to tear the whole array down and dd it to fix this. I have had files take 2-3 minutes to delete because they were badly fragmented, while a similar-sized file that is not fragmented takes less than a second.
  Edge cases exist. You even stated it. Why can I not defrag? Instead I get hundreds of hand-wavy 'don't worry about it' answers, when it is dead easy to show that your perf will stink when you end up in these edge cases. You are right that for many use cases it is not a problem. But when you end up in the edge case it is your own little personal hell that you know could be fixed. But for some reason 'it doesn't matter'.
  Why are we so afraid of defragmenting? The end users are screaming for it. Instead we get lots of 'don't worry about it'.
19. Re:all i want is BP-rewrite by Anonymous Coward · 2013-09-18 12:44 · Score: 0
  
  If you want that kind of functionality, Windows Vista from 2006 called. They said the disk was already defragmented twice this week while you weren't looking.
  
  Linux - what better way to occupy a nice fraction of the better programmers on the planet than to harness their efforts poorly behind anti-Microsoft vitriol that never catches up.
20. Re:all i want is BP-rewrite by smash · 2013-09-18 13:37 · Score: 1
  
  It will always affect IO to an extent. If there is already an atomic IO (for defrag) in progress when you make a request, guess what?
  
  --
  I run: Windows, OS X, Linux, FreeBSD. Just because you have a hammer, doesn't mean everything is a nail.
21. Re:all i want is BP-rewrite by smash · 2013-09-18 13:39 · Score: 1
  
  Agreed. Which is what OS X does, if I'm not mistaken. But yeah, in theory if the file is frequently accessed it is in the ARC cache anyhow so...
  
  --
  I run: Windows, OS X, Linux, FreeBSD. Just because you have a hammer, doesn't mean everything is a nail.
22. Re:all i want is BP-rewrite by smash · 2013-09-18 13:42 · Score: 1
  
  Also, I'd be interested to see if there was any win in doing that, because with ZFS being copy-on-write, virtually every write to a file would result in fragmentation. Which means every time you were to save a file, you'd need to read the entire thing in and write it out contiguously (defragmenting it). Which would likely KILL write performance, no? E.g., with a 20 megabyte file, if you change 128k of it, you need to turn that into a 20 meg read and a 20 meg write to write a defragged version to disk.
  
  --
  I run: Windows, OS X, Linux, FreeBSD. Just because you have a hammer, doesn't mean everything is a nail.
23. Re:all i want is BP-rewrite by flargleblarg · 2013-09-18 16:10 · Score: 1
  
  That said, defragmentation is intrinsically incompatible with deduplication, as it is not possible to have multiple files that all refer to some of the same blocks all being contiguous on disk.
  Nonsense.
  You don't store the address or location of a block, you store the ID of the block's unique data. When/if you need to know the location of the block, you look it up in an ID-to-address table. There's absolutely nothing intrinsically incompatible between defragmentation and deduplication.
24. Re:all i want is BP-rewrite by smash · 2013-09-18 16:26 · Score: 1
  
  And when you want to read the actual data pointed to by the lookup table... where do you think the disk head goes? That's right, a random seek to wherever the original data was written on the disks.
  
  --
  I run: Windows, OS X, Linux, FreeBSD. Just because you have a hammer, doesn't mean everything is a nail.
25. Re:all i want is BP-rewrite by flargleblarg · 2013-09-18 16:43 · Score: 1
  
  ZFS can have up to 18,446,744,073,709,551,616 snapshots, and if you change the location of a block of data, you need to update its block pointer in all of those snapshots, assuming each one points to it.
  That's an incredibly naive implementation then. It would be far superior to store the ID of the block, rather than the location of the block. Then you look the location up in a table, given the ID. Then blocks can move around as needed, as a background process, without harming any data structures. This is exactly how virtual memory paging tables work. Why can't disks do this?
26. Re:all i want is BP-rewrite by flargleblarg · 2013-09-18 18:05 · Score: 1
  
  No no no no no. No.
  When you want to read the actual data pointed to by the lookup table, the disk head goes to the location where the data is now, not where it was originally. Where it was originally is never stored — and never needs to be. All block IDs are looked up and converted to disk locations on-the-fly at read time, using an in-memory-cache that is mutexed to prevent corruption during background operations. And moves can be done in the background without delaying foreground tasks by copying first and then updating the table pointers.
27. Re:all i want is BP-rewrite by TheRaven64 · 2013-09-18 20:43 · Score: 1
  
  If I have two files, with block IDs ABCDE and EFCGH, how would I place them on disk in such a way that a single sequential read would allow me to read either file? Without dedup, it's easy: ABCDEEFCGH, for example. With partial dedup, you could write them as ABCDEFCGH, eliminating the duplication of the E, but you've still got two copies of C. With full deduplication, if you write the first file contiguously, then you are going to need at least two seeks while reading the second - EF{seek}C{seek}GH.
  
  --
  I am TheRaven on Soylent News
28. Re:all i want is BP-rewrite by Anonymous Coward · 2013-09-18 23:08 · Score: 0
  
  Even for Linux, there's xfs_fsr, which is basically a defragmentation tool for xfs. Under IRIX, it is run daily (if the system is idle) and runs for at most a number of hours and copies around files, starting with those with the largest number of fragments.
  http://linux.die.net/man/8/xfs_fsr
29. Re:all i want is BP-rewrite by Anonymous Coward · 2013-09-19 03:52 · Score: 0
  
  ZFS (specifically the ZPL) is largely fragment-avoiding in the normal case where blocksize=128k (or 1M or more in recent versions, which OpenZFS will make more common and compatible). Further down the stack, fragmentation is also somewhat avoided by write coalescing and the code that schedules writes based on the occupancy and density of spacemaps.
  Fragmentation is a real issue in practice for some workloads (some databases that do their own synchronous COW within a POSIX object, and frequent (synchronous) NFS rewriting are big culprits), and unfortunately the workaround tends to have to be done at the POSIX level.
  With bp-rewrite, a background defragmentation at low scheduling priority is fairly straightforward. Additionally, a change in zfs recv would avoid the DMU storing objects that are fragmented on the send side as identically fragmented objects on the receive side, although the write scheduling on the receive side tends to absorb a fair amount of that sort of fragmentation anyway, assuming the destination has plenty of long empty runs in their spacemaps.
  Finally, the read side is armed with lots of read-aheads that mask fragmentation slowdown in large numbers of cases; the worst case is a single linear read of a highly fragmented POSIX file, and more readers or random readers are unlikely to notice a performance difference correlating with on-disk fragmentation.
  There are simply bigger fish to fry when it comes to ZFS performance.
30. Re:all i want is BP-rewrite by hr raattgift · 2013-09-19 03:59 · Score: 1
  
  "Files" are a concept in the ZPL; the ARC doesn't even deal with DMU objects, it deals with blocks.
  You could write a shim beneath the ZPL that maintains a system like you propose, but maintaining file-based caching info is going to eat into the memory available for the L1 ARC. You probably don't want that, really. The existence of the ARC (L1 and L2) is what masks residual latency involved in fragmentation not absorbed in write coalescing and spacemap scheduling, by enabling a place for read-ahead blocks to be cached.
31. Re:all i want is BP-rewrite by Anonymous Coward · 2013-09-19 06:02 · Score: 0
  
  Jeff Bonwick (the other ZFS creator) said several years ago that he had bp rewrite code working on a quiescent pool. It was scary code with lots of race conditions. So, there is code for bp rewrite, but it needs to be worked on.
32. Re:all i want is BP-rewrite by saleenS281 · 2013-09-19 10:22 · Score: 1
  
  The code that Bonwick was working on in 2009 was when he still worked for snoracle. That code was never released, and is not something the community could ever use. Furthermore, Bonwick likely can't recreate it due to the fact his original code is owned wholly by Oracle and will not be released. Any attempt at doing so would likely result in a nice lawsuit.
  
  I stand by my original statement that the open community has made no headway on it and this "open zfs" project won't move the needle.
33. Re:all i want is BP-rewrite by smash · 2013-09-19 14:16 · Score: 1
  
  If you move a de-duped block to defragment file X, then the same block referenced by file Y is now not in order. If it is in memory cache the whole concept is moot anyhow - whether it is fragmented on disk or not is irrelevant.
  Given that with de-dup, we are dealing with BLOCKS and not complete files, what order do you propose they are sorted in? Or are you talking about just "Defragmenting" by consolidating them all at the start of the disk, which would make more sense?
  
  --
  I run: Windows, OS X, Linux, FreeBSD. Just because you have a hammer, doesn't mean everything is a nail.
34. Re:all i want is BP-rewrite by smash · 2013-09-19 14:19 · Score: 1
  
  Exactly... there is no "order" to de-duped blocks to sort them into for contiguous access.
  
  --
  I run: Windows, OS X, Linux, FreeBSD. Just because you have a hammer, doesn't mean everything is a nail.
35. Re:all i want is BP-rewrite by flargleblarg · 2013-09-19 17:53 · Score: 1
  
  If you move a de-duped block to defragment file X, then the same block referenced by file Y is now not in order.
  That's fine. Seriously, when do you ever have a block in common between two files that are not identical copies of the whole file? Certainly not in the middle of a giant file. By nature, giant files (music, video) are compressed and essentially random. Maybe as the very last block in a file, or in a very small file, in which case out-of-order is a non-issue.
  
  If it is in memory cache the whole concept is moot anyhow - whether it is fragmented on disk or not is irrelevant.
  The hash mapping between block ID and disk location should be kept in memory. The actual data doesn't need to be.
  
  Given that with de-dup, we are dealing with BLOCKS and not complete files, what order do you propose they are sorted in? Or are you talking about just "Defragmenting" by consolidating them all at the start of the disk, which would make more sense?
  First, sorted in order of first known occurrence in a file. Then, consolidated at the start of the disk (if the data is largely static). Defragmenting free space is often more important than defragmenting files.
36. Re:all i want is BP-rewrite by smash · 2013-09-20 18:27 · Score: 1
  
  That's fine. Seriously, when do you ever have a block in common between two files that are not identical copies of the whole file? Certainly not in the middle of a giant file. By nature, giant files (music, video) are compressed and essentially random. Maybe as the very last block in a file, or in a very small file, in which case out-of-order is a non-issue.
  
  Very regularly?
  
  --
  I run: Windows, OS X, Linux, FreeBSD. Just because you have a hammer, doesn't mean everything is a nail.
37. Re:all i want is BP-rewrite by Anonymous Coward · 2013-09-22 09:01 · Score: 0
  
  The end users are screaming for it.
  They're not, you are.
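
TheRaven64's ABCDE / EFCGH example above can be checked mechanically. The following is a minimal sketch, assuming a single disk head, a layout written as a string of block IDs in on-disk order, and a free initial positioning. The function name `min_seeks` and the fully deduplicated layout `ABCDEFGH` are assumptions made for illustration, not anything ZFS does.

```python
def min_seeks(layout, file_blocks):
    """Fewest head repositionings needed to read file_blocks in order.

    layout: block IDs in on-disk order; an ID may repeat when a block is
            stored more than once (no dedup or partial dedup).
    A block costs no seek when a copy of it sits directly after the
    block that was just read; otherwise it costs one seek.
    """
    positions = {}
    for offset, block in enumerate(layout):
        positions.setdefault(block, []).append(offset)

    # cost[p] = fewest seeks so far if the head just read the copy at offset p
    cost = {p: 0 for p in positions[file_blocks[0]]}
    for block in file_blocks[1:]:
        cost = {
            p: min(c + (0 if p == q + 1 else 1) for q, c in cost.items())
            for p in positions[block]
        }
    return min(cost.values())


layouts = {
    "no dedup": "ABCDEEFCGH",
    "partial dedup": "ABCDEFCGH",
    "full dedup": "ABCDEFGH",  # assumed layout: first file contiguous, then F, G, H
}
for name, layout in layouts.items():
    print(name, min_seeks(layout, "ABCDE"), min_seeks(layout, "EFCGH"))
```

With the fully deduplicated layout assumed here, the first file reads without a seek while the second needs two, matching the EF{seek}C{seek}GH pattern described in the comment. An ID-to-address table changes where the pointers live, but in this model it does not change the count.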
Still CDDL... by volkerdi · 2013-09-17 12:15 · Score: 4, Informative

Oh well. I'd somehow hoped "truly open source" meant BSD license, or LGPL.
1. Re:Still CDDL... by Anonymous Coward · 2013-09-17 12:49 · Score: 0
  
  If you really expected Larry to turn over the crown jewels to the Stallman-Cult, you need to stop smoking crack.
2. Re:Still CDDL... by larry bagina · 2013-09-17 13:26 · Score: 3, Informative
  
  CDDL is basically LGPL on a per-file basis.
  
  --
  Do you even lift?
  These aren't the 'roids you're looking for.
3. Re:Still CDDL... by volkerdi · 2013-09-17 15:25 · Score: 2
  
  CDDL is basically LGPL on a per-file basis.
  Perhaps the intent of the licenses is similar, but there's more to a license than that. Unfortunately, being licensed under the CDDL causes a lot more license incompatibility restrictions than either the LGPL or BSD license do. If it were under one of those, there'd be hope for seeing it as an included filesystem in the Linux kernel. But since it's under the CDDL, that can't happen.
  The developers are, of course, welcome to use whatever license they like. Just pointing out that the CDDL is *not* basically the LGPL under "per-file" or any other basis.
4. Re:Still CDDL... by Anonymous Coward · 2013-09-17 17:24 · Score: 0
  
  Are you smoking crack? Larry doesn't own ZFS, it's still CDDL.
5. Re:Still CDDL... by Anonymous Coward · 2013-09-17 17:26 · Score: 1
  
  According to the FSF, the CDDL is a "free software" license.
  -- http://www.gnu.org/licenses/license-list.html#CDDL
  So take your "truly open" crap and shove it where the Sun don't shine.
6. Re:Still CDDL... by Anonymous Coward · 2013-09-17 17:29 · Score: 2, Insightful
  
  The GPL is the problem here, not the CDDL.
  It's funny how you cite license incompatibility restrictions, but Linux is the only one having those problems.
  OS X, FreeBSD and others don't seem to be having any problems with the CDDL.
  Gee, I wonder why.
7. Re:Still CDDL... by Daniel_Staal · 2013-09-17 17:30 · Score: 1
  
  Which would require a from-scratch cleanroom rewrite, probably.
  They could probably work on that, but if the current license isn't causing too much trouble, they probably have more important things to work on.
  
  --
  'Sensible' is a curse word.
8. Re:Still CDDL... by Anonymous Coward · 2013-09-17 21:44 · Score: 0
  
  It does cause some troubles for Linux, OpenBSD, maybe others too.
  Rewrite? A big deal, won't happen probably.
  I hope for HAMMER2. Not as good as ZFS yet, but promising.
9. Re:Still CDDL... by Eunuchswear · 2013-09-18 01:07 · Score: 1
  
  Because the CDDL was specifically written to be incompatible with the GPL.
  
  --
  Watch this Heartland Institute video
10. Re:Still CDDL... by chrish · 2013-09-18 01:34 · Score: 1
  
  Apple may have had a problem with the CDDL; they had a finished (as in, good enough to release in beta format in various developer previews of OS X; I forget which cat though) port of ZFS, and pulled it just before release. Which is a shame.
  
  --
  - chrish
11. Re:Still CDDL... by devman · 2013-09-18 02:32 · Score: 2
  
  In fairness, it's the GPL that has the incompatibility problem, not the CDDL.
  CDDL is compatible with BSD, Apache2, LGPL, etc.
  GPLv2 is incompatible with CDDL, Apache2, GPLv3, LGPLv3, etc.
  Even if the license were not CDDL, it would have to be released under a license that came with a patent clause, which means GPLv3, LGPLv3, Apache2 or similar all of which are incompatible with GPLv2 which Linux is licensed under.
  CDDL isn't the problem.
12. Re:Still CDDL... by devman · 2013-09-18 02:36 · Score: 1
  
  My post from elsewhere in this thread
  http://hardware.slashdot.org/comments.pl?sid=4226771&cid=44883623
  The gist of it is that CDDL isn't the problem. Even if you changed CDDL to GPLv3, LGPLv3 or Apache2 the license would still be incompatible with GPLv2 (Linux license). GPLv2 is the problem.
13. Re:Still CDDL... by Eunuchswear · 2013-09-18 03:24 · Score: 1
  
  Which came first, CDDL or GPLv2?
  Did not the person who "actually wrote the CDDL" say in public "Mozilla was selected [as the basis for the CDDL] partially because it is GPL incompatible. That was part of the design when they released OpenSolaris."
  Yes, some other licenses are incompatible with GPLv2, but to claim that the incompatibility of the CDDL with GPLv2 is the "fault" of the GPL is absurd.
  
  --
  Watch this Heartland Institute video
14. Re:Still CDDL... by devman · 2013-09-18 03:40 · Score: 1
  
  The fact remains that even if the Open ZFS code base was re-licensed they would still have GPLv2 compatibility issues with any other reasonable licence choice.
15. Re:Still CDDL... by Eunuchswear · 2013-09-18 04:03 · Score: 1
  
  The fact remains that even if the Open ZFS code base was re-licensed they would still have GPLv2 compatibility issues with any other reasonable licence choice.
  Any other reasonable licence choice?
  BSD isn't reasonable now?
  
  --
  Watch this Heartland Institute video
16. Re:Still CDDL... by devman · 2013-09-18 04:56 · Score: 1
  
  BSD doesn't come with appropriate patent clauses.
17. Re:Still CDDL... by unixisc · 2013-09-18 05:40 · Score: 1
  
  If you really expected Larry to turn over the crown jewels to the Stallman-Cult, you need to stop smoking crack.
  Ain't this project actually independent of Larry?
  Agree w/ the GP - I'd have thought this project would at least have gotten the BSD license, or one of the derivative licenses like X.
18. Re:Still CDDL... by Anonymous Coward · 2013-09-18 07:08 · Score: 0
  
  So take your "truly open" crap and shove it where the Sun don't shine.
  The Sun may no longer shine, but the Oracle has still been probing deeply.
I guess we know by Anonymous Coward · 2013-09-17 12:16 · Score: 0

what John Siracusa will be doing this weekend.
1. Re:I guess we know by larry+bagina · 2013-09-17 13:30 · Score: 0
  
  Masturbating to pictures of shaved shemales?
  
  --
  Do you even lift?
  These aren't the 'roids you're looking for.
Patents? by Danathar · 2013-09-17 12:19 · Score: 3, Insightful

Not to rain on anybody's parade, but will the commercial holders of ZFS allow this? Or will they unleash some unholy patent suit to keep it from happening?
1. Re:Patents? by gagol · 2013-09-17 12:26 · Score: 2
  
  Same licence, new name. It's more about uniting dev efforts under one roof.
  
  --
  Tomorrow is another day...
2. Re:Patents? by Anonymous Coward · 2013-09-17 12:39 · Score: 0
  
  Just having a source tree (!!) and a contribution licensing policy would be a nice start, but no signs of those on their wiki.
3. Re:Patents? by utkonos · 2013-09-17 12:49 · Score: 4, Informative
  
  FAQ much? There is no central source repository for OpenZFS. Each supported operating system has its own repository. The previous also has a link to the source tree for each of the supported projects under the umbrella.
4. Re:Patents? by Anonymous Coward · 2013-09-17 12:50 · Score: 0
  
  https://github.com/zfsonlinux
5. Re:Patents? by Anonymous Coward · 2013-09-17 13:07 · Score: 0
  
  From the FAQ, on a novel kind of "joined forces":
  
  There is no central source repository for OpenZFS. Each supported operating system has its own repository.
If you're successful, Larry will come a callin' by YesIAmAScript · 2013-09-17 12:26 · Score: 2, Funny

As long as Oracle's patents are valid, can anyone seriously believe this will go anywhere?
His fleet of boats isn't going to pay for itself.

--
http://lkml.org/lkml/2005/8/20/95
1. Re:If you're successful, Larry will come a callin' by Anonymous Coward · 2013-09-17 12:36 · Score: 0
  
  Collecting money from open source companies? Daryl McBride will turn in his grave if Larry is even stupid enough to try it...
2. Re:If you're successful, Larry will come a callin' by Virtucon · 2013-09-17 12:46 · Score: 2
  
  You mean that fleet of losing boats? Last time I checked it was 7-1 NZ with first to 9 winning.
  
  --
  Harrison's Postulate - "For every action there is an equal and opposite criticism"
3. Re:If you're successful, Larry will come a callin' by stoploss · 2013-09-17 13:08 · Score: 4, Funny
  
  Collecting money from open source companies? Daryl McBride will turn in his grave if Larry is even stupid enough to try it...
  Eh? I don't think that the Mormons bury their living, no matter how ghoulish are the corporations that they helm.
  I'm afraid Daryl McBride will be quite operational when your friends' commits arrive...
4. Re:If you're successful, Larry will come a callin' by Anonymous Coward · 2013-09-17 13:09 · Score: 1
  
  As long as Oracle's patents are valid, can anyone seriously believe this will go anywhere?
  hmm. i thought the cddl covered patents sufficiently? and wasn't that one of the reasons why sun chose to create cddl instead of using bsd or gpl or some other existing license in the first place?
5. Re:If you're successful, Larry will come a callin' by Bengie · 2013-09-17 13:15 · Score: 5, Informative
  
  Oracle released ZFS under a BSD compatible license. Anyone is allowed to do whatever to the opensource code. Going forward, Oracle has not opened any code after v28, which is the last OpenSource version to be compatible with Oracle ZFS.
6. Re:If you're successful, Larry will come a callin' by linuxguy · 2013-09-17 16:55 · Score: 1
  
  Eh? You mean Darl McBride and not Daryl McBride? I usually do not nitpick on small stuff like this but this pig vomit should be remembered by his correct name. We don't want to assign blame for what he did to some other innocent person.
7. Re:If you're successful, Larry will come a callin' by Fallen+Kell · 2013-09-17 18:47 · Score: 1
  
  Not quite as bad as you make it out to be considering Team Oracle started out at -2, and they also lost 3 of their key crew members from that incident. Bringing in that many new people at the last minute destroyed the team training that existed, which was a huge setback. The New Zealand ship though does seem to be faster.
  
  --
  We were all warned a long time ago that MS products sucked, remember the Magic 8 Ball said, "Outlook not so good"
8. Re:If you're successful, Larry will come a callin' by buchner.johannes · 2013-09-17 20:20 · Score: 1
  
  Oracle released ZFS under a BSD compatible license. Anyone is allowed to do whatever to the opensource code.
  GP was talking about patents. If they had released it under (L)GPLv3 or Apache2, users would be safe from patents suits.
  
  --
  NB: The message above might reflect my opinion right now, but not necessarily tomorrow or next year.
9. Re:If you're successful, Larry will come a callin' by TheRaven64 · 2013-09-17 21:16 · Score: 2
  
  It's released under the CDDL, which explicitly grants patent rights. If they had licensed it under GPLv2, then they would have been able to sue people (clause 7 allows them to say 'oh, we've just noticed that we have patents on this. Everyone stop distributing it!') and if they'd released it under Apache2 or GPLv3 then it would still be GPLv2-incompatible, so still wouldn't have been useable in Linux.
  
  --
  I am TheRaven on Soylent News
10. Re:If you're successful, Larry will come a callin' by Paul+Jakma · 2013-09-18 00:36 · Score: 1
  
  Bit of a FUDish comment. This code comes with a licence from Sun^WOracle that grants all the needed patent rights to use and redistribute the code.
  
  --
  I use Friend/Foe + mod-point modifiers as a karma/reputation system.
11. Re:If you're successful, Larry will come a callin' by Virtucon · 2013-09-18 03:12 · Score: 1
  
  Yeah and they were -2 down and losing the 3 crew members because of violating the rules and the dog ate their homework... blah blah...
  Hey, I would love to see America's Cup won by the home team but it goes to show that Larry can't always win no matter how much money he throws at something. I can imagine the steam coming out of his ears now.
  Frankly I also don't consider yachting to be a "sport" anyway, they may as well line up a bunch of Rolls Royce (pl. Royces, Roycin?) and race to see who can get to a mustard jar on a table and spread it on toast after a 1 mile drag race.
  
  --
  Harrison's Postulate - "For every action there is an equal and opposite criticism"
12. Re:If you're successful, Larry will come a callin' by Anonymous Coward · 2013-09-18 23:44 · Score: 0
  
  I just noticed something about Darl's name. McBride, meaning "son of a married woman", i.e. "I'm not actually a bastard, honest!".
  CAPTCHA: epitaphs - I wonder if they'll write that on his gravestone.
Fantastic news! by Anonymous Coward · 2013-09-17 12:27 · Score: 0

I wish them best of luck. ZFS is the best FS out there.
ZFS for Windows? by Anonymous Coward · 2013-09-17 12:30 · Score: 0

Does this mean we might finally get ZFS for Windows?
1. Re:ZFS for Windows? by Virtucon · 2013-09-17 13:05 · Score: 2
  
  It doesn't have to be POSIX compliant to have it ported to it and it doesn't require somebody to pay for licensing. With the Features of ZFS one could argue that a port to at least Windows Server would be great and it would garnish quite a following from those who've had to put up with the way NTFS views disk volumes and storage. There are applications that run well on Windows, especially on the Server side of things so I wouldn't call it dead quite yet. Besides, with Server 2012 we now have Storage Spaces and ReFS which bring some ZFS features to the table, but it's nowhere near as sophisticated as ZFS. There's already been one attempt but it doesn't appear to be actively maintained and it's read-only. Oracle has software for Windows Server that interfaces to the Sun ZFS Storage Server (SAN) that works at the VSS level. It's not exposing a ZFS filesystem to Windows either, but ZFS is configurable in the SAN. That's a hefty uplift if you're already in deep with EMC or NetApp.
  
  --
  Harrison's Postulate - "For every action there is an equal and opposite criticism"
2. Re: ZFS for Windows? by Anonymous Coward · 2013-09-17 13:30 · Score: 0
  
  why would you wanna argue with an idiot.
3. Re:ZFS for Windows? by _merlin · 2013-09-17 13:32 · Score: 1
  
  WTF are you smoking. POSIX compatibility is easy to achieve, and you can get it on Windows by installing the optional SFU package. Too bad POSIX says nothing about file system driver interfaces - that's entirely kernel-dependent, and even varies between BSDs.
4. Re:ZFS for Windows? by tlambert · 2013-09-17 14:15 · Score: 5, Informative
  
  It doesn't have to be POSIX compliant to have it ported to it and it doesn't require somebody to pay for licensing. With the Features of ZFS one could argue that a port to at least Windows Server would be great and it would garnish quite a following from those who've had to put up with the way NTFS views disk volumes and storage.
  Windows isn't a very friendly development platform for Open Source, starting with the licensing requirements for tools and distribution restrictions on binaries derived from those tools when using header files containing substantial code, or runtime libraries. Part of this is an intentional legal defense against WINE and CrossOver Office, and part of it is just scale management by limiting the support community requirements to "serious developers".
  In addition, a lot of the installable filesystem and similar code, as well as a lot of the necessary VM internals (memory mapped files and paging/swapping from filesystems) are not adequately explained (i.e. they involve locking text regions with level 0 locks, which require a level 3 lock then a level 0 lock, and you have to do this to get the offsets on the physical media for the blocks in question). This used to not work on removable media in NT as of 4.0.1; not sure if it's supported yet, but it was the reason you couldn't install it on Jaz drives or even regular hard drives in removable carriers.
  Having developed a filesystem for Windows95 IFSMgr, and reverse engineered all this crap, and having done it again for NT3.51, I would not look forward to having to repeat the process for Windows 7 or Windows 8, which are the only useful versions to target for by the time the code ends up functional.
  So unless someone wanted to seriously underwrite the effort (read: it'd have to be done by Oracle, or by a startup with a monetization strategy that Microsoft wouldn't preempt, like they did when my team, at a previous employer, ported UFS + Soft Updates to Windows 95, and they announced Longhorn-which-never-happened, and then put together a lawsuit about "deep reverse engineering" which would have precluded using it as a bootable FS)... no thanks.
5. Re:ZFS for Windows? by BitZtream · 2013-09-17 14:34 · Score: 2
  
  Windows isn't a very friendly development platform for Open Source, starting with the licensing requirements for tools and distribution restrictions on binaries derived from those tools when using header files containing substantial code, or runtime libraries.
  Well, the tools are free and there isn't a redistribution problem, never has been.
  Now, you could argue that ZFS and Windows won't work unless MS does it because ZFS is the whole disk I/O stack rolled into one, and no driver is going to work with the kernel to allow the ZFS system to work in Windows, but that's another story entirely. There's no way to bypass the disk cache for instance, not in a way ZFS would be compatible with. ZFS must use its own cache, and directly access the raw devices, and provide the filesystem driver all rolled into one ... but spread all across the kernel, in order to get proper performance.
  Could get pretty close with some good hacks though, such as FUSE.
  
  --
  Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
6. Re: ZFS for Windows? by Anonymous Coward · 2013-09-17 15:11 · Score: 0
  
  Playing with people is fun!
  
  -Dogbert
7. Re:ZFS for Windows? by TheRaven64 · 2013-09-17 21:23 · Score: 1
  
  It's been possible to write a ZFS IFS for Windows for a long time, but no one has done it, and as far as I know none of the people in this group are interested in working on Windows.
  
  --
  I am TheRaven on Soylent News
8. Re:ZFS for Windows? by pr0nbot · 2013-09-17 23:25 · Score: 2
  
  (You seem to write well so you'll probably appreciate being reminded it's "garner" not "garnish")
9. Re:ZFS for Windows? by tlambert · 2013-09-17 23:46 · Score: 2
  
  Windows isn't a very friendly development platform for Open Source, starting with the licensing requirements for tools and distribution restrictions on binaries derived from those tools when using header files containing substantial code, or runtime libraries.
  Well, the tools are free and there isn't a redistribution problem, never has been.
  Not according to this document; the runtime components are not redistributable. This is an Anti-WINE license measure:
  http://msdn.microsoft.com/en-us/library/ms235299(v=vs.90).aspx
  
  Now, you could argue that ZFS and Windows won't work unless MS does it because ZFS is the whole disk I/O stack rolled into one, and no driver is going to work with the kernel to allow the ZFS system to work in Windows, but that's another story entirely. There's no way to bypass the disk cache for instance, not in a way ZFS would be compatible with. ZFS must use its own cache, and directly access the raw devices, and provide the filesystem driver all rolled into one ... but spread all across the kernel, in order to get proper performance.
  Could get pretty close with some good hacks though, such as FUSE.
  This is actually reverse-engineerable. FUSE isn't an option, since pages which get memory mapped and dirtied are not propagated up via invalidation events. This is the same problem the Heidemann stacking framework has if you stack FS A on top of FS B, and then expose both of them as visible in the mount hierarchy namespace. You can do some things, but you can't do really complicated things.
10. Re:ZFS for Windows? by 7bit · 2013-09-18 00:37 · Score: 1
  
  How about a version of ZFS for Windows that doesn't need to be usable for a boot drive but can be used just for separate data drives?
  Then Windows could do its own thing with the boot drive and NTFS and its cache etc, and ZFS would keep all your other data nice and safe.
  I'd pay for even that much ZFS capability in Windows!
11. Re:ZFS for Windows? by Sloppy · 2013-09-18 02:59 · Score: 1
  
  Windows already has that, using a fancy filesystem API which was implemented by something called "Samba."
  
  --
  As copyright owner of this comment, I authorize everyone to defeat any technological measure which limits access to it.
FINALLY. by djhaskin987 · 2013-09-17 12:31 · Score: 1

It's about time ZFS went open. I feel like the only reason btrfs got any traction was ZFS licensing issues.
1. Re:FINALLY. by aliquis · 2013-09-17 12:43 · Score: 1
  
  Don't cheer too soon.
  It's just as open as before.
  Btrfs got a reason to exist for the same reason, and isn't that quite good too? I don't know whether it's as stable / safe to use yet, but if not now then some day.
2. Re:FINALLY. by Anonymous Coward · 2013-09-17 12:53 · Score: 3, Informative
  
  Been using btrfs for several non-essential file systems. Working great so far, and have even done several successful bedup runs. Has worked great for minimizing disk usage on some Maven repositories with lots of duplicate files between Jenkins and Nexus. Maybe not tested enough for your server that you need to stay up all the time, but great for the home desktop (provided you're sane and are keeping backups, which you should be doing already anyway). The more testing it gets, the sooner it becomes "tested enough" for the needs-to-always-be-available server.
3. Re:FINALLY. by Virtucon · 2013-09-17 13:13 · Score: 3, Informative
  
  licensing or patent issues?
  What you also forget is that Oracle was the leading proponent of BTRFS and yes it had to do with licensing and patents from Sun. Once they acquired Sun that all went out the window. If I were the CEO at Oracle I'd ask "Why two file systems that essentially do the same thing? One's mature and the other, not so much" That's why BTRFS still survives but now with less Oracle support. Wait, is that a bad thing?
  
  --
  Harrison's Postulate - "For every action there is an equal and opposite criticism"
4. Re: FINALLY. by Anonymous Coward · 2013-09-17 13:39 · Score: 1, Insightful
  
  He did say home use you dumb obnoxious cunt
5. Re:FINALLY. by smash · 2013-09-17 20:00 · Score: 1
  
  Please reference significant adoption of btrfs.
  
  --
  I run: Windows, OS X, Linux, FreeBSD. Just because you have a hammer, doesn't mean everything is a nail.
6. Re: FINALLY. by Eunuchswear · 2013-09-18 01:14 · Score: 3, Funny
  
  You don't have a multi-petabyte array with mission critical data at home?
  
  --
  Watch this Heartland Institute video
7. Re:FINALLY. by unixisc · 2013-09-18 05:51 · Score: 1
  
  Don't cheer too soon.
  It's just as open as before.
  Btrfs got a reason to exist for the same reason, and isn't that quite good too? I don't know whether it's as stable / safe to use yet, but if not now then some day.
  Is BTRFS under GPL2 or 3 or CDDL or something else?
8. Re:FINALLY. by unixisc · 2013-09-18 06:02 · Score: 1
  
  So what does Oracle use, or would like to use, for OEL? BTRFS, or ZFS-on-Linux?
9. Re:FINALLY. by Virtucon · 2013-09-18 06:20 · Score: 1
  
  AFAIK SUSE and Oracle Secure Linux are the only distros that include BTRFS natively. Oracle has a whole list of products that are ZFS based products so I guess you could say that ZFS has won that battle even within Oracle. I guess if you own the patents on ZFS, being a mature technology vs. BTRFS which is still somewhat immature I guess you'd ride the ZFS horse all the way to the bank which is what it appears that Oracle is doing. The patent issue more than anything else will be the key barrier in getting ZFS widely adopted which is sad really because it's an outstanding technology.
  
  --
  Harrison's Postulate - "For every action there is an equal and opposite criticism"
10. Re:FINALLY. by aliquis · 2013-09-18 10:46 · Score: 1
  
  Is BTRFS under GPL2 or 3 or CDDL or something else?
  Both regular ZFS and this OpenZFS seem to be under CDDL so for him nothing changed.
  Afaik btrfs is in the Linux kernel so I assume GPL2.
Advantages of ZFS over BTRFS? by TheGoodNamesWereGone · 2013-09-17 12:43 · Score: 2, Insightful

I'm sure I'll be corrected if I'm wrong, but does it offer any advantage over BTRFS? I'm not trying to start a flame war; I'm honestly asking.
1. Re:Advantages of ZFS over BTRFS? by Anonymous Coward · 2013-09-17 12:52 · Score: 0
  
  ZFS is used in production around the world by Fortune 500 companies. BTRFS is the future default FS of Linux, and will probably always be (the future part, that is).
2. Re:Advantages of ZFS over BTRFS? by Anonymous Coward · 2013-09-17 12:56 · Score: 1
  
  I'm not sure BTRFS has gotten any of this
3. Re:Advantages of ZFS over BTRFS? by silas_moeckel · 2013-09-17 12:59 · Score: 1
  
  btrfs is still considered experimental by the devs; zfs is used in production.
  Past that, btrfs does not seem to support any sort of ssd caching, which is really a requirement for any modern fs.
  
  --
  No sir I dont like it.
4. Re:Advantages of ZFS over BTRFS? by Vesvvi · 2013-09-17 12:59 · Score: 5, Informative
  
  I don't have any practical experience with BTRFS, but I use ZFS heavily at work.
  The advantage of ZFS is that it's tested, and it just works. When I started with our first ZFS testbed, I abused that thing in scary ways trying to get it to fail: hotplugging RAID controller cards, etc. Nothing really scratched it. Over the years I've made additional bad decisions such as upgrading filesystem versions while in a degraded state, missing logs, etc, but nothing has ever caused me to lose data, ever.
  The one negative to ZFS (if you can call it that) is that it makes you aware of inevitable failures (scrubs catch them). I'll lose about 1 or 2 files per year (out of many many terabytes) just due to lousy luck, unless I store redundant high-level copies of data and/or metadata. Right now I use stripes over many sets of mirrored drives, but it's not enough when you read or write huge quantities of data. I've run the numbers and our losses are reasonable, but it's sobering to see the harsh reality that "good enough" efforts just aren't good enough for 100% at scale.
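  For a feel of why "inevitable" is the right word at this scale, here is a back-of-the-envelope sketch. The error rate is an assumption (1 unrecoverable error per 1e14 bits read, a figure often quoted on consumer drive spec sheets), not a number from the post above, and it describes a single unprotected copy; mirrors and scrubs exist precisely to catch and repair most of these.

```python
import math

def expected_read_errors(bytes_read: float, bit_error_rate: float) -> float:
    """Expected number of unrecoverable bit errors for a given read volume."""
    return bytes_read * 8 * bit_error_rate

def prob_at_least_one_error(bytes_read: float, bit_error_rate: float) -> float:
    """P(at least one error), treating bit errors as independent (Poisson approximation)."""
    return 1.0 - math.exp(-expected_read_errors(bytes_read, bit_error_rate))

TB = 10**12
# Assumed spec-sheet rate: 1 unrecoverable error per 1e14 bits read.
ASSUMED_BER = 1e-14

for tb in (1, 10, 100):
    n = expected_read_errors(tb * TB, ASSUMED_BER)
    p = prob_at_least_one_error(tb * TB, ASSUMED_BER)
    print(f"{tb:>4} TB read: expected errors = {n:.2f}, P(>=1) = {p:.1%}")
```

  Real drives do not fail independently bit by bit, so treat this as an order-of-magnitude illustration only.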
5. Re:Advantages of ZFS over BTRFS? by mysidia · 2013-09-17 13:20 · Score: 2
  
  I'm sure I'll be corrected if I'm wrong, but does it offer any advantage over BTRFS? I'm not trying to start a flame war; I'm honestly asking.
  BTRFS is still highly experimental. I had production ZFS systems back in 2008. A mature ZFS implementation is a lot less likely to lose your data with filesystem code at fault (assuming you choose appropriate hardware and appropriate RAIDZ levels with redundancy).
6. Re:Advantages of ZFS over BTRFS? by Anonymous Coward · 2013-09-17 13:21 · Score: 1
  
  So you never lose data ever... yet you lose one or two files per year...
  Not that your losses are big... but if the file lost was 100GB of weather data... that could be called a BIG loss.
7. Re:Advantages of ZFS over BTRFS? by TheGoodNamesWereGone · 2013-09-17 13:33 · Score: 1
  
  I'm using it right now on an SSD, but then I've turned off the swapfile and 'discard' in FSTAB, with no trouble. I'll admit I was put off initially by its experimental nature. This is the first time I've used it; prior to now I always used ext2, 3, or 4. Thanks to everyone who commented.
8. Re:Advantages of ZFS over BTRFS? by Anonymous Coward · 2013-09-17 13:33 · Score: 0
  
  "nothing has ever caused me to lose data, ever."
  "I'll lose about 1 or 2 files per year"
  ??? Reconcile, please?
9. Re:Advantages of ZFS over BTRFS? by mysidia · 2013-09-17 13:35 · Score: 1
  
  The one negative to ZFS (if you can call it that) is that it makes you aware of inevitable failures (scrubs catch them). I'll lose about 1 or 2 files per year (out of many many terabytes) just due to lousy luck
  What? Interesting.... I never lost a file on ZFS... ever; and I was doing 12TB arrays, for VMDK storage; these were generally RAIDZ2 with 5 SATA disks, running ~50 VMs. Then in ~2011, concatenated mirrored sets of drives; large number of Ultra320 SCSI spindles in a direct attach SCSI chassis --- which is the hardware I got, when requesting SAS direct-attach JBOD and SAS HBAs; the SAS hardware wasn't available for less than $1000 on eBay, so management had a "better" idea, even after being persuaded (with great difficulty) that moving forward with 3 Exchange server VMs >1000 users, 10 web servers, spam filters, countless other stuff, on a RAID5/RAIDZ2 array with 6x 2TB SATA drives was not a great idea --- thankfully all that parallel SCSI junk has since been scrapped.
  Which was done on NFS and also on iSCSI. The average virtual disk is approximately 100gb.
  No matter how you put it "losing 1 or 2 files" of 100gb, 200gb, or 1TB in size is a big deal.
  Choosing the storage solution that loses a VMDK file or VMX file; if the wrong VMDK/VMX file..... equates to an entire server disappearing into oblivion: if a mission-critical VM. Can be very harmful to continued employment.....
10. Re:Advantages of ZFS over BTRFS? by mysidia · 2013-09-17 13:38 · Score: 1
  
  Apparently "never lost data" must mean never lost an entire filesystem -- that's not my definition. Usually file loss is user error.
  ZFS does support snapshots, and Nexenta / FreeNAS / etc have snapshot options, and replication options (zfs send | zfs recv) available, for sure.
  It's a highly resilient filesystem, but owning and using a highly resilient filesystem is not a replacement for having the proper backups.
11. Re:Advantages of ZFS over BTRFS? by Anonymous Coward · 2013-09-17 13:45 · Score: 3, Informative
  
  You don't understand. ZFS didn't lose that data -- ZFS detected that the underlying disk drives lost that data. You can run ZFS in highly redundant modes that allow it to reconstruct lost data, but it sounds like OP's redundancy is such that enough drives may lose bytes to cause lost files.
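  The detect-then-repair idea can be sketched in a few lines. This is a toy model, not ZFS code: the class name, the two-dict "disks", and the use of SHA-256 are all assumptions made for illustration. The point it shows is that a checksum stored apart from the data lets a read tell a good copy from a silently corrupted one, heal the bad copy from the good one, and report a loss only when no copy verifies.

```python
import hashlib

class MirroredStore:
    """Toy two-way mirror that keeps a checksum apart from the data it covers."""

    def __init__(self):
        self.disks = [{}, {}]   # two "disks": block id -> bytes
        self.checksums = {}     # block id -> expected SHA-256 digest

    def write(self, block_id, data: bytes):
        self.checksums[block_id] = hashlib.sha256(data).digest()
        for disk in self.disks:
            disk[block_id] = data

    def read(self, block_id) -> bytes:
        expected = self.checksums[block_id]
        for disk in self.disks:
            data = disk.get(block_id)
            if data is not None and hashlib.sha256(data).digest() == expected:
                # Good copy found: overwrite any copy that differs from it.
                for other in self.disks:
                    if other.get(block_id) != data:
                        other[block_id] = data
                return data
        # Every copy failed verification: the loss is reported, not hidden.
        raise IOError(f"block {block_id!r}: no copy matches its checksum")

store = MirroredStore()
store.write("a", b"hello")
store.disks[0]["a"] = b"hellx"   # simulate silent corruption on one disk
print(store.read("a"))           # served from the surviving good copy
```

  With only one copy, or with both copies damaged, the same read raises an error, which is the "detected, not caused" situation described above.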
12. Re:Advantages of ZFS over BTRFS? by Bengie · 2013-09-17 13:45 · Score: 2
  
  "Unexpectedly" lost data. The things he's mentioned would have hosed other FSes completely, but losing some data because of his lack of redundancy is fine.
13. Re:Advantages of ZFS over BTRFS? by Bengie · 2013-09-17 13:47 · Score: 1
  
  It sounds like he disabled/reduced ZFS's default to keep extra copies of meta-data.
14. Re:Advantages of ZFS over BTRFS? by Anonymous Coward · 2013-09-17 13:59 · Score: 0
  
  btrfs advantage = support on multiple architectures (nearly anything that Linux will run on), and small memory/low-end cpu. In-tree, on linux, so likely supported even with a reasonably current distribution kernel.
  zfs advantage = stability, additional features, support on several kernels.
  I've been running btrfs for a few years on non-essential data (upstream mirrors might get annoyed, but any data I store on btrfs can just be re-rsync'd from upstream if a failure). I've had one very bad fs corruption issue in btrfs a couple years ago, but nothing else. I use the snapshot feature extensively, but not much else (considered the offline dedup hack, but that didn't get much love). But, even with only this single failure, I still wouldn't trust it for anything that I mind losing.
  As for ssd caching (mentioned in another reply), I have never done it, but there is a filesystem agnostic ssd caching layer in newer linux kernels (probably too new for any stable distribution kernels; you'll probably have to compile your own). It should work for any fs, but not quite the same thing as l2arc in zfs (more like the local disk cache for NFS that Solaris had all those years ago).
  If you have lots of hardware to throw at it and care about your data, ZFS wins (even on Linux). If you are running a smaller system e.g., a 32bit arm board with 1G +/- ram, and can afford to lose the data, btrfs needs testing.
15. Re:Advantages of ZFS over BTRFS? by Anonymous Coward · 2013-09-17 14:04 · Score: 0
  
  No, he's referring to the features in ZFS that allow you to use an SSD as either a read cache or a write cache. Basically you can have a huge array of spinning metal disks and a couple of SSDs for read and write caching.
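  The read-cache half of that can be pictured as a two-tier cache. The sketch below is a simplified model with hypothetical names, not how ZFS implements its SSD cache: a small RAM tier in LRU order, a larger "SSD" tier that receives whatever RAM evicts, and a dict standing in for the spinning disks. The write-cache side is not modeled.

```python
from collections import OrderedDict

class TwoTierReadCache:
    def __init__(self, ram_slots, ssd_slots, backing):
        self.ram = OrderedDict()   # small, fast tier (LRU order, oldest first)
        self.ssd = OrderedDict()   # larger, slower tier fed by RAM evictions
        self.ram_slots = ram_slots
        self.ssd_slots = ssd_slots
        self.backing = backing     # dict standing in for the spinning disks
        self.disk_reads = 0

    def _put_ram(self, key, value):
        self.ram[key] = value
        self.ram.move_to_end(key)
        if len(self.ram) > self.ram_slots:
            old_key, old_value = self.ram.popitem(last=False)
            self.ssd[old_key] = old_value       # demote instead of dropping
            self.ssd.move_to_end(old_key)
            if len(self.ssd) > self.ssd_slots:
                self.ssd.popitem(last=False)    # SSD tier is full: drop its oldest

    def read(self, key):
        if key in self.ram:
            self.ram.move_to_end(key)           # refresh LRU position
            return self.ram[key]
        if key in self.ssd:
            value = self.ssd.pop(key)           # promote back to RAM
        else:
            value = self.backing[key]           # slow path: hit the disks
            self.disk_reads += 1
        self._put_ram(key, value)
        return value

disks = {i: f"block{i}" for i in range(10)}
cache = TwoTierReadCache(ram_slots=2, ssd_slots=4, backing=disks)
for i in (0, 1, 2, 0):
    cache.read(i)
print("disk reads:", cache.disk_reads)
```

  The design choice being illustrated is only that a block pushed out of RAM gets a second chance on faster-than-disk storage before a read has to go back to the platters.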
16. Re:Advantages of ZFS over BTRFS? by mysidia · 2013-09-17 14:05 · Score: 1
  
  It sounds like he disabled/reduced ZFS's default to keep extra copies of meta-data.
  That would seem to require altering the source code. At least in the Solaris X86 ZFS implementation; there is no zpool or zfs dataset option to turn off metadata redundancy.... of course it would be a bad idea.
17. Re:Advantages of ZFS over BTRFS? by Anonymous Coward · 2013-09-17 14:08 · Score: 1
  
  None. It was built in Larry's house, you'd be a fool to support it. BTRFS is actually no-strings-attached GPL (Yes, being able to re-close the source is a restriction I find unreasonable), and it'll be as good / better than ZFS soon.
  That's the cost of freedom, ZFS may be a bit more shiny now but it's propping up a monster. Let it die.
  "No-strings-attached GPL" is clearly an oxymoron. But besides pointless debates over licensing, we have a bulletproof file system in ZFS now (er, well, we've had it for 5+ years actually). BTRFS may catch up some day, but wake me up when it's actually ready for critical systems deployment.
18. Re:Advantages of ZFS over BTRFS? by Anonymous Coward · 2013-09-17 14:10 · Score: 0
  
  If ZFS is giving you checksum errors, you probably have some bad hardware in there. Drives themselves have their own error correction mechanisms, so that means either you have a bad data path (i.e. a bad or too long SATA or SAS cable) or you have something else wrong like a bad PSU, flaky mobo, bad RAM, etc...
19. Re:Advantages of ZFS over BTRFS? by sl3xd · 2013-09-17 14:10 · Score: 2
  
  BTRFS has a large number of features that are still in the "being implemented", or "planning" stages. In contrast, those features are already present, well tested, and in production for half a decade on ZFS. Many touted "future" features (such as encryption) of BTRFS are documented as "maybe in the future, if the planets are right, we'll implement this. But not anytime soon"
  Comparing the two is like making up an imaginary timeline where ReiserFS 3 was 4-5 years old and in wide deployment while ext2 was being developed, with plans to implement journaling (ie. ext3) and extents (ie. ext4) still in the "TODO" stage.
  My own BTRFS system is appallingly slow compared to running ext4 on the same hardware; in contrast zfsonlinux is amazing.
  
  --
  -- Sometimes you have to turn the lights off in order to see.
20. Re:Advantages of ZFS over BTRFS? by BitZtream · 2013-09-17 14:53 · Score: 2
  
  I corrupted some files by the following:
  This is a home setup, all parts are generic cheapo desktop grade components, except slightly upgraded rocket raid cards in dumb mode for additional sata ports:
  4 HDDs, 2 vdevs that 2 drive mirrors (RAID 1+0 with 4 drives essentially)
  1 drive in a 2 drive mirror fails, no hot spare.
  When inserting a replacement drive for the failed drive, the SATA cable to the remaining drive in the mirror was jiggled and the controller considered it disconnected.
  The pool instantly went offline. When the drive reconnected, and the new drive was added to the mirror, during the resilvering process, 2 files were detected with invalid checksums. There were files that were being written at that moment the VDEV was yanked out from under ZFS.
  Scrub found additional correctable errors and repaired them, but the files it marked as irreparable were clearly irreparable.
  Simply deleting the corrupted files cleared the pool errors after the next scrub. Since I was copying those files anyway when the failure occurred, I just recopied them and nothing was actually lost .
  Of course, I really can't expect anything else to have happened. I'm EXTREMELY grateful that it didn't take the entire pool down, so while there was 'data loss' it performed exactly as I would have hoped it to.
  You can't expect much better than what it did considering an entire vdev (both drives in the mirror) went offline as data was being written to them.
  Redundant metadata can't solve the problem of large amounts of the HDD becoming unreadable, which given enough terabytes is going to happen, and possibly often when you get into big data sets (think LHC size data sets). You can, of course, zfs set copies=2 (or 3, the maximum) on the pool to get additional protection, but then you might as well just put more drives in the same vdev and benefit from increased read speeds. copies=1 is the default, making it entirely possible to lose data.
  
  --
  Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
21. Re:Advatages of ZFS over BTRFS? by deek · 2013-09-17 14:55 · Score: 2
  
  I'm playing around with btrfs at the moment, and I've spotted some inconsistencies in the document you mentioned.
  * Subvolumes can be moved and renamed under btrfs. I do this on a daily basis.
  * btrfs can do read-only snapshots. Mind you, it does have to be specified.
  * As far as I can tell, "df" does work fine with btrfs. The document implies it does not.
  I am still quite new to btrfs, so I'm learning much at the moment. There may be more points that I've missed.
  It seems, though, your document is a bit out of date, and btrfs has improved since then.
22. Re:Advatages of ZFS over BTRFS? by Anonymous Coward · 2013-09-17 15:02 · Score: 0
  
  That.
  We've been running large RaidZ3 arrays replicated across multiple racks. It takes 4 disk failures to kill a vdev, and the exact same thing has to happen to 3 different servers to make recovery impossible; with another cold backup server, the thing just won't go down.
  Fuck expensive raid cards, ZFS is the future.
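  To spell out the arithmetic behind that claim, here is a minimal Python sketch. It assumes RAID-Z3 tolerates three failed disks per vdev and that data is gone only when every replica has lost its vdev. The function names are made up for illustration, and the cold backup server is ignored.

```python
RAIDZ3_PARITY = 3  # RAID-Z3 survives up to three failed disks per vdev


def vdev_lost(failed_disks, parity=RAIDZ3_PARITY):
    """A vdev is lost once more disks have failed than it has parity for."""
    return failed_disks > parity


def data_unrecoverable(failed_per_replica):
    """Data is gone only if every replicated server has lost its vdev."""
    if not failed_per_replica:
        raise ValueError("need at least one replica")
    return all(vdev_lost(n) for n in failed_per_replica)


# Three replicas: two have lost 4 disks, one is healthy.
print(data_unrecoverable([4, 4, 0]))
# All three replicas have lost 4 disks.
print(data_unrecoverable([4, 4, 4]))
```

  The first call reports that the data is still recoverable, the second that it is not.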
23. Re:Advatages of ZFS over BTRFS? by Anonymous Coward · 2013-09-17 15:08 · Score: 0
  
  The point is you'd lose way more files using other file systems, hell you'd probably lose more just by silent bit flips alone, and no, those $8k raid cards won't protect you from silent file corruption.
  If the parent poster considers those 2 files important, he'd probably just set copies=x so ZFS will fix it automatically, at the cost of some storage space.
24. Re: Advatages of ZFS over BTRFS? by UnknownSoldier · 2013-09-17 15:16 · Score: 1
  
  It is called ZIL - zero insertion log IIRC
25. Re:Advatages of ZFS over BTRFS? by hedwards · 2013-09-17 15:30 · Score: 2
  
  That's never been true, you always had the option of detaching it or outright deleting just one disk, you just had to make sure you did it in a careful manner so as not to delete things you didn't want to delete.
  Also resizing a volume on a disk is a risky operation to engage in. If it's something that you really need to do, the correct way is to back up the data to a separate disk and restore it to a new volume. Resizing volumes is not exactly in keeping with the philosophy that led to ZFS being created.
26. Re:Advatages of ZFS over BTRFS? by hedwards · 2013-09-17 15:32 · Score: 1
  
  It means the files were lost from the filesystem, and he was notified and recovered them from the backups. Which is a hell of a lot more than what other filesystems would do for you. One of the benefits of ZFS is that it makes it a lot easier to monitor for bit rot.
27. Re: Advatages of ZFS over BTRFS? by devman · 2013-09-17 15:34 · Score: 1
  
  ZFS Intent Log. It soaks up sync writes to a fast drive so they can be written out later to the slower drives.
28. Re:Advatages of ZFS over BTRFS? by TopSpin · 2013-09-17 15:35 · Score: 0
  
  and it'll be as good / better than ZFS soon
  No. Sorry.
  There hasn't been a commit to the official BTRFS tree in over two months. There have only been five distinct contributors during the entire third quarter. The second quarter saw only 70 commits.
  That pace is way too slow for a file system with so many 'to be implemented' features. While not dead, at this rate BTRFS will never surpass ZFS in any notable way.
  I'm sincerely sorry about that. Linux contributors just aren't getting it done wrt BTRFS, and that's a crying shame; other operating systems should look on in envy at marvelous Linux file systems.
  And yes, I should be in there plugging away at it. So should you. But we're not.
  That's not Oracle's fault, either. People just don't care enough to put in the effort. We're just here griping about Oracle and the ZFS license issue and posting about BTRFS being the answer, waiting for someone to do all that brutally hard work.
  We're deluding ourselves.
  
  --
  Lurking at the bottom of the gravity well, getting old
29. Re: Advatages of ZFS over BTRFS? by Anonymous Coward · 2013-09-17 15:38 · Score: 0
  
  ZIL is ZFS Intent Log
30. Re:Advatages of ZFS over BTRFS? by Anonymous Coward · 2013-09-17 15:52 · Score: 0
  
  None of what you just said includes any technical pros/cons.
31. Re:Advatages of ZFS over BTRFS? by mysidia · 2013-09-17 16:00 · Score: 2
  
  You can't expect much better than what it did considering an entire vdev (both drives in the mirror) went offline as data was being written to them.
  I do expect better, because ZFS is supposed to handle this situation, where a volume goes down with in-flight operations; the filesystem by design is supposed to be able to re-Import the pool after system restart and recover cleanly....
  That shouldn't have happened; it sounds like either the hard drive acknowledged a cache FLUSH before data had been written to disk, the ZIL was broken (or disabled), or indeed a ZFS bug was found.
  But in the absence of evidence that the disk hardware properly obeys the cache flush command semantics, great suspicion should be pointed at it.
  The whole point of the zfs ZIL is to log in-flight writes before the writes get added to the pool data, so if there is a halt, the in-flight writes are either completed or aborted in a crash-consistent way --- à la filesystem journaling.
  The pool showing 'corrupt' data indicates ZPOOL was remounted but the state wasn't crash consistent....
  [Or perhaps the hard drive data signal line did not have a clean break, and part of a write command's content was damaged in flight]
32. Re:Advatages of ZFS over BTRFS? by batkiwi · 2013-09-17 16:36 · Score: 5, Insightful
  
  Nice FUD there. You picked the btrfs-progs, which are the userspace tools, not the actual btrfs filesystem driver.
  http://git.kernel.org/cgit/linux/kernel/git/josef/btrfs-next.git/log/
33. Re:Advatages of ZFS over BTRFS? by Anonymous Coward · 2013-09-17 16:47 · Score: 1
  
  You're not seeing much work being done because you're NOT looking where you should: you're looking at the btrfs-progs git repository, which holds the tools to manage btrfs... So yes, not that many commits in there...
  The real kernel repo is here:
  Chris Mason's BTRFS tree
  Here's the log of commits targeted at Linus:
  For Linus
  It's probably easier to just track Linus' tree:
  Linus' tree: fs/btrfs
  Look at that: people are working on it...
34. Re:Advatages of ZFS over BTRFS? by Anonymous Coward · 2013-09-17 17:04 · Score: 0
  
  It's good they fixed this.
  The read-only-ness was only a part of that complaint. Although read-only snapshots help, it's still more unwieldy for the most common usages of them, and inconsistent with what you would expect from snapshots on other filesystems (other than ZFS). Keeping snapshots separate makes it far more clear to the user what is going on (particularly when using them in an automated snapshotting config).
  At the very least, df still does not work as expected.
35. Re:Advatages of ZFS over BTRFS? by Anonymous Coward · 2013-09-17 17:10 · Score: 0
  
  LOL obviously you didn't know shit about ZFS then and you know even less about it now.
36. Re:Advatages of ZFS over BTRFS? by Vesvvi · 2013-09-17 17:30 · Score: 4, Interesting
  
  This is correct.
  It is statistically assured that you will lose some data with anything less than obscene redundancy. I've run the numbers and we've settled on what's acceptable to us: we have offline backups far more frequently than 2 times/year for everything, so dropping about 2 files/year that are completely unrecoverable without backups isn't a big deal.
  These systems are running a moderate number of very large static files, mixed with a very large number of very small files. The small files are SQLite-style records, and we churn through them very rapidly. I don't know exactly why, but it is always these small files that we lose: there is clearly a bias towards things that are written frequently. The analyst in me is quick to point out that implies failures in ZFS itself, beyond just the disks and "bit rot", but the accelerated failure isn't enough to worry about. So our non-failure rate is easily 6-nines or better per year on the live storage system, but it's still a bit uncomfortable to know that some data is going to be gone, despite that.
  With a minimal amount of effort you can get hardware and software which is no longer the biggest threat to your data. I am personally the most likely source of a catastrophic failure: operator error is more likely than an obscure hardware failure. ZFS has allowed me to reduce that operator error (snapshots, piping filesystems, nested datasets with inheritance), and simultaneously it's outperforming other options on both speed and security. Overall, I'm extremely pleased.
37. Re:Advatages of ZFS over BTRFS? by cheetah · 2013-09-17 17:47 · Score: 1
  
  Doesn't look like he had a ZIL from the description of the hardware. So it's totally understandable that he might have corruption.
38. Re:Advatages of ZFS over BTRFS? by sethmeisterg · 2013-09-17 17:50 · Score: 1
  
  ZFS is gorgeous, whereas BTRFS is, well, usually walking around with a paper bag over its head.
39. Re:Advatages of ZFS over BTRFS? by Vesvvi · 2013-09-17 17:55 · Score: 2
  
  I had an upgrade path similar to yours, starting with RAIDZ and moving to a group of mirrors. I try not to let any pool get too big, so there are maybe 20 drives/pool. It's always the small files that are lost (see post above). I think each server does about 6 PB/year in each direction on these highly-accessed files, so I think it's reasonable to drop ~1MB of non-critical files (they basically store notes of data analysis).
  So far I've never had a problem with VM images, but now we're mitigating that by adding redundant but isolated storage servers. I'm sure you could manage this without ZFS snapshots and send/recv, but I wouldn't want to try.
40. Re:Advatages of ZFS over BTRFS? by deek · 2013-09-17 18:12 · Score: 2
  
  Gotcha. So btrfs and df play up only under a raid1 situation. That explains why I didn't notice any problem.
  As for snapshots, I've set up an automated snapshot system using btrfs. The main volume is mounted to /snapshots. One subvolume is created in there, and is then separately mounted to /data. Snapshots are created under the /snapshots directory, while /data is the path used by applications. I've created a nightly script which renames all previous snapshots, and then creates a new snapshot. It all works seamlessly, and it seems pretty easy to understand. I'm unsure what the fuss is, really.
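  A rotation like the one described could be sketched roughly as follows. This is only a guess at the shape of such a script, not the actual one: the snapshot names (nightly.0, nightly.1, ...) and the function name are hypothetical, and the sketch only builds the command lines (mv for the renames, btrfs subvolume snapshot for the new snapshot) rather than running them.

```python
from pathlib import PurePosixPath
from typing import List


def plan_rotation(existing: List[str], source: str = "/data",
                  snap_dir: str = "/snapshots",
                  prefix: str = "nightly") -> List[List[str]]:
    """Build the commands for one nightly rotation (nothing is executed)."""
    root = PurePosixPath(snap_dir)

    # Collect the numeric suffixes of names such as "nightly.3".
    indices = []
    for name in existing:
        if name.startswith(prefix + "."):
            suffix = name[len(prefix) + 1:]
            if suffix.isdigit():
                indices.append(int(suffix))

    # Rename the oldest snapshot first so no rename overwrites another.
    indices.sort(reverse=True)
    commands = []
    for i in indices:
        commands.append(["mv", str(root / f"{prefix}.{i}"),
                         str(root / f"{prefix}.{i + 1}")])

    # Finally, take the new snapshot of the subvolume mounted at `source`.
    commands.append(["btrfs", "subvolume", "snapshot", source,
                     str(root / f"{prefix}.0")])
    return commands


for cmd in plan_rotation(["nightly.0", "nightly.1"]):
    print(" ".join(cmd))
```

  Adding -r to the snapshot command would make the new snapshot read-only, which, as noted earlier in the thread, has to be asked for explicitly.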
41. Re:Advatages of ZFS over BTRFS? by jhol13 · 2013-09-17 19:13 · Score: 1
  
  ZFS is tested and has been used in a huge number of different environments with very positive feedback for well over a decade. I do not know of any catastrophic failures (though there must be some).
  BTRFS requires the latest version of the Linux kernel and of itself to work. I have no clue about testing (removing disks on the fly, etc.), and it is definitely not widely deployed and not yet proven to work (a few anecdotes do not count).
  BTRFS seems to be only slightly more robust than it was five years ago - during this time I have lost two hard disks from my ZFS, quadrupled disk space easily, used NFS4 (and CIFS-ro), etc. All with zero data loss.
  Oracle, at least previously, was the biggest contributor to BTRFS. I wouldn't trust them to invest in two filesystems in the long run. I wouldn't trust them to invest in OpenZFS either, but it is the more mature option for the foreseeable future.
  AFAIK the design of both is very solid (btrfs is better in this sense) and I hope btrfs is someday better than zfs IN REAL LIFE. But that will take at least five years (for me to believe it). If that happens, I might migrate. But because ZFS does everything I need (raidz/2/3, nfs4, cifs, acl, lz4 compression, dedup, ease of use, ...), I might not; after all, what would *I* get (to offset the migration PITA)?
42. Re: Advatages of ZFS over BTRFS? by Anonymous Coward · 2013-09-17 19:13 · Score: 0
  
  Unfortunately still true. Once a zfs filesystem is created that's it. No resize support. It's designed for data centres where buying a whole new set of disks to upgrade is feasible.
  A couple of other issues on the Linux version: it's not optimised for speed (data transfer peaked at 50 MB/s in my tests, which made the whole system feel sluggish). Also, KVM can't read/write from it for some reason; you just get an I/O error.
43. Re: Advatages of ZFS over BTRFS? by Cili · 2013-09-17 19:20 · Score: 1
  
  I was able to read at 800 MB/s and write at 600 MB/s to a zfs pool of 10 disks on consumer hardware, on Linux.
44. Re: Advatages of ZFS over BTRFS? by Anonymous Coward · 2013-09-17 19:44 · Score: 0
  
  I've had drives send back bad data (and not report io errors) when they've been unable to read sectors, so it can happen.
45. Re: Advatages of ZFS over BTRFS? by Anonymous Coward · 2013-09-17 19:57 · Score: 0
  
  Err, that's about half of what I'm getting with a 10-drive md-raid6+lvm2+xfs setup using cheap Toshiba 3.5" consumer drives...
46. Re: Advatages of ZFS over BTRFS? by smash · 2013-09-17 20:04 · Score: 1
  
  It can be resized by growing it to larger vdevs or by adding additional vdevs. Why you would want to SHRINK a zfs filesystem is a big question.
  
  --
  I run: Windows, OS X, Linux, FreeBSD. Just because you have a hammer, doesn't mean everything is a nail.
47. Re:Advatages of ZFS over BTRFS? by smash · 2013-09-17 20:10 · Score: 1
  
  ZFS was built long before oracle took over sun.
  
  --
  I run: Windows, OS X, Linux, FreeBSD. Just because you have a hammer, doesn't mean everything is a nail.
48. Re:Advatages of ZFS over BTRFS? by smash · 2013-09-17 20:12 · Score: 1
  
  zfs runs on linux, freebsd, solaris, mac os x. possibly elsewhere. btrfs runs on linux. try again...
  
  --
  I run: Windows, OS X, Linux, FreeBSD. Just because you have a hammer, doesn't mean everything is a nail.
49. Re: Advatages of ZFS over BTRFS? by TheRaven64 · 2013-09-17 21:34 · Score: 1
  
  Once a zfs filesystem is created that's it. No resize support
  Minor correction: Once a ZFS pool is created, that's it. Filesystems are dynamically sized. You can also add disks to a pool, but not to a RAID set. You can also replace disks in a RAID set with larger ones and have the pool grow to fill them. You can't, however, replace them with smaller ones.
  
  --
  I am TheRaven on Soylent News
50. Re: Advatages of ZFS over BTRFS? by TheRaven64 · 2013-09-17 21:39 · Score: 1
  
  There are two uses for SSDs in a ZFS pool. The first is L2ARC. The ARC (adaptive replacement cache) is a combination of LRU / LFU cache that keeps blocks in memory (and also does some prefetching). With L2ARC, you have a second layer of cache in an SSD. This speeds up reads a lot. Data that is either recently or frequently used will end up in the L2ARC and so these reads will be satisfied from the flash without touching the disk. The bigger the L2ARC, the better, although practically if it's close to your working set size you'll start to see diminishing returns if you make it bigger.
  The second use is as a log device. The ZIL is the ZFS Intent Log, which is effectively a journal. Transaction groups are written there first so that the filesystem is always in a consistent state. It's usually on the same disk as the storage, which means that writes can involve a lot of seeks. With the ZIL on a different drive (SSD or otherwise), you reduce the number of writes required. Because you can generally write to a ZFS pool significantly faster than to a single disk, putting the ZIL on an SSD stops it becoming a bottleneck. The rule of thumb for the size of the log device is that it should be as big as the maximum amount of data that can be written to your pool in 10 seconds. If you can do 100MB/s writes, you want about 1GB of log device.
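  As a quick sanity check of that rule of thumb, here is a minimal Python sketch. The function name and the decimal MB/GB constants are my own choices for illustration; the 10-second window is the figure quoted above, not a hard ZFS limit.

```python
MB = 10**6
GB = 10**9


def log_device_size_bytes(pool_write_bytes_per_s, window_s=10.0):
    """Rule of thumb: hold what the pool can absorb in `window_s` seconds."""
    if pool_write_bytes_per_s < 0 or window_s < 0:
        raise ValueError("throughput and window must be non-negative")
    return pool_write_bytes_per_s * window_s


size = log_device_size_bytes(100 * MB)
print(f"{size / GB:.1f} GB")  # prints "1.0 GB"
```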
  
  --
  I am TheRaven on Soylent News
51. Re:Advatages of ZFS over BTRFS? by TheRaven64 · 2013-09-17 21:43 · Score: 1
  
  You have a ZIL unless you explicitly disable it (in which case, expect data loss and only do this on pools you don't care about). ZIL is not the same as a separate log device.
  
  --
  I am TheRaven on Soylent News
52. Re:Advatages of ZFS over BTRFS? by TheRaven64 · 2013-09-17 21:46 · Score: 1
  
  That's laughably wrong. ZFS is distributed under the CDDL, which includes explicit patent grants. BTRFS is distributed under the GPLv2, which doesn't. If, tomorrow, Oracle stood up and said 'we have a patent that covers ZFS and BTRFS' then if it was one owned by Sun, or during a time when they were distributing ZFS under the CDDL, then users of ZFS would already have a license for it. Users of BTRFS, however, would have clause 7 of the GPL kick in and be unable to legally redistribute it.
  
  --
  I am TheRaven on Soylent News
53. Re: Advatages of ZFS over BTRFS? by jotaeleemeese · 2013-09-17 21:56 · Score: 1
  
  Uh?
  You create datasets on top of a zpool. Then you impose quotas on your datasets, which can be resized at will (you want to reduce the dataset? Empty the data from it and change the quota accordingly. Want to enlarge the dataset? Just increase the quota).
  Datasets are what you mount as what we traditionally understand as filesystems, which you can "resize" to your heart's content.
  If you are talking about zpools, there are commands to add or remove devices as needed, and the pool can even use a bigger (why would you put a smaller?) device as soon as it is detected, starting the resync automatically.
  
  --
  IANAL but write like a drunk one.
54. Re:Advatages of ZFS over BTRFS? by jotaeleemeese · 2013-09-17 22:51 · Score: 1
  
  In Linux, perhaps not much, although I find the zfs interface (zpool and zfs commands) and design very clear and intuitive.
  Also, performance-wise I really don't know if ZFS can be beaten, at least for certain tasks: taking a snapshot looks like a trivial task.
  In Solaris, ZFS is tightly integrated with zones (virtualization) and clusters (resilience).
  It is just amazing what you can do with all these components working with each other (Linux is not even remotely close).
  
  --
  IANAL but write like a drunk one.
55. Re: Advatages of ZFS over BTRFS? by WuphonsReach · 2013-09-17 23:30 · Score: 2
  
  If you are talking about zpools, there are commands to add or remove devices as needed, and the pool can even use a bigger (why would you put a smaller?) device as soon as it is detected, starting the resync automatically.
  
  Limited number of drive slots + moving to a smaller, but faster platter in one or more of those slots.
  
  --
  Wolde you bothe eate your cake, and have your cake?
56. Re:Advatages of ZFS over BTRFS? by Lennie · 2013-09-17 23:45 · Score: 1
  
  With every release of btrfs they fix more bugs to make it more stable and fix any outstanding issues mentioned in the article.
  For example 3.9 gained that RAID5/RAID6 support.
  Supposedly, btrfs also uses a smarter data structure than ZFS, so in theory btrfs will eventually surpass ZFS.
  
  --
  New things are always on the horizon
57. Re:Advatages of ZFS over BTRFS? by Lennie · 2013-09-17 23:48 · Score: 1
  
  bcache was added in Linux 3.10 (and dm-cache in 3.9), so Linux already has that covered at another layer.
  
  --
  New things are always on the horizon
58. Re: Advatages of ZFS over BTRFS? by Anonymous Coward · 2013-09-17 23:56 · Score: 0
  
  Because no one does it, no one wants this feature and it's not supported, so he posts it as a weakness? Sort of like saying ZFS also won't wash his car?
  Also, I think you mean shrink a ZFS pool, not filesystem.
59. Re:Advatages of ZFS over BTRFS? by SuricouRaven · 2013-09-18 00:03 · Score: 1
  
  My home server is currently corrupting about one sector in every 100GB on a data drive. Much higher than is normal. Yet still, I wouldn't have easily noticed it - the errors are subtle. A flickering frame in a video file, a program mysteriously crashing.
  The only reason I am aware of this is luck: I was writing a compression program for my own use, and the in-and-out hashes kept differing for no obvious reason. I spent days going through the code and comparing output before I realized it was a hardware issue.
60. Re:Advatages of ZFS over BTRFS? by silas_moeckel · 2013-09-18 00:04 · Score: 1
  
  Cache not whole filesystem. I'm talking about actual heavy use large data sets not your boot drive or where you store your apps for a workstation. The smallest ZFS pool I have is 6 ish usable TB (4x 3TB drives in mirrors), throw in some cheap consumer SSDs for l2arc and ent ones for the zil mirror. Roughly 1k in parts gets you a disk subsystem that kicks the snot out of sub 40k san/nas units.
  
  --
  No sir I dont like it.
61. Re:Advatages of ZFS over BTRFS? by silas_moeckel · 2013-09-18 00:14 · Score: 1
  
  Neither is quite the same. They lack the intelligence: since they only understand blocks, they cannot tell metadata apart except by guessing based on how it's accessed. I like how they work compared to the ZIL for writes; at some point I will have to see if they layer well under zfs. A good example is dedupe metadata: a block cache has no idea what it is, so it doesn't do a great job of storing it and ensuring it's on SSD to avoid a lot of the performance hit. That said, bcache and dm-cache do a bang-up job at caching iSCSI.
  
  --
  No sir I dont like it.
62. Re: Advatages of ZFS over BTRFS? by Anonymous Coward · 2013-09-18 00:15 · Score: 0
  
  There are several differences between the two systems. One needs to select what is best for them.
  Here's a good one, can you outline the steps needed to add more space to your md-lvm2-xfs setup and approximately how long this would take?
  In zfs one can just zpool add raid(x) disk(n) etc. This adds a new vdev to an existing pool which takes seconds to complete.
  Once this is done any filesystem (for which you can have a near-unlimited number) in the pool immediately has access to the new space.
  Also, it seems you only have one xfs file system on top of this thing?
63. Re:Advatages of ZFS over BTRFS? by Anonymous Coward · 2013-09-18 01:21 · Score: 0
  
  While it may not contain any technical pros/cons, it does reflect the current state/thinking of many. Would you recommend a "beta" file system for your enterprise?
64. Re:Advatages of ZFS over BTRFS? by Lennie · 2013-09-18 01:34 · Score: 1
  
  I know it's not perfect, but that solves the problem until the btrfs developers get around to adding it to btrfs as planned.
  
  --
  New things are always on the horizon
65. Re:Advatages of ZFS over BTRFS? by Lennie · 2013-09-18 01:48 · Score: 1
  
  The prominent btrfs developers that used to work at Oracle don't work at Oracle anymore. Some of the main developers of btrfs work at FusionIO.
  
  --
  New things are always on the horizon
66. Re:Advatages of ZFS over BTRFS? by SuricouRaven · 2013-09-18 04:10 · Score: 1
  
  zfs runs on linux but, due to license issues, isn't available out-of-the-box on most distros. You'll need to go to some lengths to install it, likely involving a kernel recompile.
67. Re:Advatages of ZFS over BTRFS? by johanwanderer · 2013-09-18 06:15 · Score: 1
  
  I recently had a bunch of BTRFS failures on disks with heavy traffic. It's terrifying when you have to reboot any of those servers.
  
  The only saving grace was the cluster was able to tolerate multiple block device failures, so I was able to reformat those disks using ZFS and resync the data.
  
  To sum up, I experimented on an experimental file system, got burnt, and then switched back to ZFS. I didn't lose data, just time and frustrations.
68. Re:Advatages of ZFS over BTRFS? by silas_moeckel · 2013-09-18 06:27 · Score: 1
  
  As the devs say it's not ready for production :)
  
  --
  No sir I dont like it.
69. Re: Advatages of ZFS over BTRFS? by Anonymous Coward · 2013-09-18 07:25 · Score: 0
  
  I call foul. You should be peaking out at around 150MB/s on the outer track on those disks, so with 8 disks plus parity, you should be peaking around 1200MB/s, and that's not including updating filesystem metadata.
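  The estimate above is just data disks times per-disk streaming rate. A minimal Python sketch, assuming a 10-drive array with two parity drives (RAID6-style) and ignoring metadata and controller overhead; the function name is hypothetical:

```python
def peak_sequential_mb_s(total_disks, parity_disks, per_disk_mb_s):
    """Upper bound on streaming throughput: only data disks contribute."""
    data_disks = total_disks - parity_disks
    if data_disks <= 0:
        raise ValueError("need at least one data disk")
    return data_disks * per_disk_mb_s


# 10 drives, 2 of them parity, ~150 MB/s each on the outer tracks.
print(peak_sequential_mb_s(10, 2, 150))  # prints 1200
```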
70. Re: Advatages of ZFS over BTRFS? by wagnerrp · 2013-09-18 07:34 · Score: 1
  
  In other words, non-commercial use. ZFS is designed for business, not consumers, where you spec out your machine ahead of time for potential future expansion, and when you reach the limit of that expansion, it's simply time to upgrade the machine. No one is going to pay for extensive development time to allow you to do ZFS cheaply.
71. Re: Advatages of ZFS over BTRFS? by Anonymous Coward · 2013-09-18 08:14 · Score: 0
  
  Seagate 7200.14 and Toshiba ACA series are "cheap consumer desktop" category.
  ~200MB/s linear R/W on the outer tracks and ~155MB/s average over the platter.
72. Re: Advatages of ZFS over BTRFS? by Anonymous Coward · 2013-09-18 09:12 · Score: 0
  
  There are several differences between the two systems. One needs to select what is best for them.
  Agreed.
  
  Here's a good one, can you outline the steps needed to add more space to your md-lvm2-xfs setup and approximately how long this would take?
  mdadm add (instant)
  mdadm resize (~12h restriping in background, performance impact about the same as a background scrub)
  pvresize (instant)
  lvresize (instant) + xfs_growfs (instant) or lvcreate + mkfs.xfs
  
  In zfs one can just zpool add raid(x) disk(n) etc. This adds a new vdev to an existing pool which takes seconds to complete.
  ... with random I/O performance of a single disk. ZFS has ZIL and L2ARC because it *really* needs them.
  
  Once this is done any filesystem (for which you can have a near-unlimited number) in the pool immediately has access to the new space.
  Yup. A lot nicer than having to manually resize LVs to allocate space.
  
  Also, it seems you only have one xfs file system on top of this thing?
  Nope. Currently got 3 xfs LVs exported via NFSoRDMA (user home, media dump, "random other stuff") and 3 raw LVs exported via SRP.
  Btw, the 1600MB/s read 1500MB/s write is on /home which lives on the outer cylinders.
  Can you tell zfs "I want this filesystem on that particular area on the spindles"?
73. Re:Advatages of ZFS over BTRFS? by mysidia · 2013-09-18 11:24 · Score: 1
  
  Doesn't look like he had a ZIL from the description of the hardware. So it's totally understandable that he might have corruption.
  By default, if you have no dedicated ZIL hardware log device, the ZIL lives on the data pool, so the function is there.
  Unless you mess with some low-level parameters not intended to be set by users, as in 'set zfs:zil_disable = 1' in /etc/system or an echo zil_disable/W0t1 | mdb -kw, there always has to be a ZIL; and those options are only available for special circumstances (primarily to facilitate performance troubleshooting by a kernel expert -- or to help the storage admin determine if the storage system would benefit from adding a dedicated ZIL device).
74. Re:Advatages of ZFS over BTRFS? by mysidia · 2013-09-18 13:43 · Score: 1
  So far I've never had a problem with VM images, but now we're mitigating that by adding redundant but isolated storage servers. I'm sure you could manage this without ZFS snapshots and send/recv, but I wouldn't want to try.
  A reasonable approach.
  Ultimately I had to move away from ZFS for another reason.... lack of solid clustering, available for free or at a reasonable price. If a server running the Solaris OS crashes due to a hardware failure; all the storage is going to be down.
  We wanted to have active/passive; with the ability to reboot or upgrade the pair of storage devices, with no downtime or loss of I/O for the various devices and servers relying on the storage: in other words, not just Disaster recovery, but High Availability.
  Nexenta offers a commercial solution called HA cluster. To have 12TB of storage, you have to buy 2 Gold licenses at $5000 each, because the company's policy is that you must buy Gold edition, you have to pay for 24x7 support, and you must use certified hardware in order to buy the HA cluster plugin. Also, you aren't allowed to just design it yourself --- HA cluster will not be sold without professional services (price tag, ~$3,000).
  So for 12TB of storage, the 2 Gold licenses are $10,000, you need $2500 in extra capacity licenses, $3000 in professional installation/consultation services you are forced to buy, and then the HA cluster plugin is another $6000+.
  That's $21,000 for the first year plus approximately $4k each successive year in software licenses, and the certified hardware choices are pretty limited ---- the cheapest price quote from a reseller was ~$60,000, once you added a reasonable amount of RAM or cache to the hardware.
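  For anyone checking the arithmetic, the line items quoted above can be added up with a few lines of Python. The dictionary keys are just labels I chose; the plugin was quoted as "$6000+", so the sum is a lower bound, and it comes out slightly above the rounded $21,000 figure.

```python
first_year_costs = {
    "gold_licenses": 2 * 5000,        # two Gold licenses at $5000 each
    "extra_capacity_licenses": 2500,
    "professional_services": 3000,
    "ha_cluster_plugin": 6000,        # quoted as "$6000+", so a lower bound
}

total = sum(first_year_costs.values())
print(f"${total:,}")  # prints "$21,500"
```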
  Ultimately, NetApp turned out to be cheaper; approximately $20,000 cheaper, and guess what.... it came with support; we know there are thousands of enterprises using it, it has a certain track record.
  If the ZFS based solution wasn't any cheaper in any sense; how could you possibly justify taking such a risky move?
  Furthermore: NetApp filers have some significant useful features that ZFS does not, which are hard to argue against. Of course ZFS has some nice unique features as well, such as compression suitable for use with almost all workloads; better instrumentation (DTrace); an open source base; more granular control of sizes and reservations; support for a larger number of snapshots; the ability to migrate the storage system to larger disk drives later; dynamic striping; and arguably better read/write performance with the ZFS ARC, since you can buy lots of RAM instead of being stuck with a fixed 16GB.
  But NetApp has
  
  Asynchronous Deduplication --- In other words, block-level dedup that is suitable for use in production. ZFS dedup is synchronous, which limits the possible places it could be used. ZFS dedup is also a disaster -- I tried testing it out once.... the results were catastrophic; 32GB of RAM turned out not to be enough, at least without a read-optimised SSD for L2ARC.
  DataMotion -- Volumes/LUNs exported using FC or iSCSI can be moved between aggregates, or between "pools" if you will; without downtime, or the clients noticing.
  FlexClone -- Can instantly create a point in time copy of an individual file (LUNs, or Volumes too); the copy is not bound to the original like it is in ZFS (you can delete the original without having to "zfs promote" something, or worry about dependent snapshots) --- in ZFS you can only clone a snapshot of an entire filesystem; this does not work out so well for mass-cloning things like virtual machines, because you can only have a host attach a limited number of filesystems.
  RAID-DP --- The important thing to know about it, is it's RAID6, without the traditional performance pe
75. Re: Advatages of ZFS over BTRFS? by UnknownSoldier · 2013-09-19 00:27 · Score: 1
  
  Thanks for the correction, for explaining to those not familiar with it, and remaining civil.
76. Re:Advatages of ZFS over BTRFS? by Anonymous Coward · 2013-09-19 04:54 · Score: 0
  
  ZFS doesn't use the linux page cache like a normal linux filesystem. Nothing dangerous there, but you can't trust 'free' if you're running ZFS.
77. Re: Advatages of ZFS over BTRFS? by hr+raattgift · 2013-09-19 05:07 · Score: 1
  
  A bit more detail:
  The ZIL and separated log (slog, "zpool add pool log <device>") are slightly different.
  All writes of all varieties go into the ARC.
  From the ARC, synchronous writes are synchronously written to the slog (if one is available) and are then marked as *asynchronous* writes.
  When the txg is closed, all asynchronous writes are pushed out to the storage vdevs of the pool, and the slog is cleared.
  If no slog is available, from the ARC synchronous writes are synchronously written into the ZIL (yes, ZFS intent log), which is automatically maintained at the start of one or more of the pool's storage vdevs, and then the blocks in the ARC are marked as *asynchronous* writes.
  When the txg is closed, all asynchronous writes are written to the storage vdevs of the pool, and the ZIL is marked empty.
  If on pool import the slog or ZIL is NOT empty then the blocks are written (synchronously) to the storage vdevs before the pool is made available for access.
  So, the slog and ZIL are there to make synchronous write calls return quickly and safely. They aren't permanent storage.
  The slog does not really need to be a fast drive, just one in which write latency is low; all writes to the slog are linear, so there should be almost no track-to-track seeking even in a rotating drive. It is vital that the slog does not lie about having committed data to stable storage (stable in the sense of persisting across crashes, power failures, etc.).
  The ZIL lives in the drives that make up the pool's storage vdev(s), so there's nothing special about the ZIL.
  The slog (and ZIL) are mostly felt when doing bursts of synchronous writes -- some POSIX operations do this (rm -r, for instance), some database operations do it too, but the case where slogs are most worthwhile is when NFS clients are doing lots of writes to the pool.
  They are not write caches in the traditional sense. Their main use is to return quickly from synchronous write calls, without compromising pool consistency.
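  The flow described above can be sketched as a toy model. This is only an illustration of the ordering of events, not real ZFS internals: the class name, the dictionaries, and the single "intent_log" standing in for either the slog or the ZIL are all assumptions.

```python
class ToyPool:
    """Toy model of the write path described above; not real ZFS internals."""

    def __init__(self):
        self.arc_dirty = {}   # block id -> data awaiting the next txg close
        self.intent_log = {}  # stands in for the slog, or the ZIL if no slog
        self.storage = {}     # stands in for the pool's storage vdevs

    def write(self, block, data, sync=False):
        # Every write lands in the ARC first.
        self.arc_dirty[block] = data
        if sync:
            # A synchronous write is also committed to the log before the
            # call returns; after that it is treated like an async write.
            self.intent_log[block] = data

    def close_txg(self):
        # Push all dirty blocks to the storage vdevs, then clear the log.
        self.storage.update(self.arc_dirty)
        self.arc_dirty.clear()
        self.intent_log.clear()

    def crash(self):
        # RAM contents (the ARC) are lost; the log is on stable storage.
        self.arc_dirty.clear()

    def import_pool(self):
        # The log is only read at import: replay anything left in it.
        self.storage.update(self.intent_log)
        self.intent_log.clear()
```

  In this model, writing one asynchronous and one synchronous block, then calling crash() followed by import_pool(), leaves only the synchronous block in storage. That is the sense in which the log protects synchronous writes without being a write cache.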
78. Re: Advatages of ZFS over BTRFS? by hr+raattgift · 2013-09-19 05:34 · Score: 1
  
  L2ARC is slowly populated from the older ends of the ARC, using a separate thread. Not all blocks aging out of the ARC are guaranteed to get to the L2ARC. This is to avoid slowdowns when the ARC is under pressure, and to allow for the use of cache vdevs on media which are slow to write but fast to read.
  L2ARC vdevs are circular buffers.
  Each L2ARC entry consumes at least 50 bytes from the ARC, and may consume considerably more. Those bytes are released ONLY after the circular buffer overwrites the L2ARC entry. Therefore a large L2ARC competes for ARC space with ordinary blocks, and the larger the L2ARC the more likely it is that the L2ARC contains stale blocks. Those stale blocks continue to use ARC space.
  The L2ARC's big use is to avoid seeks to fetch occasionally-used data. It helps so little with streaming reads that data that was streamed (via prefetch mechanisms, for example) into the ARC are skipped by the thread that populates L2ARC vdevs. Defeating that (forcing the storage of streamed data) typically worsens performance of a pool.
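  To get a feel for the ARC cost of L2ARC headers, here is a back-of-the-envelope sketch. The 50-byte figure is the minimum quoted above; the 256 GiB cache device and the block sizes are assumptions picked purely for illustration, and real header sizes may be considerably larger.

```python
def l2arc_header_overhead(l2arc_bytes, avg_block_bytes, header_bytes=50):
    """Lower bound on ARC memory consumed by headers for a full L2ARC."""
    entries = l2arc_bytes // avg_block_bytes
    return entries * header_bytes


GIB = 2**30

# Hypothetical 256 GiB cache device at a few assumed average block sizes.
for block in (4096, 8192, 131072):
    overhead = l2arc_header_overhead(256 * GIB, block)
    print(f"{block:>6} B blocks: at least {overhead / 2**20:.1f} MiB of ARC")
```

  Under these assumptions a cache full of 4 KiB blocks ties up at least 3200 MiB of ARC, while 128 KiB blocks need only about 100 MiB, which is why small-block workloads feel the competition for ARC space most.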
  The ZIL consists of automatically managed areas in the storage vdevs of each pool. One can configure one or more separated log vdevs which will be used instead of the ZIL. When synchronous write calls are made, the data is stored in the slog (if available) or ZIL, and the call returns. No other data is written to the slog or ZIL. The blocks written out remain in the ARC, marked as dirty and in need of an asynchronous flush. When the open transaction group closes, the in-ARC copies of what went to the slog or ZIL are written out with everything else in the txg, and then the slog or ZIL is cleared.
  Neither the slog nor the ZIL is read except during the import process; if they are non-empty at import, the blocks are written out (synchronously) to the pool's storage vdevs and then cleared.
  "With the ZIL in a different drive (SSD or otherwise), you reduce the number of writes required"
  The ZIL is the ZIL; the slog is the slog. If you have a slog you don't write to the ZIL, and whether that changes the number of writes of *synchronous* data is configuration-dependent. If you have mirrored slogs, for example, you are probably writing more than you would if you just used the ZIL.
  In either case the idea is to write out synchronous blocks quickly and with as little write latency as possible; writes are linear and are to areas at the start of the vdevs.
  "Because you can generally write to a ZFS pool significantly faster than to a single disk"
  Actually, it's the other way around, but hinges on what you mean by "write". The more disks there are in the pool the more labels have to be updated at the finalization of each txg. That's not a large extra amount to push out to the rotating material, but it's done synchronously and will *invariably* result in seeks to the start and end of each component device in each vdev in the pool.
  Additionally, you can configure a single-disk pool, and there are reasons why you might want to do so, even though that is UNSAFE.
  However, system calls return quickly because all writes go into the ARC, and asynchronous write calls can return immediately; synchronous write calls return when the data is committed to the ZIL or slog(s). In the case where there's a slog, write calls practically never initiate actual activity on the device(s) forming the storage vdev(s). Instead, writes will be triggered by timers.
  For many workloads, this makes writes to pools seem very fast, since delayed writing allows for smarter scheduling, as well as coalescing of writes to specific physical blocks.
79. Re:Advatages of ZFS over BTRFS? by bingoUV · 2013-09-19 19:33 · Score: 1
  
  The analyst in me is quick to point out that implies failures in ZFS itself, beyond just the disks and "bit rot", but
  I think the bit rot is coming not from your disks but from memory. Frequently written data passes through memory more times, and can get corrupted in case of memory errors uncatchable by ECC.
  Processors overclocked and overvolted to hell also cause data errors, but I don't think that is your case.
  
  --
  Bingo Dictionary - Pragmatist, n. A myopic idealist.
Re:Cool, but.. by Bengie · 2013-09-17 13:17 · Score: 4, Insightful

Everything else is already handled with LVM and software RAID.
You have a great sense of humor, keep it up.
Re:Cool, but.. by smash · 2013-09-17 13:38 · Score: 3, Informative

That. Those who don't understand ZFS are condemned to reinvent it, poorly.

--
I run: Windows, OS X, Linux, FreeBSD. Just because you have a hammer, doesn't mean everything is a nail.
What is ZFS and why should I care? by JDOHERTY · 2013-09-17 13:44 · Score: 0

It would be nice if the posted article or even the OpenZFS project home page provided some sort of summary of the benefits and objectives of this effort.
What's the difference? by Anonymous Coward · 2013-09-17 13:53 · Score: 0

What is the difference between this ZFS and Oracle's ZFS? If I have to patch the kernel either way, why should I choose one or the other?
1. Re:What's the difference? by Bengie · 2013-09-17 14:15 · Score: 1
  
  Oracle stopped supplying code after v28. The open source community decided it's been too long and ZFS in open source needs new features. So they're parting ways with being compatible with Solaris ZFS. Up until now, Open Source ZFS was able to be mounted by Sun and vice versa, but only up to v28.
2. Re:What's the difference? by Marillion · 2013-09-17 14:41 · Score: 1
  
  The feature I'm waiting for is the v30 feature for filesystem encryption. Full disk encryption is the current fad, but selective encryption just seems cleaner. I see no point in encrypting operating system files only to decrypt them every time you boot.
  
  --
  This is a boring sig
3. Re:What's the difference? by hedwards · 2013-09-17 15:38 · Score: 1
  
  Selective encryption means that you have to be incredibly careful that sensitive data never hits a non-encrypted portion of the disk. So, I'd say that the full disk encryption is the cleaner option.
4. Re:What's the difference? by greg1104 · 2013-09-17 18:06 · Score: 1
  
  Full disk encryption isn't a fad; it's the only way to do this job well. Selective encryption makes people relax because it gives a false sense of security, but there are so many holes that you're still quite vulnerable. In some ways it's worse than no encryption at all, because people at least know they have to be careful about their system then.
  The first giant issue is that operating systems and programs like editors will write work in progress data to disk outside of the encrypted area, such as temporary files, swap files, hibernation files, etc. In a selective encryption system, those are going to end up with unencrypted data exposed in there eventually.
  And if the OS loads without supplying a decryption key, it's trivial for anyone who gets physical access to your system for even a short length of time to add a key logger that then captures the key. Even with full disk encryption you're vulnerable to evil maid attacks, but there are ways to make those harder to execute. An unencrypted OS makes the job trivial. Leave a system with selective encryption sitting near someone who knows this area well, go to the bathroom to take a leak, and they'll have your system owned--swiping your selective encryption key the next time you type it and publishing it over the Internet when possible--before you're back.
5. Re:What's the difference? by Bert64 · 2013-09-17 18:52 · Score: 2
  
  Temporary files and swap aren't a problem...
  Swap can and should be stored on a separate partition, and encrypted using a randomly generated key so it's completely lost after a reboot.
  On a properly configured system, only a very small number of locations will be writable by the user, typically the user's home directory and a temporary area... The temporary area can be stored in ram/swap since it doesn't matter if its contents are lost and home can be encrypted.
  It's trivial to add a hardware key logger to virtually any system irrespective of how the software is configured; if someone untrusted has had unescorted physical access to the system, then the system should be considered compromised anyway. A hardware keylogger is also OS independent; doing it in software requires the malicious party to know what OS you're using in advance in order to have a compatible keylogger, and also to work around any non-standard configuration you might have.
  
  --
  http://spamdecoy.net - free throwaway anonymous email - avoid spam!
6. Re:What's the difference? by greg1104 · 2013-09-17 19:50 · Score: 1
  
  If someone has physical access to a system for long enough, of course any security can be bypassed and the system must be considered tampered. But a fully encrypted system cannot be compromised in only a minute or two of access; one with an unencrypted boot drive certainly can be. And time to exploit impacts how vulnerable you are in very common real world situations.
  A regular full disk encryption candidate is a laptop you leave home with. I will sometimes leave my laptop sitting at a table with someone I trust enough not to steal it while I walk away for a small number of minutes, like a bathroom break. But that might be a business meeting or a conference, where the other person sure would like to have the data on my laptop if they could install a siphon unobtrusively while I'm away. And that is plenty of time to install a software keylogger on a system without an encrypted operating system. You're not installing a hardware keylogger on my laptop that fast. There are a large number of exploits possible if you have the system for 30 minutes, but you're not pulling off much if someone leaves their laptop somewhere for two minutes before coming back for it.
  Also, hardware keyloggers require a second round of access to the system to retrieve their recording. Software ones can expect they'll eventually get onto the Internet where they can deliver their payload. And there are very few real non-standard configurations out there. Most of the time you're going to find a laptop with Windows XP or 7, or a Mac, and a toolkit for dropping a software keylogger plus backdoor access on any of those is easy enough to bring along.
  I find it pretty funny that you're speculating about all these unlikely advanced techniques here, while blowing off temp files, swap, and hibernation as non-issues. Run "strings" on a whole hard drive sometime if you want a terrifying look at how many sloppy programs leave text data in unexpected places. Your trust in simple strategies to eliminate them all is pretty optimistic.
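  For anyone without the "strings" utility handy, the same kind of scan can be approximated in a few lines. This is a minimal sketch, not a replacement for strings(1): the function names and the minimum run length of 6 are arbitrary choices for illustration, and it only looks for printable ASCII.

```python
import re


def printable_runs(data, min_len=6):
    """Yield runs of at least min_len printable ASCII bytes from a buffer."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    for match in pattern.finditer(data):
        yield match.group().decode("ascii")


def scan(path, chunk_size=1 << 20, min_len=6):
    """Scan a file or device in chunks (runs spanning a chunk edge are split)."""
    with open(path, "rb") as handle:
        while chunk := handle.read(chunk_size):
            yield from printable_runs(chunk, min_len)
```

  Pointing scan() at a swap partition or a raw disk device (which normally needs root) and printing what it yields shows how much readable text sits outside whatever area was encrypted.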
7. Re:What's the difference? by feld · 2013-09-18 00:22 · Score: 1
  
  ZFS's filesystem encryption is prone to watermarking attacks and a lot of metadata is in plaintext.
8. Re:What's the difference? by Lennie · 2013-09-18 01:51 · Score: 1
  
  Also the people that used to be the main developers of ZFS don't even work at Oracle anymore. So I would expect the most interesting things to happen outside of Oracle.
  Also interesting to know is: Oracle can't take any code from the open source ZFS. The licensing/agreements don't allow that.
  
  --
  New things are always on the horizon
9. Re:What's the difference? by Anonymous Coward · 2013-09-18 02:20 · Score: 0
  
  Temporary files and swap are a minor problem; a larger one is log files. You absolutely must encrypt /var, so why waste your time with an unencrypted / and /usr?
10. Re:What's the difference? by Bert64 · 2013-09-19 20:14 · Score: 1
  
  Having temporary files in random places is caused by poorly written programs combined with a poorly configured os... Such programs simply should not have the ability to write files wherever they want.
  As for a software keylogger, you would need to shut down the laptop, boot the laptop from your own media or extract the drive, mount the drive and install your malware before rebooting it back and hoping the mark doesn't notice that the system has unexpectedly rebooted and none of his programs are running anymore.
  Depending on the hardware type, this can be considerably more time consuming than applying a hardware keylogger... And there is no reason a hardware keylogger couldn't include a gsm radio or similar device for transmitting its logs.
  If you're talking about someone who is stupid enough to leave their machine logged in and unattended then an encrypted drive wouldn't help at all anyway.
  
  --
  http://spamdecoy.net - free throwaway anonymous email - avoid spam!
aka bcache + any filesystem you want by raymorris · 2013-09-17 14:26 · Score: 3, Informative

Using a small, fast SSD as a cache for large, slow disks can be awesome for some workloads, mostly servers with many concurrent users.
To do that with ANY filesystem, bcache is now part of the mainline kernel. dmcache does the same thing, and there is another one that Facebook uses.
1. Re:aka bcache + any filesystem you want by smash · 2013-09-17 20:05 · Score: 1
  
  It doesn't do the same thing.
  
  --
  I run: Windows, OS X, Linux, FreeBSD. Just because you have a hammer, doesn't mean everything is a nail.
Still no encryption... *sigh* by the_B0fh · 2013-09-17 14:37 · Score: 2

I wish they had encryption... *sigh*
No, I don't want workarounds, I want it to be built in to ZFS like in Solaris 11.
1. Re:Still no encryption... *sigh* by Anonymous Coward · 2013-09-17 17:39 · Score: 0
  
  Then ask Larry to release the source, like he promised.
2. Re:Still no encryption... *sigh* by sethmeisterg · 2013-09-17 17:53 · Score: 1
  
  Cite where he promised he'd release source. I'll wait.
3. Re:Still no encryption... *sigh* by KonoWatakushi · 2013-09-17 21:01 · Score: 1
  
  There are no satisfactory workarounds, and never will be. The crypto needs to be handled within ZFS or it becomes an over complicated and inefficient mess. (As you are probably aware.) Consider a ZFS mirror on top of two disks encrypted by the OS; even though the data is identical, it now needs to be encrypted twice on write, and decrypted twice on scrub. For ditto blocks, multiply the amount of crypto work by another two or three. There are now (at least) two keys to manage and still no fine granularity access. Adding more vdevs to the pool only exacerbates the problem.
  Copy on write transaction oriented filesystems like ZFS are the natural place for crypto, as constructing a nonce is trivial; simply append the transaction ID, block offset, etc. It couples perfectly with stream ciphers like Salsa20 (or XSalsa20 for the extended 24-byte nonce), and offers the possibility of extremely fast, flexible, and efficient crypto. There is no expensive key setup required and no need to generate ESSIVs. No need to use expensive crypto modes on top of conventional block ciphers, which require multiple encryptions or other expensive operations like GCM. Furthermore, Salsa20/ChaCha is not only highly secure and trustworthy, but extremely fast, simple, and elegant.
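  As a rough illustration of the nonce idea, the sketch below packs per-block identifiers into 24 bytes, the XSalsa20 nonce size mentioned above. It is not how any ZFS crypto implementation does it: the choice of fields (in particular object_id, which stands in for the "etc.") and the little-endian layout are assumptions, and no cipher is invoked here.

```python
import struct


def block_nonce(txg, object_id, offset):
    """Pack three 64-bit values into a 24-byte nonce (XSalsa20-sized).

    txg       -- transaction group ID in which the block is written
    object_id -- hypothetical extra discriminator, not taken from the post
    offset    -- block offset
    """
    return struct.pack("<QQQ", txg, object_id, offset)


# Rewriting the same offset in a later transaction gives a fresh nonce,
# because copy-on-write never reuses a (txg, offset) pair for new data.
first = block_nonce(txg=100, object_id=7, offset=4096)
second = block_nonce(txg=101, object_id=7, offset=4096)
print(len(first), first != second)
```

  The point of the sketch is only that uniqueness falls out of bookkeeping the filesystem already does, so nothing like an ESSIV has to be derived per sector.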
  After all of the work of hammering a square peg into a round hole with conventional full disk encryption, performance of Salsa20 in ZFS/HAMMER/btrfs would rival hardware accelerated block device crypto and be useful on a far greater array of hardware. (Typically it should even surpass it, as redundant crypto operations are eliminated.)
  There is ongoing work on ZFS crypto at https://github.com/zfsrogue/zfs-crypto, though I'm not sure how it is progressing. Having zfs-crypto integrated would be very useful, not only for efficiency reasons, but for the simple and flexible key management. While there are alternatives to a number of other features that ZFS offers, none of them come close to offering the flexibility and convenience of ZFS.
4. Re:Still no encryption... *sigh* by the_B0fh · 2013-09-18 03:34 · Score: 1
  
  I hope zfs-crypto becomes usable so that I can move off solaris 11. I refuse to run anything that does not have crypto.
5. Re:Still no encryption... *sigh* by Baki · 2013-09-18 19:22 · Score: 1
  
  I use LUKS on dmraid, which works quite well (and using ext4 or xfs on top of LUKS).
  I've been looking at ZFS encryption since 2007 but it never materialized (except for solaris 11).
  Given the choice between a general OS with LUKS+dmraid or something wonderful like ZFS but having to use an obscure (nowadays) OS, I'd go for the first choice.
6. Re:Still no encryption... *sigh* by the_B0fh · 2013-09-18 23:47 · Score: 1
  
  But I my 7 disk raid-z3 box for home use... Speaking of which, I need to go replace Disk #x one of these days.
Re: Data integrity by MightyYar · 2013-09-17 14:51 · Score: 4, Informative

Not sure what you mean. You certainly can set up a mirrored pair (or triplet or quadruplet), but you can also set up what's referred to as raidz, where it stripes the redundancy across multiple disks. You can configure how much redundancy... 1, 2, or more disks if you like. You can also tell ZFS to keep multiple copies of blocks, and it will spread those copies out among the disks. You can set that policy per sub-volume (file system in zfs-speak), so that if you decide that some of your data deserves more redundancy, you can set up a folder that will keep 2 copies of everything, but leave all the other folders at 1 copy. It's super geeky. I've had it detect (and correct) corruption in a failing disk, detect corruption because of a flaky disk controller that would otherwise pretend to work fine, and detect corruption when a SATA cable came loose. Combined with the ECC RAM in the server, I feel more comfortable about the integrity of my data than I ever have. I've lost family photos before to random drive corruption, so I'm sensitive to this stuff :)

--
W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
Good by Anonymous Coward · 2013-09-17 14:51 · Score: 0

Hope they'll implement encryption feature ASAP, fuck Oracle for locking in the good stuff from v29 and on.
I am sick and tired of these clueless greedy fucks who kept fighting against the tide at the cost of the community, it's not like they can keep it out of the open for ever, eventually someone will make something better for free, all they did was piss people off.
Re:Cool, but.. by Anonymous Coward · 2013-09-17 15:18 · Score: 0

Geek card please.
Re: Data integrity by saleenS281 · 2013-09-17 15:34 · Score: 4, Informative

One point to be extremely clear on however - when you set copies = 2 on a folder level, it does NOT guarantee those copies end up on different physical spindles. Early on there were many people who lost files because they skipped RAID thinking that copies=X would protect their data. It is NOT meant as a means to protect against hardware failures.
How does ZFS compare to btrfs? by kriston · 2013-09-17 15:57 · Score: 1

How does ZFS compare to btrfs? Several intentionally unnamed and unlinked commentaries on ZFS's apparent current omission from Mac OS X refer to btrfs as the more GPL-compliant alternative to ZFS. I need more information, as I do not think btrfs has the same aggressive checksumming and automatic volume size feature that ZFS does.
Thanks.

--
Kriston
1. Re:How does ZFS compare to btrfs? by dbIII · 2013-09-17 17:07 · Score: 1
  
  ZFS is for multiple disks and btrfs is not necessarily for multiple disks so there are differences, single SSD behaviour being one of them. Where they overlap ZFS shines mostly due to the amount of work that has been put in. However, up until now ZFS has been progressing very differently on different platforms. Having ZFS on a production Linux server is currently still a bad idea in terms of performance and portability if that server can be reinstalled as BSD, so on some platforms btrfs may behave better than ZFS.
  The announcement in the summary will probably lead to consistent performance of ZFS across a few platforms within a few months - which IMHO will make a huge difference in the amount of ZFS use.
2. Re:How does ZFS compare to btrfs? by wagnerrp · 2013-09-18 09:13 · Score: 1
  
  ZFS is for multiple disks and btrfs is not necessarily for multiple disks so there are differences
  I'm not sure what you mean by this. ZFS has some features like automatic recovery that only work when you have multiple disks, but things like checksumming, snapshotting, cloning, dynamic datasets, compression, deduplication, and pretty much every function in the 'zfs' executable all work just as well on a single disk.
3. Re:How does ZFS compare to btrfs? by grahamperrin · 2013-09-20 05:31 · Score: 1
  
  Three items that may be of interest:
  
  * A short history of btrfs [LWN.net] (2009) (highlights)
  * How ZFS continues to be better than btrfs — Rudd-O.com in English (2012) (highlights)
  * Btrfs & ZFS, the good, the bad, and some differences. | www | grep storage (2013)
Re: Data integrity by Anonymous Coward · 2013-09-17 16:15 · Score: 0

RAID with 1/2/3/etc (distributed) parity sectors is good for protecting against 1/2/3/etc entire drive failures, but it offers no protection against 2/3/4/etc [minus one with each full disk failure] corrupt sectors in a row, which leads to the whole group being rendered corrupt. Complete data copies are a horrible waste of space for combating random data corruption, considering typical HDDs have somewhere around 1 read error per ~12TB read. Even assuming a dying drive with 100 sector errors per GB [0.038% corruption with 4KB sectors], the space needed to be able to completely fix it is a mere 800KB (with Reed-Solomon) instead of needing 1GB (with repetition).
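The figures above can be reproduced with a few lines of arithmetic. This is one reading that matches the quoted numbers, under two assumptions: a GB is taken as 2**30 bytes, and each 4KB sector is treated as a single Reed-Solomon symbol, with two parity symbols needed per correctable error at an unknown location.

```python
SECTOR = 4 * 1024      # 4 KB sectors, as in the comment
GB = 2**30             # assumption: binary gigabyte
bad_sectors = 100      # sector errors per GB, as in the comment

sectors_per_gb = GB // SECTOR
corruption = bad_sectors / sectors_per_gb

# Correcting t errors at unknown positions takes 2*t parity symbols;
# one symbol is assumed to be one whole sector here.
rs_parity_bytes = 2 * bad_sectors * SECTOR

print(f"corrupt fraction: {corruption:.3%}")
print(f"Reed-Solomon parity: {rs_parity_bytes // 1024} KB")
print(f"full second copy: {GB // 1024} KB")
```

With those assumptions the script reports 0.038% corruption and 800 KB of parity, against 1048576 KB for a full duplicate.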
And more importantly.... by Anonymous Coward · 2013-09-17 16:29 · Score: 0

Going and reimplementing it so it can be available under another license loses you the patent protection of the original code.
This is the ingenious and evil usage of 'copyleft' licenses. You don't have to stop redistribution of the original code, or even derivatives, you just have to ensure it's unusable to anybody else under alternative terms (especially if theirs are radically different than yours.)
I'd always wondered when we'd start seeing the licensing arms-race, and the first skirmishes by those wishing ill are already upon us, many with the money to back them up.
Re: Data integrity by Guspaz · 2013-09-17 17:39 · Score: 1

What are the chances of the exact same sector being corrupt on at least three disks in a raidz2 vdev? This doesn't seem like a plausible scenario.
Re: Data integrity by greg1104 · 2013-09-17 17:40 · Score: 4, Interesting

ECC RAM is an important part here, due to how scrubbing works in ZFS. The background disk scrubbing can check every block on the filesystem to see if it still matches its checksum, and it tries to repair issues found too. But if your memory is prone to flipping a bit, that can result in scrubbing actually destroying data that was perfectly fine until then. The worst case impact could even destroy the whole pool like that. It's a controversial issue; the odds of a massive pool failure and associated doom and gloom are seen as overblown by many people too. There's a quick summary of a community opinion survey at ZFS and ECC RAM, but sadly the mailing list links are broken and only lead to Oracle's crap now.
Re: Data integrity by kthreadd · 2013-09-17 18:09 · Score: 3, Informative

That's what you have backups for.
OpenZFS related to ZFS on FreeBSD by Anonymous Coward · 2013-09-17 19:38 · Score: 0

What might change related to ZFS on FreeBSD?
1. Re:OpenZFS related to ZFS on FreeBSD by TheRaven64 · 2013-09-17 21:49 · Score: 1
  
  Nothing. This is the plan that was discussed at BSDCan this year. It's basically a more formal way for the people that were already collaborating to continue collaborating, but in a way that encourages more people to join in.
  
  --
  I am TheRaven on Soylent News
2. Re:OpenZFS related to ZFS on FreeBSD by epine · 2013-09-17 23:33 · Score: 1
  
  What you seem to mean is that it will change hardly anything from the perspective of the people already supporting or enhancing ZFS.
  One would hope that this nice umbrella will evolve into a single point of access to learn about the major initiatives planned or in progress. Sometimes these things turn into just another layer of non-information.
  Especially if existing developers perceive the addition as not amounting to change.
Well worth reading this ZFS document by Anonymous Coward · 2013-09-17 20:16 · Score: 1

BTW the "ZFS On Disk Specification" document is a very interesting read. It inspires confidence.
Re: Data integrity by Anonymous Coward · 2013-09-17 20:42 · Score: 0

Uhm... rerun failing checksum calculations one or more times. If they stop failing, it was likely a memory issue. In that case, rerun 10 or 20 more times to ensure that it's not a question of the harddisk failing only some times. Problem solved?
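The retry idea could be sketched roughly like this. It is a simplified illustration rather than anything ZFS does: SHA-256 stands in for whatever checksum is in use, read_block is a hypothetical callable that re-reads the block from disk, and the three labels are made up. It also cannot help if bad RAM corrupted the data before the checksum was first computed.

```python
import hashlib


def classify_checksum_failure(read_block, expected_digest, retries=10):
    """Re-read and re-hash a block several times and classify the outcome."""
    results = [
        hashlib.sha256(read_block()).digest() == expected_digest
        for _ in range(retries)
    ]
    if all(results):
        return "transient"     # the original failure did not repeat
    if not any(results):
        return "persistent"    # the data on disk (or the path to it) is bad
    return "intermittent"      # sometimes good, sometimes bad
```

A "transient" result would point towards memory or some other one-off glitch, while "intermittent" is the case the extra reruns are meant to catch.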
Re: Data integrity by TheRaven64 · 2013-09-17 21:06 · Score: 4, Insightful

ZFS doesn't have ECC, but it does checksum each block, so it can detect per-block errors. If you have valuable data, you can set the copies property to some value greater than 1 for that data set and it will ensure that each block is duplicated on the disk, so if one fails a checksum then the other will be used to recover. If you have three disks, you can use RAID-Z, which loses you 1/3 of the space (not 1/2) and allows any single-disk failure to be recovered. Running zpool scrub will make it validate all of the data and, when any read fails its checksum, recover the data from the other two disks.
The reason it doesn't use ECC is that ECC doesn't mesh well with the failure modes of disks. ECC is used in RAM because when it gets hot, hit by a cosmic ray, or whatever, it is common for a single bit to flip (in a single direction, which makes the error correction easier). In a disk, you typically have an entire block fail, not a single bit. Modern disks use multiple levels, so the smallest failure that is even theoretically possible might be a single byte (or nibble) in a block. And since the failure isn't biased, you'd need a fairly large amount of space. A better approach would be for the filesystem to generate something like Reed–Solomon code blocks for every n blocks that are written. This would allow single-block errors to be recovered, as long as the other blocks are okay. The downside of this approach is that the error correcting block would need to be rewritten whenever any of the other blocks is modified. This might be relatively easy to add to ZFS, as it uses a CoW structure, so block-overwrites are relatively rare (although erasing a lot of data would require a lot of checksums to be recalculated). This would mean that a single-block write would end up triggering a lot of reads and that would hurt performance. For ZFS, this might actually be easier to implement, as blocks are written out in transaction groups and so including an error correction block at the end might be a fairly simple modification.
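The checksum-plus-copies recovery described above can be illustrated with a small sketch. It is a toy model only: SHA-256 stands in for the block checksum, a Python list stands in for the redundant copies, and the function names are invented for the example.

```python
import hashlib


def checksum(data):
    """Stand-in block checksum (SHA-256 is an assumption for this sketch)."""
    return hashlib.sha256(data).digest()


def read_with_repair(copies, expected):
    """Return good data from a list of copies, rewriting any bad copies."""
    good = next((c for c in copies if checksum(c) == expected), None)
    if good is None:
        raise IOError("all copies fail their checksum; cannot recover")
    for i, c in enumerate(copies):
        if checksum(c) != expected:
            copies[i] = good   # heal the damaged copy in place
    return good
```

The key point is that the checksum is stored apart from the data it covers, so the reader can tell which copy is the good one instead of merely noticing that two copies differ.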

--
I am TheRaven on Soylent News
Re: Data integrity by TheRaven64 · 2013-09-17 21:09 · Score: 1

That depends on the reason for the failure. If it's because there's a little bit of dust on the platter, or a manufacturing defect in the substrate, then it's very unlikely. If it's because of a bug in the controller or a poor design on the head manipulation arm, then it's very likely.
This is why the recommendation if you care about reliability more than performance is to use drives from different manufacturers in the array. It's also why it costs a lot more if you buy disks from NetApp than if you buy them directly: they're the same commodity drives, but NetApp tests batches, discards the least reliable ones, and ensures that you don't have two disks from the same production run in the same array. You're still getting the same drives you can buy elsewhere for a fraction of the price, but you're getting more diversity.

--
I am TheRaven on Soylent News
The problem for whom? by jotaeleemeese · 2013-09-17 21:26 · Score: 1, Insightful

You clearly have not been paying attention to the news, have you?
After the leaks of Snowden regarding general malfeasance from security agencies against the encryption standards that we require to communicate safely and securely (like with your bank, just saying) you can't trust any software that you can't build (or know other people more capable can't build) from scratch.
The GPL guarantees that no stupid institution or individual has free rein to corrupt the computational resources you are using.
Other licenses wax lyrical on this, and the consequence is that your precious Apple OS and applications are now tainted, because you have no way to know if they have backdoors or not.
What does this have to do with ZFS you ask?
Well, encryption. ZFS has the capability to encrypt the datasets you are using, but unfortunately its license would not make it suitable for truly secure encryption in the cases where the company or individual implementing it (Oracle, ahem, ahem) chose not to make the source code available.
At that point you have no way to know if backdoors have been added to your implementation of ZFS.
So again, how is GPL, a license that is protecting your security, the problem?

--
IANAL but write like a drunk one.
1. Re:The problem for whom? by Bengie · 2013-09-18 00:40 · Score: 1
  
  I can't follow your logic.
  
  You're saying that OpenSource ZFS is a security risk because the closed source version doesn't show its source?
  
  GPL doesn't guarantee crap. Lots of web services use GPL'd software with custom changes, but they don't need to release that code. Anyway, if you had to choose between two closed source offerings, would you want a custom in-house file system or ZFS where you don't know if they did or didn't make any changes?
  
  Until GPL can force people to use GPL in the first place, people will just not use GPL if they don't like it.
2. Re:The problem for whom? by Anonymous Coward · 2013-09-18 09:10 · Score: 0
  
  Care to explain how exactly is GPL protecting you?
  Don't like closed source, don't use closed source, it's as simple as that. By definition, closed source is neither GPL nor BSD/CDDL/Apache2 etc. licensed, so I don't see the distinction that you apparently do.
Re: Data integrity by MightyYar · 2013-09-17 23:47 · Score: 1

Or just run ECC memory! :)

--
W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
Re: Data integrity by MightyYar · 2013-09-17 23:57 · Score: 1

If you are hell bent on this, just partition the drive into however many parts you feel is sufficient and run raidz across them. With 5 partitions and a single redundant partition, you would only use up 1/5 of your drive on parity. It's a hack, but I'm not aware of any production-worthy filesystem that can do what you want.
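
The 1/5 figure can be checked with a quick back-of-the-envelope sketch. The function name is made up for illustration, and it assumes equal-sized partitions and ignores raidz padding and metadata overhead:

```python
def raidz_parity_fraction(partitions: int, parity: int = 1) -> float:
    """Fraction of raw space spent on parity when raidz spans equal partitions."""
    if parity < 1 or partitions <= parity:
        raise ValueError("need at least one data partition and one parity partition")
    return parity / partitions


# 5 partitions with a single redundant partition, as in the comment above
print(raidz_parity_fraction(5))      # 0.2, i.e. 1/5 of the drive
print(raidz_parity_fraction(10))     # more partitions, less space lost to parity
```

Keep in mind this only protects against localized damage; all the partitions still die together if the drive itself dies.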

--
W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
Re: Data integrity by MightyYar · 2013-09-17 23:59 · Score: 1

Thanks for the clarification, my text is misleading. It will spread the blocks out, but randomly - there is no guarantee that the copies will end up on separate disks (unless you are using mirrored vdevs).

--
W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
bcache does read, write back, and write through by raymorris · 2013-09-18 00:33 · Score: 1

Bcache does read caching and your choice of write through or write back. I believe that's the same thing ZFS offers. If you know of some difference in the caching, please specify what you are referring to.
Obviously ZFS is a volume manager, a filesystem, a file server AND a caching solution. Bcache does one thing and does it well - caching. Volume management is a separate thing handled by a volume manager such as LVM, though LVM can serve as a front end to bcache, allowing the user to manage both with one set of tools.
1. Re:bcache does read, write back, and write through by bingoUV · 2013-09-19 19:14 · Score: 1
  
  While I agree with bcache doing one thing and doing it well, combining volume management and the filesystem has a lot of advantages. Removing the famous RAID write-hole, for one.
  Putting cache and encryption in FS doesn't make sense to me.
  
  --
  Bingo Dictionary - Pragmatist, n. A myopic idealist.
Sticking with ReiserFS here by hessian · 2013-09-18 00:44 · Score: 0

I don't know why everyone's freaked out by a silly little murder.

--
Futurist Traditionalism
Re: Data integrity by VortexCortex · 2013-09-18 02:07 · Score: 1

If you have valuable data, you can set the copies property to some value greater than 1 for that data set and it will ensure that each block is duplicated on the disk so if one fails a checksum then the other will be used to recover.
The sector numbers have long been decoupled from the physical location on disk of the data, for good reason, see: Swapping in a spare sector for one that is going bad.
I'm wondering how anything but a device driver ensures that writing the same exact block of 1's and 0's to a device doesn't result in the same exact hash, and is thus NOT automatically de-duplicated by the hardware and stored in a single location?
In other words: In a post-physical-addressing world (and post-NSA world) you should use an encrypted file system. One can almost create whole-drive encryption using ZFS, but it's a bit of a kludge using wrappers and what-not. Further, ZFS supports compression... and de-duplication. Ensure the de-duplication option is off. Otherwise, if you want to ensure duplicate sectors, do it at the drive level with RAID.
Re: Data integrity by StoneyMahoney · 2013-09-18 03:06 · Score: 1

I thought dealing with single-bit RAM failures was a little more complicated than that?
As I understand it, a failure is caused by a change of voltage in a stored bit. If the voltage change places the stored value between the 0 and 1 thresholds, the state becomes fuzzy. The failed bit can then easily be detected and its original state calculated using the ECC data. However, if the change in voltage is enough to produce a valid-looking bit flip, the ECC data can detect there has been an error in the block but not which bit has been changed.
Why would a RAM failure be in any particular direction?
Re: Data integrity by Anonymous Coward · 2013-09-18 03:07 · Score: 0

there is nothing controversial. they say very early on, garbage in, garbage out and ECC is a must if you value your data. the only thing controversial is why ECC isn't standard in ALL computers.
you're risking corruption too with any other filesystem and bad memory during ANY operation which can lead to a write to the file system.
ZFS at least guarantees your data integrity
Wait, I'm confused. by idontgno · 2013-09-18 03:16 · Score: 1

OpenZFS' creation as an organization was announced today.
OpenZFS, the software stack, has been part of FreeBSD (9.2, since July) and FreeNAS (9.1, since August).
Does the open non-Oracle filesystem stack predate the organization?

--
Welcome to the Panopticon. Used to be a prison, now it's your home.
Re: Data integrity by Anonymous Coward · 2013-09-18 03:24 · Score: 0

I've had 2 drives die on me at once before in a RAID, and all it takes then is a single read error during rebuilding to cause uncorrectable errors.
Re: Data integrity by LoRdTAW · 2013-09-18 04:04 · Score: 1

I have been looking into ZFS as a replacement for my Linux + mdadm server (backups scripted to AWS account). May I ask what your current setup is, OS and hardware wise?
Re: Data integrity by MightyYar · 2013-09-18 04:37 · Score: 1

Hardware:
- Old HP Core 2 Duo workstation from eBay with 4GB ECC RAM
- Extra SATA controller, both for performance and to give me a 5th plug for when I'm replacing drives.
- A 5-drive caddy that replaces the old drive cage.
- 2x 2TB drives and 2x 3TB drives
Software:
- FreeBSD 8.x
Configuration:
- Boot from USB2 Thumb Drive (which I periodically clone to a second, identical thumb drive for instant recovery)
- Drives are mirrored in pairs, for a total capacity of 5TB
I put this together a couple of years ago. If I did it today, I might make some different choices.
For instance, the HP Microserver sells new for about what I paid for the workstation used. It supports ECC RAM and apparently runs FreeBSD well. I would probably choose that, as it would be a cleaner build. This was not available when I put my machine together.
I might consider the Linux version of ZFS, but probably not, since keeping the kernel up to date would be a pain. I would also consider FreeNAS. I tried it back when I decided to use FreeBSD instead, and it was not ready for prime time. It seems to have improved a whole lot, and makes setup and maintenance easier. Not that FreeBSD is hard to use, but it is different from Linux and so you need to learn the new set of tools (like the ports system). I would go with the 9.x branch if I built today - 8.x was the stable branch when I did my build, and FreeBSD is really good about supporting the older branch.
I started with a motley collection of dissimilar drives, which is why I went with pairs. I would be able to get more usable space from the drives by running raidz instead of mirrored pairs - but only if you upgrade drives all at once. My setup lets me replace them in pairs. If I was buying everything brand new, I would probably choose raidz with 2 redundant drives... but I'm on a budget!
Finally, if you use WD Green drives, they work fine but make sure you disable the stupid head parking feature! One of mine beat itself to an early death by parking its heads every 10 seconds for about a year :) There used to be 4096-byte sector size issues with some drives, but I think they have been sorted out.

--
W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
Re: Data integrity by LoRdTAW · 2013-09-18 04:55 · Score: 1

Thank you for the very informative response. I was thinking of FreeNAS for the sake of simplicity but I haven't looked into it enough to see if I can manually tweak the ZFS settings. I started my *nix tinkering on a pentium 166 running FreeBSD 4 then moved on to Linux once I discovered Debian around 2002. Haven't used BSD since but I am sure I can dive back into it.
Re: Data integrity by Anonymous Coward · 2013-09-18 05:15 · Score: 0

One problem is the cost of ECC machines. Intel only offers it on their more expensive processors (despite it being cheap to implement). I haven't seen very many ARM chips with ECC either. AMD has it on all of their processors, but AMD isn't really very competitive with Intel.
Re: Data integrity by MightyYar · 2013-09-18 05:22 · Score: 1

Back when I set this up, I spent a couple of weeks playing with various solutions in VirtualBox. It is especially easy to play with ZFS, since you can "yank" and add drives so easily inside VirtualBox. You can even simulate corruption by writing to the disk images. FreeNAS was very tempting at the time, but still had some things I couldn't work around. They seem to have put a lot of work into it since then.

--
W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
Re: Data integrity by Guspaz · 2013-09-18 08:58 · Score: 1

Which is why raidz3 might be a better option for people with super critical data than raidz2 :) I do get your point, though. It was for that reason that I migrated from raidz to raidz2; I didn't like the idea that if a single disk failed, I had no protection from read errors during a rebuild.
It's also worth commenting on what a few other people have said, which is that copies=2 doesn't put the data on different disks (it's not the same as mirroring). This is partially true; ZFS tries hard to get the data onto different disks and different vdevs, it's just not guaranteed (it depends on available space).
Re: Data integrity by Anonymous Coward · 2013-09-18 11:58 · Score: 0

Err, what? there's $60 Pentium Gs and even Atoms with ECC support...
Re: Data integrity by TheRaven64 · 2013-09-18 20:51 · Score: 1

My memory is slightly fuzzy, since it's over ten years since I studied this, but here goes:
Single-bit errors in DRAM are caused by the capacitor that stores the data being discharged. This means that the transitions happen in one direction: from charged to discharged. With parity RAM, you can tell that an error has occurred, but you can't tell what the error is. The parity and ECC checks happen in the digital circuitry and so have no knowledge of the analogue state. Since ECC uses Hamming codes, it can detect more than single-bit errors, but it can only fix one bit flip (the bias isn't actually required, but it does make the code shorter).
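
To make the "can only fix one bit flip" part concrete, here is a minimal sketch of a textbook Hamming(7,4) code. It is an illustration only: real ECC memory uses a much wider code with an extra overall parity bit (SECDED) so that double flips are flagged instead of miscorrected, and the function names here are made up.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits into a 7-bit codeword (positions 1..7, parity at 1, 2, 4)."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(codeword):
    """Return (data_bits, syndrome); syndrome is the 1-based position it flipped back, 0 if none."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4
    if syndrome:
        c[syndrome - 1] ^= 1  # assume a single flip and undo it
    return [c[2], c[4], c[5], c[6]], syndrome


data = [1, 0, 1, 1]
word = hamming74_encode(data)

# One flipped bit: located by the syndrome and repaired.
one_flip = list(word)
one_flip[4] ^= 1
print(hamming74_decode(one_flip))   # ([1, 0, 1, 1], 5)

# Two flipped bits: the syndrome points at the wrong position, so the "fix" is wrong.
two_flips = list(word)
two_flips[0] ^= 1
two_flips[4] ^= 1
print(hamming74_decode(two_flips)[0] == data)   # False
```

Note that the decoder works purely on the digital bits; it neither knows nor needs the direction in which the cell failed.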

--
I am TheRaven on Soylent News
Re: Data integrity by Anonymous Coward · 2013-09-18 23:19 · Score: 0

Copies=X does offer protection against some kinds of failures, particularly bad blocks. The other copies ARE guaranteed to be on different blocks. And the algorithm WILL put copies on different drives if it finds drives with free blocks.
In a raidz setup or a stripe where all the disks were added at the same time (eg at pool creation) and are of the same size, then typically the drives will have the same amount of free blocks.
Re: Data integrity by bingoUV · 2013-09-18 23:28 · Score: 1

I would be able to get more usable space from the drives by running raidz instead of mirrored pairs - but only if you upgrade drives all at once
I thought "zpool replace" was for that purpose? I am planning a ZFS setup, but all I can afford is a motley collection of dissimilar drives.
Can't I replace one disk at a time when upgrading?

--
Bingo Dictionary - Pragmatist, n. A myopic idealist.
Re: Data integrity by MightyYar · 2013-09-19 01:08 · Score: 1

You can replace a disk any time, but the pool won't use the entire capacity of the disk if it is larger than the others.
I chose a mirror because it wasted the least amount of space:
mirror1 - 500GB drive and 750GB drive
mirror2 - 2x 2TB drives
So my mirror gives me 2TB + 500GB = 2.5TB with 2-drive redundancy. 250GB of my 750GB drive was not used. Later I swapped both smaller drives for 3TB drives when they went on sale on Black Friday for $89. When zfs saw the new space, it increased the pool size to 5TB. So now my array doesn't waste any space with 2-drive redundancy.
Had I chosen to do a raidz originally, I would have wasted a lot of space because each drive would be limited to the smallest individual drive's space. So with 1-drive redundancy I would have had 3x500GB = 1.5TB or with 2-drive redundancy I would have had 2x500GB = 1TB. That obviously wasn't going to be my strategy :) If I switched to raidz with 1-drive redundancy today, I would get 3x2TB = 6TB. 2-drive redundancy would get me 4TB. My mirrors give me sort-of 2-drive redundancy. Obviously, it depends which drives :) Since this is mostly a backup server with no unique data on it except for TV media center storage, I've judged this an acceptable risk :)
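
For anyone who wants to run the same comparison on their own pile of drives, the arithmetic above can be sketched roughly like this. The helper names are made up, sizes are in decimal GB, and metadata and padding overhead are ignored:

```python
def mirror_pool_capacity(mirrors):
    """Pool of mirror vdevs: each mirror contributes the size of its smallest member."""
    return sum(min(members) for members in mirrors)


def raidz_capacity(drives, parity):
    """Single raidz vdev: every member counts as the smallest drive, minus the parity drives."""
    if parity >= len(drives):
        raise ValueError("parity must be smaller than the number of drives")
    return (len(drives) - parity) * min(drives)


original_drives = [500, 750, 2000, 2000]   # GB
current_drives = [2000, 2000, 3000, 3000]  # GB

print(mirror_pool_capacity([(500, 750), (2000, 2000)]))    # 2500
print(mirror_pool_capacity([(3000, 3000), (2000, 2000)]))  # 5000
print(raidz_capacity(original_drives, parity=1))           # 1500
print(raidz_capacity(original_drives, parity=2))           # 1000
print(raidz_capacity(current_drives, parity=1))            # 6000
print(raidz_capacity(current_drives, parity=2))            # 4000
```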

--
W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
Re: Data integrity by Anonymous Coward · 2013-09-19 05:59 · Score: 0

ZFS is a filesystem, so it cannot have ECC. RAM sticks have ECC.
ZFS has checksums, which is better than ECC, because most ECC RAM can only detect and correct a single bit flip, and detect double bit flips. But ZFS uses SHA256 or fletcher checksums, which can detect and correct much more than single or double bit flips.
Re: Data integrity by saleenS281 · 2013-09-19 10:29 · Score: 1

In a raid-z setup you can rebuild the block from parity, and if you've got raid the point is moot. I would not consider a bad block a hardware failure, that's a hardware error - bad blocks are expected behavior out of spinning rust. The allocator makes no guarantees about copies being on different spindles - it simply prefers to split blocks across vdevs if possible (but there are no guarantees it will do so, and there are many instances where it won't). If you've got multiple partitions on a single disk, it will not delineate those vdevs from vdevs on another physical drive. Furthermore if you have an entire drive die and don't have raid, your pool is going to be toast anyways, regardless of your copies setting. What matters is: COPIES IS NOT A REPLACEMENT FOR RAID. That's the gist of the matter and the point I was making (and thought I was rather clear about).
Re: Data integrity by Anonymous Coward · 2013-09-19 12:18 · Score: 0

That's an utterly idiotic means of getting error correction, plus it's still worthless against 2 sequential blocks being damaged. Raid5 isn't at all suited for data recovery on a single drive, and the read/write performance would be beyond shit (a RAID4 without striping would be a slightly less bad idea).
Writing a small script that periodically runs a parchive derivative on recently modified files/directories is a far better (in space required and robustness) "hack", but a file system implementing it would still be far superior.
Re: Data integrity by MightyYar · 2013-09-19 13:54 · Score: 1

That's an utterly idiotic means of getting error correction

Agreed.

plus it's still worthless against 2 sequential blocks being damaged
Why? You could wipe out an entire partition and still have data integrity.

Raid5 isn't at all suited for data recovery on a single drive

Agreed. Though to be pedantic I was suggesting raidz, not RAID5.

a RAID4 without striping would be a slightly less bad idea
I don't think zfs offers anything like RAID4.

Writing a small script that periodically runs a parchive derivative on recently modified files/directories is a far better

I was thinking something like taking snapshots and then running parchive against the snapshot? I haven't put much thought into this - drives are cheap.

but a file system implementing it would still be far superior.
Yes, but it is kind of an edge case... yeah, you sometimes get some SMART warning about a failing drive, but who wants to image a failing drive and then apply parity tools? At that point you might as well just put in a new drive and restore from backup.
I was proposing this "solution" facetiously - no sane person would live with such a setup.

--
W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
Re: Data integrity by bingoUV · 2013-09-19 18:03 · Score: 1

I was thinking of "upgrading" btrfs to zfs, but it looks like it will be a downgrade given my variety of disks. Btrfs, while not supporting RAID 5, handles a variety of disks very efficiently. IIRC there is already the problem with zfs that you can't shrink a dataset.
Thanks, I need to do more research.

--
Bingo Dictionary - Pragmatist, n. A myopic idealist.
Re: Data integrity by TheRaven64 · 2013-09-19 20:31 · Score: 1

SHA256 is not an error correcting code. It can not correct even single-bit flips. If it could, it would be useless as a cryptographic hash. If you could take a hash and some data that was close to the data for which the hash was computed, and find the single-bit flip that would allow the data to match the hash, then you'd have a very easy way of creating SHA256 collisions. And if you had such an algorithm, you wouldn't use it in a filesystem, you'd use it to break all of the systems that rely on SHA256 collisions being difficult to create. If you want error correcting codes in a filesystem, then you'd use an error correcting code, not a cryptographic hash.
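
A toy sketch of the distinction, with made-up helper names and no claim to mirror the actual ZFS code path: the hash only tells you that a block is bad, and the repair has to come from a redundant copy (a mirror, parity, or copies=2).

```python
import hashlib


def checksum(block: bytes) -> bytes:
    """SHA-256 digest stored alongside the block pointer in this toy model."""
    return hashlib.sha256(block).digest()


def read_with_repair(copies, expected):
    """Return the first copy whose checksum matches, or None if every copy is bad."""
    for block in copies:
        if checksum(block) == expected:
            return block
    return None


original = b"important data" * 8
expected = checksum(original)

# Simulate a single-bit flip on disk.
damaged = bytearray(original)
damaged[3] ^= 0x01
damaged = bytes(damaged)

print(checksum(damaged) == expected)                             # False: the flip is detected
print(read_with_repair([damaged, original], expected) == original)  # True: the good copy is used
print(read_with_repair([damaged], expected))                     # None: nothing to repair from
```

With only one copy, detection is all you get, which is exactly what you would expect from a hash rather than an error correcting code.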

--
I am TheRaven on Soylent News
Re: Data integrity by Anonymous Coward · 2013-09-19 20:39 · Score: 0

Please read a book on coding theory before saying something highly retarded. There are many Error Detecting and Correcting codes other than the distance-4 Hamming codes hardwired into RAM chips. Also, HASHING is NOT a DATA RECOVERY nor a DATA COMPRESSION algorithm. Even if you hash just a single data sector, back when they were 512 bytes, it will still take 256^512 different combinations before finding the correct sector contents in the worst case. 256^(512) is more time than is needed to brute force EVERY ENCRYPTED MESSAGE THAT HAS EVER BEEN SENT OR WILL BE SENT WITH AES.
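
The sizes involved are easy to check with Python's arbitrary-precision integers. This is just a counting sketch; the 256-bit figures assume SHA-256 digests and AES-256 keys for comparison:

```python
SECTOR_BYTES = 512

sector_states = 256 ** SECTOR_BYTES   # every possible content of a 512-byte sector
digest_states = 2 ** 256              # every possible SHA-256 digest
aes256_keys = 2 ** 256                # every possible AES-256 key

print(sector_states == 2 ** 4096)                  # True
print(sector_states.bit_length() - 1)              # 4096 bits of state in one sector
print(sector_states // digest_states == 2 ** 3840) # True: about 2^3840 sector contents per digest
print(sector_states > aes256_keys ** 15)           # True: far beyond one AES-256 key search
```

A 256-bit digest simply does not carry enough information to rebuild 4096 bits of sector, which is the pigeonhole version of "hashing is not compression".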
Re: Data integrity by MightyYar · 2013-09-20 00:26 · Score: 1

btrfs looks very promising. The fact that SUSE is considering making it the default is heartening. When I was setting up my server, it was not a real option. Even now, I might be a little uneasy until someone is using the multi-disk stuff in production. I'll keep playing with it :)

--
W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
ZFS On-Disk Format, RAID-Z on-disk format by grahamperrin · 2013-09-20 05:22 · Score: 1

http://open-zfs.org/wiki/Developer_resources#Implementation_documentation notes that the ZFS On-Disk Format document is "a good overview, but sorely outdated". Of possible interest: Max Bruning's weblog: ZFS Raidz Data Walk (2009)
alternatives to the zfsrogue code by grahamperrin · 2013-09-20 05:51 · Score: 1

https://github.com/zfsrogue/zfs-crypto

In the ZFSonLinux area at https://github.com/zfsonlinux/zfs/issues/494#issuecomment-23652335 it's noted that the zfsrogue code is encumbered and so will not be used.

There's an earlier comment https://github.com/zfsonlinux/zfs/issues/494#issuecomment-7158618 and a corresponding note in the OpenZFS wiki: The early ZFS encryption code published in the zfs-crypto repository of OpenSolaris.org could be a starting point
OpenZFS umbrella, a single point of information by grahamperrin · 2013-09-20 14:56 · Score: 1

hope that this nice umbrella will evolve into a single point of access
If you find time, I recommend listening to bsdtalk227 (listed under Publications).
https://twitter.com/grahamperrin/status/380395699734466560 quoting plus hashtag from a Delphix blog: "To some degree, #OpenZFS is just putting a name to what we have already been doing as a community".

Sometimes these things turn into just another layer of non-information.
I understand your concern.
The first of the three goals of OpenZFS is to raise awareness of the quality, utility, and availability of open source implementations of ZFS. As an end user, very much into awareness-raising of ZFS, I'll occasionally edit (and/or discuss in IRC) wherever I feel that the value of something that's in the wiki is not immediately clear. But I'm neither a developer nor a typical end user, so there'll be large areas that are beyond me. Maximising the value of contributions to the wiki should be very much a collaborative effort...