OpenZFS Project Launches, Uniting ZFS Developers

I'm addicted by MightyYar · 2013-09-17 12:09 · Score: 4, Interesting

I love ZFS, if one can love a file system. Even for home use. It requires a little bit nicer hardware than a typical NAS, but the data integrity is worth it. I'm old enough to have been burned by random disk corruption, flaky disk controllers, and bad cables.

--
W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.

Re:I'm addicted by Anonymous Coward · 2013-09-17 12:18 · Score: 5, Funny

I love ZFS too, but I'd fucking kill for an open ReiserFS...
Re:I'm addicted by Virtucon · 2013-09-17 12:42 · Score: 4, Funny

I think that anything having to do with ReiserFS is a dead end.

--
Harrison's Postulate - "For every action there is an equal and opposite criticism"
Re:I'm addicted by TheGoodNamesWereGone · 2013-09-17 12:44 · Score: 2

Well, this *is* SLASHdot (rimshot)
Re:I'm addicted by philip.paradis · 2013-09-17 19:19 · Score: 2

I guess nobody got the joke.

--
Write failed: Broken pipe
Re:I'm addicted by The+Last+Gunslinger · 2013-09-17 21:19 · Score: 4, Insightful

I'm sure most readers here "got" it. It just wasn't funny.
Re:I'm addicted by drinkypoo · 2013-09-17 23:31 · Score: 3, Funny

OK stop already, you guys are driving this joke into the woods.

--
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
Re:I'm addicted by TheLink · 2013-09-18 03:29 · Score: 2

Nah the real problem is vendor lock-in...
--
- Too many replies beneath your current threshold
Re:I'm addicted by Rato+Ruter · 2013-09-19 06:06 · Score: 2

I spent some time testing various workloads on ScheisseFS, but in the end it was just a shitty solution.
What a crappy wordplay!

all i want is BP-rewrite by Anonymous Coward · 2013-09-17 12:13 · Score: 5, Informative

If this gets us BP-rewrite, the holy grail of ZFS, I'll be a happy man.

For those who don't know what it is: BP-rewrite is block pointer rewrite, a feature that has been promised for many years but has never arrived. It's a lot like cold fusion in that it's always X years away.

BP-rewrite would allow implementation of the following features:
- Defrag
- Shrinking vdevs
- Removing vdevs from pools
- Evacuating data from a vdev (say you wanted to destroy your old 10-disk vdev and add it back to the pool as a vdev with a different number of disks)

Re:all i want is BP-rewrite by saleenS281 · 2013-09-17 15:36 · Score: 5, Informative

Because a COW filesystem will become fragmented over time simply by the way it works. As you delete files, you're only freeing up small segments of contiguous blocks. Over time, this leads to fragmentation because writes are sometimes forced into non-optimal disk placement due to lack of contiguous free space. Granted, if you never fill the pool beyond 50%, it won't be a problem. For everyone else, it's a matter of when, not if, it will become fragmented.
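To make the "small segments of contiguous blocks" point concrete, here is a toy sketch in Python. It is not how ZFS allocates space; the 100-block disk, the ten equal-sized files, and the greedy largest-extent-first policy are all assumptions made up for illustration.

```python
from itertools import groupby

def free_extents(disk):
    """Return (start, length) for each run of free (None) blocks."""
    extents, pos = [], 0
    for is_free, run in groupby(disk, key=lambda owner: owner is None):
        length = len(list(run))
        if is_free:
            extents.append((pos, length))
        pos += length
    return extents

def fragments_needed(disk, size):
    """Number of separate extents a new file of `size` blocks would occupy,
    using a toy policy that fills the largest free extents first."""
    remaining, count = size, 0
    for _, length in sorted(free_extents(disk), key=lambda e: -e[1]):
        if remaining <= 0:
            break
        remaining -= length
        count += 1
    if remaining > 0:
        raise ValueError("not enough free space")
    return count

# Toy disk: ten 10-block files written back to back (block value = file id).
disk = [i // 10 for i in range(100)]
# Delete every other file; half the disk is now free, but only in 10-block holes.
disk = [None if owner % 2 else owner for owner in disk]

print(free_extents(disk))
print(fragments_needed(disk, 30))
```

Half the toy disk is free, yet a 30-block file cannot be written contiguously because no hole is larger than 10 blocks. Keeping more free space around makes large holes more likely, which is the 50% argument above.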
Re:all i want is BP-rewrite by saleenS281 · 2013-09-17 15:38 · Score: 2

This will have little to no effect on the bp-rewrite situation. The only people with the skill and intimate knowledge of ZFS to do the bp-rewrite coding have stated both that it's extremely difficult, and that the companies they work for/with have no interest in implementing the feature/paying them to work on the problem. I haven't heard any of them volunteering their free time to focus on it either. This is more or less a marketing campaign IMO.
Re:all i want is BP-rewrite by smash · 2013-09-17 20:36 · Score: 2

So you propose that we kill array performance for a bit to defragment? Do you have any idea how long it takes to defragment multiple terabytes of data? On a multi-user multitasking OS, access is more random anyhow, so it's not like your contiguous files are likely to be read sequentially.
No, for a mission-critical system that actually has a workload, it's probably much easier/better to just maintain free space.

--
I run: Windows, OS X, Linux, FreeBSD. Just because you have a hammer, doesn't mean everything is a nail.
Re:all i want is BP-rewrite by TheRaven64 · 2013-09-17 21:13 · Score: 2

Ideally, in something like ZFS you'd want background defragmentation. When you read a file that hadn't been modified for a while into ARC, you'd make a note. When it's about to be flushed unmodified, if there is some spare write capacity you'd write the entire file out contiguously and then update the block pointers to use the new version.
That said, defragmentation is intrinsically incompatible with deduplication, as it is not possible to have multiple files that all refer to some of the same blocks all being contiguous on disk. It's also not a problem if you've got a decent-sized L2ARC, as the random reads on the disk are fairly rare.

--
I am TheRaven on Soylent News
Re:all i want is BP-rewrite by Above · 2013-09-18 01:02 · Score: 3, Insightful

You are correct that the disk will become fragmented, but the implication is that fragmentation is a problem, and that's simply not true. One of the prime causes of the misunderstanding is that fragmentation in Unix file systems is night-and-day different from fragmentation in a FAT file system, where most people are used to defragging Windows drives. Unix file systems use much better algorithms to control fragmentation, so there is (generally) a lot less on a per-file basis. They also defragment automatically: in some cases, when a fragmented file is written to, the file system will defragment part of that file and rewrite it.
The Berkeley FFS was the first to "solve" this problem, reserving 10% of the disk space primarily to avoid fragmentation. Decades of experience show that for all but the most corner of corner cases, that is enough, causing no significant amount of fragmentation, or performance degradation.
* http://www.eecs.harvard.edu/~keith/research/tr94.html
* http://www.cs.berkeley.edu/~brewer/cs262/FFS.pdf
* http://www.cs.rutgers.edu/~pxk/416/notes/12-fs-studies.html
* http://pages.cs.wisc.edu/~remzi/OSTEP/file-ffs.pdf
The result is that for most applications fragmentation is a complete non-issue. After 25 years of playing with various file systems I've only seen it be an issue once, on an NNTP server that reached 20% fragmentation. Most user desktops and general purpose servers have under 1% fragmentation at all times. Generally, if you have a fragmentation problem it's because the storage is too full, and you need to add storage anyway (the aforementioned NNTP server was a good example). Adding the storage makes the problem go away.
Most users of Unix file systems will never need to give fragmentation a second thought.

Still CDDL... by volkerdi · 2013-09-17 12:15 · Score: 4, Informative

Oh well. I'd somehow hoped "truly open source" meant BSD license, or LGPL.

Re:Still CDDL... by larry+bagina · 2013-09-17 13:26 · Score: 3, Informative

CDDL is basically LGPL on a per-file basis.

--
Do you even lift?
These aren't the 'roids you're looking for.
Re:Still CDDL... by volkerdi · 2013-09-17 15:25 · Score: 2

CDDL is basically LGPL on a per-file basis.
Perhaps the intent of the licenses is similar, but there's more to a license than that. Unfortunately, being licensed under the CDDL causes a lot more license incompatibility restrictions than either the LGPL or BSD license do. If it were under one of those, there'd be hope for seeing it as an included filesystem in the Linux kernel. But since it's under the CDDL, that can't happen.
The developers are, of course, welcome to use whatever license they like. Just pointing out that the CDDL is *not* basically the LGPL under "per-file" or any other basis.
Re:Still CDDL... by Anonymous Coward · 2013-09-17 17:29 · Score: 2, Insightful

The GPL is the problem here, not the CDDL.
It's funny how you cite license incompatibility restrictions, but Linux is the only one having those problems.
OS X, FreeBSD and others don't seem to be having any problems with the CDDL.
Gee, I wonder why.
Re:Still CDDL... by devman · 2013-09-18 02:32 · Score: 2

In fairness, it's the GPL that has the incompatibility problem, not the CDDL.
CDDL is compatible with BSD, Apache2, LGPL, etc.
GPLv2 is incompatible with CDDL, Apache2, GPLv3, LGPLv3, etc.
Even if the license were not CDDL, it would have to be released under a license that came with a patent clause, which means GPLv3, LGPLv3, Apache2 or similar, all of which are incompatible with GPLv2, which Linux is licensed under.
CDDL isn't the problem.

Patents? by Danathar · 2013-09-17 12:19 · Score: 3, Insightful

Not to rain on anybody's parade, but will the commercial holders of ZFS allow this? Or will they unleash some unholy patent suit to keep it from happening?

Re:Patents? by gagol · 2013-09-17 12:26 · Score: 2

Same licence, new name. It's more about uniting dev efforts under one roof.

--
Tomorrow is another day...
Re:Patents? by utkonos · 2013-09-17 12:49 · Score: 4, Informative

FAQ much? There is no central source repository for OpenZFS. Each supported operating system has its own repository. The previous also has a link to the source tree for each of the supported projects under the umbrella.

If you're successful, Larry will come a callin' by YesIAmAScript · 2013-09-17 12:26 · Score: 2, Funny

As long as Oracle's patents are valid, can anyone seriously believe this will go anywhere?

His fleet of boats isn't going to pay for itself.

--
http://lkml.org/lkml/2005/8/20/95

Re:If you're successful, Larry will come a callin' by Virtucon · 2013-09-17 12:46 · Score: 2

You mean that fleet of losing boats? Last time I checked it was 7-1 NZ with first to 9 winning.

--
Harrison's Postulate - "For every action there is an equal and opposite criticism"
Re:If you're successful, Larry will come a callin' by stoploss · 2013-09-17 13:08 · Score: 4, Funny

Collecting money from open source companies? Darl McBride will turn in his grave if Larry is even stupid enough to try it...
Eh? I don't think that the Mormons bury their living, no matter how ghoulish the corporations that they helm.
I'm afraid Darl McBride will be quite operational when your friends' commits arrive...
Re:If you're successful, Larry will come a callin' by Bengie · 2013-09-17 13:15 · Score: 5, Informative

Oracle released ZFS under a BSD-compatible license. Anyone is allowed to do whatever they like with the open source code. Going forward, Oracle has not opened any code after v28, which is the last open source version to be compatible with Oracle ZFS.
Re:If you're successful, Larry will come a callin' by TheRaven64 · 2013-09-17 21:16 · Score: 2

It's released under the CDDL, which explicitly grants patent rights. If they had licensed it under GPLv2, then they would have been able to sue people (clause 7 allows them to say 'oh, we've just noticed that we have patents on this. Everyone stop distributing it!') and if they'd released it under Apache2 or GPLv3 then it would still be GPLv2-incompatible, so still wouldn't have been useable in Linux.

--
I am TheRaven on Soylent News

Advantages of ZFS over BTRFS? by TheGoodNamesWereGone · 2013-09-17 12:43 · Score: 2, Insightful

I'm sure I'll be corrected if I'm wrong, but does it offer any advantage over BTRFS? I'm not trying to start a flame war; I'm honestly asking.

Re:Advantages of ZFS over BTRFS? by Vesvvi · 2013-09-17 12:59 · Score: 5, Informative

I don't have any practical experience with BTRFS, but I use ZFS heavily at work.
The advantage of ZFS is that it's tested, and it just works. When I started with our first ZFS testbed, I abused that thing in scary ways trying to get it to fail: hotplugging RAID controller cards, etc. Nothing really scratched it. Over the years I've made additional bad decisions such as upgrading filesystem versions while in a degraded state, missing logs, etc, but nothing has ever caused me to lose data, ever.
The one negative to ZFS (if you can call it that) is that it makes you aware of inevitable failures (scrubs catch them). I'll lose about 1 or 2 files per year (out of many, many terabytes) just due to lousy luck, unless I store redundant high-level copies of data and/or metadata. Right now I use stripes over many sets of mirrored drives, but it's not enough when you read or write huge quantities of data. I've run the numbers and our losses are reasonable, but it's sobering to see the harsh reality that "good enough" efforts just aren't good enough for 100% at scale.
Re:Advantages of ZFS over BTRFS? by mysidia · 2013-09-17 13:20 · Score: 2

I'm sure I'll be corrected if I'm wrong, but does it offer any advantage over BTRFS? I'm not trying to start a flame war; I'm honestly asking.
BTRFS is still highly experimental. I had production ZFS systems back in 2008. A mature ZFS implementation is a lot less likely to lose your data with filesystem code at fault (assuming you choose appropriate hardware and appropriate RAIDZ levels with redundancy).
Re:Advantages of ZFS over BTRFS? by Anonymous Coward · 2013-09-17 13:45 · Score: 3, Informative

You don't understand. ZFS didn't lose that data -- ZFS detected that the underlying disk drives lost that data. You can run ZFS in highly redundant modes that allow it to reconstruct lost data, but it sounds like OP's redundancy is such that enough drives may lose bytes to cause lost files.
Re:Advantages of ZFS over BTRFS? by Bengie · 2013-09-17 13:45 · Score: 2

"Unexpectedly" lost data. The things he's mentioned would have hosed other FSes completely, but losing some data because of his lack of redundancy is fine.
Re:Advantages of ZFS over BTRFS? by sl3xd · 2013-09-17 14:10 · Score: 2

BTRFS has a large number of features that are still in the "being implemented", or "planning" stages. In contrast, those features are already present, well tested, and in production for half a decade on ZFS. Many touted "future" features (such as encryption) of BTRFS are documented as "maybe in the future, if the planets are right, we'll implement this. But not anytime soon"
Comparing the two is like making up an imaginary timeline where ReiserFS 3 was 4-5 years old and in wide deployment while ext2 was being developed, with plans to implement journaling (i.e. ext3) and extents (i.e. ext4) still in the "TODO" stage.
My own BTRFS system is appallingly slow compared to running ext4 on the same hardware; in contrast zfsonlinux is amazing.

--
-- Sometimes you have to turn the lights off in order to see.
Re:Advantages of ZFS over BTRFS? by BitZtream · 2013-09-17 14:53 · Score: 2

I corrupted some files by the following:
This is a home setup, all parts are generic cheapo desktop-grade components, except slightly upgraded RocketRAID cards in dumb mode for additional SATA ports:
4 HDDs, 2 vdevs that are 2-drive mirrors (RAID 1+0 with 4 drives, essentially)
1 drive in a 2 drive mirror fails, no hot spare.
When inserting a replacement drive for the failed drive, the SATA cable to the remaining drive in the mirror was jiggled and the controller considered it disconnected.
The pool instantly went offline. When the drive reconnected and the new drive was added to the mirror, 2 files were detected with invalid checksums during the resilvering process. These were files that were being written at the moment the vdev was yanked out from under ZFS.
Scrub found additional correctable errors and repaired them, but the files it marked as irreparable were clearly irreparable.
Simply deleting the corrupted files cleared the pool errors after the next scrub. Since I was copying those files anyway when the failure occurred, I just recopied them and nothing was actually lost.
Of course, I really can't expect anything else to have happened. I'm EXTREMELY grateful that it didn't take the entire pool down, so while there was 'data loss' it performed exactly as I would have hoped it to.
You can't expect much better than what it did considering an entire vdev (both drives in the mirror) went offline as data was being written to them.
Redundant metadata can't solve the problem of large amounts of the HDD becoming unreadable, which given enough terabytes is going to happen, and possibly often when you get into big data sets (think LHC-size data sets). You can, of course, zfs set copies=2 or copies=3 (the maximum) on the pool to get additional protection, but then you might as well just put more drives in the same vdev and benefit from increased read speeds. The default is copies=1, making it entirely possible to lose data.

--
Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
Re:Advantages of ZFS over BTRFS? by deek · 2013-09-17 14:55 · Score: 2

I'm playing around with btrfs at the moment, and I've spotted some inconsistencies in the document you mentioned.
* Subvolumes can be moved and renamed under btrfs. I do this on a daily basis.
* btrfs can do read-only snapshots. Mind you, it does have to be specified.
* As far as I can tell, "df" does work fine with btrfs. The document implies it does not.
I am still quite new to btrfs, so I'm learning much at the moment. There may be more points that I've missed.
It seems, though, your document is a bit out of date, and btrfs has improved since then.
Re:Advantages of ZFS over BTRFS? by hedwards · 2013-09-17 15:30 · Score: 2

That's never been true, you always had the option of detaching it or outright deleting just one disk, you just had to make sure you did it in a careful manner so as not to delete things you didn't want to delete.
Also resizing a volume on a disk is a risky operation to engage in. If it's something that you really need to do, the correct way is to back up the data to a separate disk and restore it to a new volume. Resizing volumes is not exactly in keeping with the philosophy that led to ZFS being created.
Re:Advantages of ZFS over BTRFS? by mysidia · 2013-09-17 16:00 · Score: 2

You can't expect much better than what it did considering an entire vdev (both drives in the mirror) went offline as data was being written to them.
I do expect better, because ZFS is supposed to handle this situation, where a volume goes down with in-flight operations; the filesystem by design is supposed to be able to re-Import the pool after system restart and recover cleanly....
That shouldn't have happened; it sounds like either the hard drive acknowledged a cache flush before the data had been written to disk, the ZIL was broken (or disabled), or indeed a ZFS bug was found.
But in the absence of evidence that the disk hardware properly obeys the cache flush command semantics, great suspicion should be pointed at it.
The whole point of the ZFS ZIL is to log in-flight writes before the writes get added to the pool data, so that if there is a halt, the in-flight writes are either completed or aborted in a crash-consistent way, à la filesystem journaling.
The pool showing 'corrupt' data indicates ZPOOL was remounted but the state wasn't crash consistent....
[Or perhaps the hard drive data signal line did not have a clean break, and part of a write command's content was damaged in flight]
Re:Advantages of ZFS over BTRFS? by batkiwi · 2013-09-17 16:36 · Score: 5, Insightful

Nice FUD there. You picked the btrfs-progs, which are the userspace tools, not the actual btrfs filesystem driver.
http://git.kernel.org/cgit/linux/kernel/git/josef/btrfs-next.git/log/
Re:Advantages of ZFS over BTRFS? by Vesvvi · 2013-09-17 17:30 · Score: 4, Interesting

This is correct.
It is statistically assured that you will lose some data with anything less than obscene redundancy. I've run the numbers and we've settled on what's acceptable to us: we have offline backups far more frequently than 2 times/year for everything, so dropping about 2 files/year that are completely unrecoverable without backups isn't a big deal.
These systems are running a moderate number of very large static files, mixed with a very large number of very small files. The small files are SQLite-style records, and we churn through them very rapidly. I don't know exactly why, but it is always these small files that we lose: there is clearly a bias towards things that are written frequently. The analyst in me is quick to point out that implies failures in ZFS itself, beyond just the disks and "bit rot", but the accelerated failure isn't enough to worry about. So our non-failure rate is easily 6-nines or better per year on the live storage system, but it's still a bit uncomfortable to know that some data is going to be gone, despite that.
With a minimal amount of effort you can get hardware and software which are no longer the biggest threat to your data. I am personally the most likely source of a catastrophic failure: operator error is more likely than an obscure hardware failure. ZFS has allowed me to reduce that operator error (snapshots, piping filesystems, nested datasets with inheritance), and simultaneously it's outperforming other options on both speed and security. Overall, I'm extremely pleased.
Re:Advantages of ZFS over BTRFS? by Vesvvi · 2013-09-17 17:55 · Score: 2

I had an upgrade path similar to yours, starting with RAIDZ and moving to a group of mirrors. I try not to let any pool get too big, so there are maybe 20 drives/pool. It's always the small files that are lost (see post above). I think each server does about 6 PB/year in each direction on these highly-accessed files, so I think it's reasonable to drop ~1MB of non-critical files (they basically store notes of data analysis).
So far I've never had a problem with VM images, but now we're mitigating that by adding redundant but isolated storage servers. I'm sure you could manage this without ZFS snapshots and send/recv, but I wouldn't want to try.
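For scale, a back-of-the-envelope version of those figures, assuming decimal units (1 PB = 10**15 bytes, 1 MB = 10**6 bytes) and counting only one direction of traffic. This compares bytes lost to bytes moved, which is not necessarily the metric behind the "6-nines" figure in the earlier post.

```python
import math

traffic_bytes = 6e15   # "about 6 PB/year" in one direction (decimal PB assumed)
lost_bytes = 1e6       # "~1MB" of dropped files per year (decimal MB assumed)

loss_fraction = lost_bytes / traffic_bytes
nines = -math.log10(loss_fraction)   # how many "nines" of bytes got through

print(f"fraction of bytes lost: {loss_fraction:.2e}")
print(f"equivalent nines: {nines:.1f}")
```

That works out to roughly 1.7e-10 of the bytes moved, i.e. somewhere between nine and ten nines by this particular measure.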
Re:Advantages of ZFS over BTRFS? by deek · 2013-09-17 18:12 · Score: 2

Gotcha. So btrfs and df play up only under a raid1 situation. That explains why I didn't notice any problem.
As for snapshots, I've set up an automated snapshot system using btrfs. The main volume is mounted at /snapshots. One subvolume is created in there, and is then separately mounted at /data. Snapshots are created under the /snapshots directory, while /data is the path used by applications. I've created a nightly script which renames all previous snapshots, and then creates a new snapshot. It all works seamlessly, and it seems pretty easy to understand. I'm unsure what the fuss is, really.
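A rough sketch of what such a nightly rotation could look like, in Python. This is not the poster's script: the nightly.N naming scheme is hypothetical, and the function only builds the list of commands rather than running them. The /snapshots and /data paths are taken from the description above, and `btrfs subvolume snapshot -r` creates a read-only snapshot.

```python
import re

SNAP_DIR = "/snapshots"   # where the top-level volume is mounted (from the post)
SOURCE = "/data"          # mount point of the live subvolume (from the post)
PREFIX = "nightly"        # hypothetical naming scheme: nightly.0 is the newest

def rotation_plan(existing):
    """Return the commands (as argv lists) for one nightly rotation."""
    pattern = re.compile(rf"^{re.escape(PREFIX)}\.(\d+)$")
    # Rename the oldest snapshot first so no rename clobbers an existing name.
    ages = sorted(
        (int(m.group(1)) for m in map(pattern.match, existing) if m),
        reverse=True,
    )
    plan = [
        ["mv", f"{SNAP_DIR}/{PREFIX}.{n}", f"{SNAP_DIR}/{PREFIX}.{n + 1}"]
        for n in ages
    ]
    # Then take a fresh read-only snapshot of the live subvolume.
    plan.append(
        ["btrfs", "subvolume", "snapshot", "-r", SOURCE, f"{SNAP_DIR}/{PREFIX}.0"]
    )
    return plan

for argv in rotation_plan(["nightly.0", "nightly.1", "lost+found"]):
    print(" ".join(argv))
```

Each argv list could be handed to subprocess.run from a cron job; pruning old snapshots is left out of the sketch.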
Re: Advantages of ZFS over BTRFS? by WuphonsReach · 2013-09-17 23:30 · Score: 2

If you are talking about zpools, there are commands to add or remove devices as needed, and the pool can even use a bigger device (why would you put in a smaller one?) as soon as it is detected, starting the resync automatically.

Limited number of drive slots + moving to a smaller, but faster platter in one or more of those slots.

--
Wolde you bothe eate your cake, and have your cake?

Re:FINALLY. by Anonymous Coward · 2013-09-17 12:53 · Score: 3, Informative

Been using btrfs for several non-essential file systems. Working great so far, and have even done several successful bedup runs. Has worked great for minimizing disk usage on some Maven repositories with lots of duplicate files between Jenkins and Nexus. Maybe not tested enough for your server that you need to stay up all the time, but great for the home desktop (provided you're sane and are keeping backups, which you should be doing already anyway). The more testing it gets, the sooner it becomes "tested enough" for the needs-to-always-be-available server.

Re:ZFS for Windows? by Virtucon · 2013-09-17 13:05 · Score: 2

It doesn't have to be POSIX compliant to have it ported to it and it doesn't require somebody to pay for licensing. With the features of ZFS one could argue that a port to at least Windows Server would be great, and it would garner quite a following from those who've had to put up with the way NTFS views disk volumes and storage. There are applications that run well on Windows, especially on the Server side of things, so I wouldn't call it dead quite yet. Besides, with Server 2012 we now have Storage Spaces and ReFS, which bring some ZFS features to the table, but they're nowhere near as sophisticated as ZFS. There's already been one attempt, but it doesn't appear to be actively maintained and it's read-only. Oracle has software for Windows Server that interfaces to the Sun ZFS Storage Server (SAN) and works at the VSS level. It's not exposing a ZFS filesystem to Windows either, but ZFS is configurable in the SAN. That's a hefty uplift if you're already in deep with EMC or NetApp.

--
Harrison's Postulate - "For every action there is an equal and opposite criticism"

Re:FINALLY. by Virtucon · 2013-09-17 13:13 · Score: 3, Informative

licensing or patent issues?
What you also forget is that Oracle was the leading proponent of BTRFS and yes it had to do with licensing and patents from Sun. Once they acquired Sun that all went out the window. If I were the CEO at Oracle I'd ask "Why two file systems that essentially do the same thing? One's mature and the other, not so much" That's why BTRFS still survives but now with less Oracle support. Wait, is that a bad thing?

--
Harrison's Postulate - "For every action there is an equal and opposite criticism"

Re:Cool, but.. by Bengie · 2013-09-17 13:17 · Score: 4, Insightful

Everything else is already handled with LVM and software RAID.

You have a great sense of humor, keep it up.

Re:Cool, but.. by smash · 2013-09-17 13:38 · Score: 3, Informative

That. Those who don't understand ZFS are condemned to reinvent it, poorly.

--
I run: Windows, OS X, Linux, FreeBSD. Just because you have a hammer, doesn't mean everything is a nail.

Re:ZFS for Windows? by tlambert · 2013-09-17 14:15 · Score: 5, Informative

It doesn't have to be POSIX compliant to have it ported to it and it doesn't require somebody to pay for licensing. With the features of ZFS one could argue that a port to at least Windows Server would be great, and it would garner quite a following from those who've had to put up with the way NTFS views disk volumes and storage.

Windows isn't a very friendly development platform for Open Source, starting with the licensing requirements for tools and distribution restrictions on binaries derived from those tools when using header files containing substantial code, or runtime libraries. Part of this is an intentional legal defense against WINE and CrossOver Office, and part of it is just scale management by limiting the support community requirements to "serious developers".

In addition, a lot of the installable filesystem and similar code, as well as a lot of the necessary VM internals (memory mapped files and paging/swapping from filesystems), are not adequately explained (i.e. they involve locking text regions with level 0 locks, which require a level 3 lock then a level 0 lock, and you have to do this to get the offsets on the physical media for the blocks in question). This used to not work on removable media in NT as of 4.0.1; not sure if it's supported yet, but it was the reason you couldn't install it on Jaz drives or even regular hard drives in removable carriers.

Having developed a filesystem for Windows95 IFSMgr, and reverse engineered all this crap, and having done it again for NT3.51, I would not look forward to having to repeat the process for Windows 7 or Windows 8, which are the only useful versions to target for by the time the code ends up functional.

So unless someone wanted to seriously underwrite the effort (read: it'd have to be done by Oracle, or by a startup that had a monetization strategy that Microsoft wouldn't preempt, like they did when my team, at a previous employer, ported UFS + Soft Updates to Windows 95, and they announced Longhorn-which-never-happened, and then put together a lawsuit about "deep reverse engineering" which would have precluded using it as a bootable FS)... no thanks.

aka bcache + any filesystem you want by raymorris · 2013-09-17 14:26 · Score: 3, Informative

Using a small, fast SSD as a cache for large, slow disks can be awesome for some workloads, mostly servers with many concurrent users.

To do that with ANY filesystem, bcache is now part of the mainline kernel. dm-cache does the same thing, and there is another one that Facebook uses.

Re:ZFS for Windows? by BitZtream · 2013-09-17 14:34 · Score: 2

Windows isn't a very friendly development platform for Open Source, starting with the licensing requirements for tools and distribution restrictions on binaries derived from those tools when using header files containing substantial code, or runtime libraries.

Well, the tools are free and there isn't a redistribution problem, never has been.

Now, you could argue that ZFS and Windows won't work unless MS does it because ZFS is the whole disk I/O stack rolled into one, and no driver is going to work with the kernel to allow the ZFS system to work in Windows, but that's another story entirely. There's no way to bypass the disk cache, for instance, not in a way ZFS would be compatible with. ZFS must use its own cache, and directly access the raw devices, and provide the filesystem driver all rolled into one ... but spread all across the kernel, in order to get proper performance.

Could get pretty close with some good hacks though, such as FUSE.

--
Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager

Still no encryption... *sigh* by the_B0fh · 2013-09-17 14:37 · Score: 2

I wish they had encryption... *sigh*

No, I don't want workarounds, I want it to be built in to ZFS like in Solaris 11.

Re: Data integrity by MightyYar · 2013-09-17 14:51 · Score: 4, Informative

Not sure what you mean. You certainly can set up a mirrored pair (or triplet or quadruplet), but you can also set up what's referred to as raidz, where it stripes the redundancy across multiple disks. You can configure how much redundancy... 1, 2, or more disks if you like. You can also tell ZFS to keep multiple copies of blocks, and it will spread those copies out among the disks. You can set that policy per sub-volume (file system in zfs-speak), so that if you decide that some of your data deserves more redundancy, you can set up a folder that will keep 2 copies of everything, but leave all the other folders at 1 copy. It's super geeky. I've had it detect (and correct) corruption in a failing disk, detect corruption because of a flaky disk controller that would otherwise pretend to work fine, and detect corruption when a SATA cable came loose. Combined with the ECC RAM in the server, I feel more comfortable about the integrity of my data than I ever have. I've lost family photos before to random drive corruption, so I'm sensitive to this stuff :)

--
W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.

Re: Data integrity by saleenS281 · 2013-09-17 15:34 · Score: 4, Informative

One point to be extremely clear on however - when you set copies = 2 on a folder level, it does NOT guarantee those copies end up on different physical spindles. Early on there were many people who lost files because they skipped RAID thinking that copies=X would protect their data. It is NOT meant as a means to protect against hardware failures.

Re: Data integrity by greg1104 · 2013-09-17 17:40 · Score: 4, Interesting

ECC RAM is an important part here, due to how scrubbing works in ZFS. The background disk scrubbing can check every block on the filesystem to see if it still matches its checksum, and it tries to repair issues found too. But if your memory is prone to flipping a bit, that can result in scrubbing actually destroying data that was perfectly fine until then. The worst case impact could even destroy the whole pool like that. It's a controversial issue; the odds of a massive pool failure and associated doom and gloom are seen as overblown by many people too. There's a quick summary of a community opinion survey at ZFS and ECC RAM, but sadly the mailing list links are broken and only lead to Oracle's crap now.

Re: Data integrity by kthreadd · 2013-09-17 18:09 · Score: 3, Informative

That's what you have backups for.

Re:What's the difference? by Bert64 · 2013-09-17 18:52 · Score: 2

Temporary files and swap aren't a problem...

Swap can and should be stored on a separate partition, and encrypted using a randomly generated key so it's completely lost after a reboot.

On a properly configured system, only a very small number of locations will be writable by the user, typically the user's home directory and a temporary area... The temporary area can be stored in ram/swap since it doesn't matter if its contents are lost and home can be encrypted.

It's trivial to add a hardware key logger to virtually any system irrespective of how the software is configured; if someone untrusted has had unescorted physical access to the system, then the system should be considered compromised anyway. A hardware keylogger is also OS-independent, whereas doing it in software requires the malicious party to know what OS you're using in advance in order to have a compatible keylogger, and also to work around any non-standard configuration you might have.

--
http://spamdecoy.net - free throwaway anonymous email - avoid spam!

Re: Data integrity by TheRaven64 · 2013-09-17 21:06 · Score: 4, Insightful

ZFS doesn't have ECC, but it does checksum each block, so it can detect per-block errors. If you have valuable data, you can set the copies property to some value greater than 1 for that data set and it will ensure that each block is duplicated on the disk so if one fails a checksum then the other will be used to recover. If you have three disks, you can use RAID-Z, which loses you 1/3 of the space (not 1/2) and allows any single-disk failures to be recovered. Running zpool scrub will make it validate all of the data and, when any read fails its checksum, recover the data from the other two disks.
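As a toy illustration of the checksum-then-repair idea (not ZFS's actual code), the sketch below keeps two copies of a block plus a separately stored checksum. SHA-256 is used only because it is in Python's standard library, and `read_with_repair` is a hypothetical helper name.

```python
import hashlib
from typing import List


def checksum(data: bytes) -> bytes:
    """Checksum of one block (SHA-256 chosen purely for illustration)."""
    return hashlib.sha256(data).digest()


def read_with_repair(copies: List[bytearray], expected: bytes) -> bytes:
    """Return the first copy that matches `expected` and heal the others."""
    good = next(
        (bytes(c) for c in copies if checksum(bytes(c)) == expected), None
    )
    if good is None:
        raise IOError("every copy failed its checksum; nothing to repair from")
    for c in copies:
        if checksum(bytes(c)) != expected:
            c[:] = good  # overwrite the damaged copy with the good data
    return good


block = b"family photo bytes"
expected = checksum(block)
copies = [bytearray(block), bytearray(block)]
copies[0][0] ^= 0x01  # simulate silent corruption in the first copy

data = read_with_repair(copies, expected)
```

After the call, `data` holds the original bytes and the damaged first copy has been rewritten from the good one. If both copies were bad, the only option left would be to report the error.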

The reason it doesn't use ECC is that ECC doesn't mesh well with the failure modes of disks. ECC is used in RAM because when it gets hot, hit by a cosmic ray, or whatever, it is common for a single bit to flip (in a single direction, which makes the error correction easier). In a disk, you typically have an entire block fail, not a single bit. Modern disks use multiple levels, so the smallest failure that is even theoretically possible might be a single byte (or nibble) in a block. And since the failure isn't biased, you'd need a fairly large amount of space. A better approach would be for the filesystem to generate something like Reed–Solomon code blocks for every n blocks that are written. This would allow single-block errors to be recovered, as long as the other blocks are okay. The down side of this approach is that the error correcting block would need to be rewritten whenever any of the other blocks is modified. This might be relatively easy to add to ZFS, as it uses a CoW structure, so block-overwrites are relatively rare (although erasing a lot of data would require a lot of checksums to be recalculated). This would mean that a single-block write would end up triggering a lot of reads and that would hurt performance. For ZFS, this might actually be easier to implement, as blocks are written out in transaction groups and so including an error correction block at the end might be a fairly simple modification.
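A minimal sketch of the "one correction block per n data blocks" idea, using plain XOR parity as a stand-in for Reed–Solomon. XOR can only rebuild one lost block per group, and it assumes something else (the per-block checksum, say) has already identified which block is bad. The function names are invented for the example.

```python
from functools import reduce
from typing import List


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def recover(blocks: List[bytes], parity: bytes, missing_index: int) -> bytes:
    """Rebuild the block at `missing_index` from the survivors and the parity."""
    survivors = [b for i, b in enumerate(blocks) if i != missing_index]
    return xor_blocks(survivors + [parity])


group = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(group)  # written once, alongside the n data blocks

rebuilt = recover(group, parity, 1)  # pretend block 1 failed its checksum
```

It also shows the cost mentioned above: change any one data block in place and the parity has to be recomputed, which means reading the rest of the group unless everything is written out together.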

--
I am TheRaven on Soylent News

Re:ZFS for Windows? by pr0nbot · 2013-09-17 23:25 · Score: 2

(You seem to write well so you'll probably appreciate being reminded it's "garner" not "garnish")

Re:ZFS for Windows? by tlambert · 2013-09-17 23:46 · Score: 2

Windows isn't a very friendly development platform for Open Source, starting with the licensing requirements for tools and distribution restrictions on binaries derived from those tools when using header files containing substantial code, or runtime libraries.

Well, the tools are free and there isn't a redistribution problem, never has been.

Not according to this document; the runtime components are not redistributable. This is an Anti-WINE license measure:

http://msdn.microsoft.com/en-us/library/ms235299(v=vs.90).aspx

Now, you could argue that ZFS and Windows won't work unless MS does it because ZFS is the whole disk I/O stack rolled into one, and no driver is going to work with the kernel to allow the ZFS system to work in Windows, but that's another story entirely. There's no way to bypass the disk cache for instance, not in a way ZFS would be compatible with. ZFS must use its own cache, and directly access the raw devices, and provide the filesystem driver all rolled into one ... but spread all across the kernel, in order to get proper performance.

Could get pretty close with some good hacks though, such as FUSE.

This is actually reverse-engineerable. FUSE isn't an option, since pages which get memory mapped and dirtied are not propagated up via invalidation events. This is the same problem the Heidemann stacking framework has if you stack FS A on top of FS B, and then expose both of them as visible in the mount hierarchy namespace. You can do some things, but you can't do really complicated things.

Re: FINALLY. by Eunuchswear · 2013-09-18 01:14 · Score: 3, Funny

You don't have a multi-petabyte array with mission critical data at home?

--
Watch this Heartland Institute video
