ZFS Replication To the Cloud Is Finally Here and It's Fast (arstechnica.com)

Posted by ryuzaki0 on Monday December 21, 2015 @09:34PM from the greased-lightning dept.

New submitter kozubik writes: Jim Salter at Ars Technica provides a detailed, technical rundown of ZFS send and receive, and compares it to traditional remote syncing and backup tools such as rsync. He writes: "In mid-August, the first commercially available ZFS cloud replication target became available at rsync.net. Who cares, right? As the service itself states, If you're not sure what this means, our product is Not For You. ... after 15 years of daily use, I knew exactly what rsync's weaknesses were, and I targeted them ruthlessly."

rsync and zfs do different things by MichaelSmith · 2015-12-21 21:55 · Score: 5, Informative

rsync synchronises files. ZFS synchronises a file system. Of course it is better to work that way because you can transfer just the changed components of a file. Moving a file just changes a pointer, so send the pointer. That sort of thing.
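
For anyone who hasn't used it, the workflow being discussed is roughly: snapshot the filesystem, then stream the difference between two snapshots to the other machine. Below is a minimal Python sketch that only assembles the command line and does not run anything. The dataset `tank/data`, the snapshot names, and the host `backup.example.com` are placeholders made up for illustration, not anything from the article.

```python
import shlex

def incremental_send_pipeline(dataset, prev_snap, new_snap, remote_host, remote_dataset):
    """Return the shell pipeline text for an incremental ZFS replication.

    Nothing is executed here; the function only builds the string.
    """
    # 'zfs send -i old new' emits a stream of what changed between the two snapshots.
    send = ["zfs", "send", "-i", f"{dataset}@{prev_snap}", f"{dataset}@{new_snap}"]
    # 'zfs receive' on the far end applies that stream to the target dataset.
    receive = ["ssh", remote_host, "zfs", "receive", remote_dataset]
    return " | ".join(shlex.join(cmd) for cmd in (send, receive))

# All names below are hypothetical placeholders.
pipeline = incremental_send_pipeline(
    "tank/data", "monday", "tuesday", "backup.example.com", "backup/data"
)
print(pipeline)
```

The point of the sketch is that the sender never asks the receiver what it already has; the previous snapshot name is the only shared state.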

--
http://michaelsmith.id.au
VM Replication by tomknight · 2015-12-21 21:58 · Score: 3, Interesting

I was a little unexcited by (although interested in) the article, even by the general speedups until I got to the part about VM replication. This really makes an enormous difference.
ZFS licensing has kept this as a grey area for me, so I've largely kept away from deployment (save for an emergency FreeNAS box I needed in a hurry), but I'd clearly benefit from looking here again. Thanks for the reminder.
Oh, I also appreciate the rsync.net advertisement. Good guys, good service ;-)

--
Oh arse
1. Re:VM Replication by Lennie · 2015-12-21 22:44 · Score: 2
  
  The article did feel like an advertisement.
  They offer a VM with lots of disk space; is that really that special?
  I know of at least one that offers something similar:
  https://www.vultr.com/pricing/...
  I guess not at the same scale and with a bandwidth limit.
  What I think is kind of funny is how people are surprised that ZFS works well for VM-images.
  rsync is meant/optimized for transferring files, not blocks.
  ZFS is meant for transferring filesystem blocks, and VM images are blocks too.
  So ZFS works better than rsync for that. That isn't so surprising.
  Anyway the whole VM thing has been a big distraction; containers/zones were already in widespread use before VMs were in widespread use.
  I'm glad containers are getting more attention now. Partly because of things like storage. Who wants to deal with VM images if you can have files?
  
  --
  New things are always on the horizon
2. Re:VM Replication by Bengie · 2015-12-22 01:12 · Score: 2
  
  Depends on what you're calling "containers". BSD Jails have been around for a long time, but what Linux calls "containers" are crappy attempts to containerize. The Linux community has this unhealthy "not invented here" syndrome that results in a lot of square wheels.
3. Re:VM Replication by Rutulian · 2015-12-22 04:02 · Score: 2
  
  but what Linux calls "containers" are crappy attempts to containerize.
  
  Not sure what you mean. Jails have been around for a long time, but LXC/LXD containers have almost identical functionality.
  container templates...check
  filesystem snapshot integration (ZFS, btrfs) with cloning operations...check
  resource limits...check
  unprivileged containers...check
  network isolation...more flexible under LXC than Jails, in my opinion
  bind mounts in containers...check
  nice management utilities...check
  live migration...in development
  Only difference I can see really is that LXC doesn't support nested containers...
Charming by wonkey_monkey · 2015-12-21 21:58 · Score: 2, Insightful

Who cares, right? As the service itself states, If you're not sure what this means, our product is Not For You.
Ah, there's that welcoming open-source community spirit.

--
systemd is Roko's Basilisk.
1. Re:Charming by greenfruitsalad · 2015-12-21 23:05 · Score: 3, Informative
  
  there are things in this world that simply aren't meant for participation award winners. so go get offended somewhere else.
  if somebody doesn't know what ZFS replication is, their product clearly isn't meant for them. why bother with explanation to a visitor that has no use for the product/service?
  the attitude of these ZFS people is still quite welcoming compared to some connectivity providers i've dealt with. e.g. bogons.net will just politely tell you to f*ck off if you don't fully understand what you're purchasing from them (dwdm/cwdm rings).
2. Re: Charming by greenfruitsalad · 2015-12-21 23:50 · Score: 2, Informative
  
  their howtos hold newbies' hands sufficiently. they simply don't provide a free "Oracle ZFS Storage Appliance Administration course", which is what some people seem to expect. it seems i am discussing this with people who haven't even visited their website, so i'll stop here.
Rsync could have done this too! by urdak · 2015-12-21 22:01 · Score: 4, Informative

Reading this article, it seems that this "ZFS replication" is very similar to rsync, with one straightforward addition:
Rsync works on an individual file level. It knows how to synchronize each modified file separately, and does this very efficiently. But if a file was renamed, without any further changes, it doesn't notice this fact, and instead notices the new file and sends it in its entirety. "ZFS replication", on the other hand, works on the filesystem level so it knows about renamed files and can send just the "rename" event instead of the entire content of the file.
So if rsync ran through all the files to try to recognize renamed files (e.g., by file sizes and dates, confirming with a hash), it could basically do the same thing. This wouldn't catch the event of renaming *and also* modifying the same file, but this is rarer than simple movements of files and directories. The benefit would have been that this would work on *any* filesystem, not just ZFS. Since 99.9% of the users out there do not use ZFS, it makes sense to have this feature in rsync, not ZFS.
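
The heuristic described above (match on size and date, then confirm with a hash) can be sketched in Python roughly as follows. This is only an illustration of the idea, not how rsync works internally; the helper names `index_tree`, `find_renames` and `write_file` are invented here, and the sketch assumes modification times were preserved on the destination by the previous sync.

```python
import hashlib
import os
import tempfile

def file_hash(path):
    """SHA-256 of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def index_tree(root):
    """Map relative path -> (size, mtime in whole seconds)."""
    index = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            index[os.path.relpath(full, root)] = (st.st_size, int(st.st_mtime))
    return index

def find_renames(src_root, dst_root):
    """Return {old path on destination: new path on source} for likely renames."""
    src, dst = index_tree(src_root), index_tree(dst_root)
    # Cheap filter first: group destination-only files by (size, mtime).
    candidates = {}
    for path in dst:
        if path not in src:
            candidates.setdefault(dst[path], []).append(path)
    renames = {}
    for new_path in src:
        if new_path in dst:
            continue  # same path on both sides: not a rename
        for old_path in candidates.get(src[new_path], []):
            if old_path in renames:
                continue  # already paired with another file
            # Expensive confirmation: only hash when size and mtime agree.
            if file_hash(os.path.join(src_root, new_path)) == file_hash(
                os.path.join(dst_root, old_path)
            ):
                renames[old_path] = new_path
                break
    return renames

def write_file(path, data, mtime):
    """Demo helper: create a file with fixed content and modification time."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(data)
    os.utime(path, (mtime, mtime))

with tempfile.TemporaryDirectory() as tmp:
    src, dst = os.path.join(tmp, "src"), os.path.join(tmp, "dst")
    # The destination still has the old directory name; the source was renamed.
    write_file(os.path.join(dst, "old", "report.txt"), b"quarterly numbers", 1_450_000_000)
    write_file(os.path.join(src, "new", "report.txt"), b"quarterly numbers", 1_450_000_000)
    # An untouched file that exists on both sides is ignored.
    write_file(os.path.join(dst, "same.txt"), b"unchanged", 1_440_000_000)
    write_file(os.path.join(src, "same.txt"), b"unchanged", 1_440_000_000)
    demo_renames = find_renames(src, dst)

print(demo_renames)
```

Note that both trees still have to be walked in full, and candidate files read for hashing, which is the cost other commenters object to.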
1. Re:Rsync could have done this too! by brambus · 2015-12-21 22:16 · Score: 5, Insightful
  
  The crucial difference is ZFS send is unidirectional and as such is not affected by link latency. rsync needs to go back-and-forth, comparing notes with the other end all the time. ZFS send is also a lot faster and more efficient, eliminating entire large portions of the filesystem tree structure that haven't changed without having to read them in. This is not to say that rsync's authors were any less competent coders. ZFS simply has more information available about the filesystem than rsync, so it can make smarter decisions.
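
To illustrate how whole unchanged subtrees can be skipped without reading them, here is a toy Python model. It assumes a copy-on-write tree in which every block records the transaction number at which it was last written, and a change to a leaf rewrites every block on the path up to the root. The `Block` class and `changed_since` function are invented for this sketch and are not ZFS's actual data structures.

```python
from dataclasses import dataclass, field

@dataclass
class Block:
    name: str
    birth: int  # transaction number at which this block was last written
    children: list = field(default_factory=list)

def changed_since(block, snapshot_txg, visited):
    """Collect leaf blocks written after snapshot_txg, pruning untouched subtrees."""
    visited.append(block.name)
    if block.birth <= snapshot_txg:
        # Copy-on-write means nothing below an old block can be newer: skip it all.
        return []
    if not block.children:
        return [block.name]
    changed = []
    for child in block.children:
        changed.extend(changed_since(child, snapshot_txg, visited))
    return changed

# Toy tree: only leaf b2 was modified (at transaction 12) after the snapshot at 10.
tree = Block("root", 12, [
    Block("dirA", 5, [Block("a1", 5), Block("a2", 3)]),
    Block("dirB", 12, [Block("b1", 4), Block("b2", 12)]),
])

visited = []
to_send = changed_since(tree, snapshot_txg=10, visited=visited)
print(to_send)   # leaves that would go into the stream
print(visited)   # blocks the sender actually had to look at
```

In this model the sender never looks inside `dirA`, and the receiver is never consulted at all, which is the unidirectional property described above.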
2. Re:Rsync could have done this too! by geggo98 · 2015-12-21 22:20 · Score: 2
  
  In principle true, but with one exception: If you already use ZFS for other reasons (e.g. checksums in the file system or transparent compression), it's really nice that you can make backups on the filesystem level with rsync-like performance. The backup on the filesystem level keeps all file system specific features intact (e.g. the checksums and the compression). So you can have really fast backups and you can be sure that when you restore the backup, the filesystem will look exactly as it looks now. So you can use rsync when you want to back up the content of the files, or ZFS snapshots when you want to back up the layout of the filesystem (including the files' content of course).
3. Re:Rsync could have done this too! by Maow · 2015-12-21 22:54 · Score: 2
  
  I was wondering what this offers over a (theoretical?) inotify+rsync app.
  In the comments at the linked-to Ars article, Jim discusses just this approach.
  Basically, and from memory, he determined that it would just be too much work to re-implement something that already works solidly (ZFS) and comes with a huge amount of other features out of the box.
4. Re:Rsync could have done this too! by drinkypoo · 2015-12-21 23:31 · Score: 2
  
  So if rsync ran through all the files to try to recognize renamed files (e.g., by file sizes and dates, confirming with a hash), it could basically do the same thing.
  As a sibling comment points out, rsync does have a mode which handles this. As they don't point out, it is horrendously costly. Making this the default would be a pure idiot move. ZFS has metadata that permits detecting these sort of files, so it is possible to do it cheaply with ZFS.
  What is really wanted IMO is for rsync to detect this stuff and use it when ZFS is present.
  
  --
  "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
5. Re:Rsync could have done this too! by grumbel · 2015-12-21 23:51 · Score: 2
  
  The biggest difference is that ZFS has full knowledge of the state of the file system. rsync, on the other hand, doesn't: it's stateless, so it has to start from zero each time and regather the information on each and every run on both sides, which is a really slow and potentially error-prone process (e.g. when files change while rsync runs). ZFS knows what's going on in the filesystem and it snapshots the filesystem at a single point in time, so it can be far quicker and won't produce inconsistencies in the transmitted data. Tracking renames would speed some things up a little, but to match ZFS, rsync would need a way to get the information about what changed in the filesystem from the filesystem itself, and at the moment such functionality isn't available.
6. Re:Rsync could have done this too! by urdak · 2015-12-22 00:52 · Score: 2
  
  Not exactly.
  rsync will always have to go through the files and check. Trying to identify stuff like renames will obviously make a difference, but as it's only really going to have any sizeable impact when you happen to have lots of renames, but not actual data changes, it's probably not even worth the effort of implementing it.
  The rename issue is actually *very* important. It's not likely that you'll have a lot of independent renames, but something very likely is that you rename one directory containing a lot of files - and at that point rsync will send the entire content of that directory again. I actually found myself in the past stopping myself from renaming a directory, just because I knew this will incur a huge slowdown next time I do a backup (using rsync).
7. Re:Rsync could have done this too! by brambus · 2015-12-22 02:00 · Score: 2
  
  If you read on a bit in the article, you'll come across the example of daily syncing of VM images across to a backup node. While ZFS send is done in less than an hour, rsync would take north of 7 hours just to read in the local state of the VM image, much less figure out what has changed and send the diffs. This is based entirely on ZFS send's unidirectionality. The critical difference is that rsync needs to trawl the entire local dataset state completely and compare notes with the other box (which also needs to read it all in) in order to figure out what's changed. ZFS send doesn't need to do that.
8. Re:Rsync could have done this too! by DRJlaw · 2015-12-22 06:22 · Score: 2
  
  But this is *not* what the article appears to be measuring. He measured that the times to synchronize changes were nearly identical in rsync and "ZFS replication" - except when it comes to renames.
  Yet this is what the article says. Does he really have to measure read time to the millisecond instead of providing an estimate? How fast can your disk system read off 2TB of information, anyway?
  "Virtualization keeps getting more and more prevalent, and VMs mean gigantic single files. rsync has a lot of trouble with these. The tool can save you network bandwidth when synchronizing a huge file with only a few changes, but it can't save you disk bandwidth, since rsync needs to read through and tokenize the entire file on both ends before it can even begin moving data across the wire. This was enough to be painful, even on our little 8GB test file. On a two terabyte VM image, it turns into a complete non-starter. I can (and do!) sync a two terabyte VM image daily (across a 5mbps Internet connection) usually in well under an hour. Rsync would need about seven hours just to tokenize those files before it even began actually synchronizing them... and it would render the entire system practically unusable while it did, since it would be greedily reading from the disks at maximum speed in order to do so." (emphasis mine)
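
A quick back-of-the-envelope check of the figures in that quote, as a Python sketch. It assumes decimal units (1 TB = 10^12 bytes) and reads "5mbps" as 5 megabits per second; neither is stated explicitly in the article excerpt.

```python
TB = 10**12                          # assumption: decimal terabytes
image_bytes = 2 * TB                 # the two terabyte VM image
tokenize_seconds = 7 * 3600          # "about seven hours" of reading
link_bits_per_second = 5 * 10**6     # assumption: 5 megabits per second

# Read rate implied by scanning the whole image in seven hours.
implied_read_rate_mb_s = image_bytes / tokenize_seconds / 10**6

# Upper bound on what the link can carry in one hour.
link_bytes_per_second = link_bits_per_second / 8
max_bytes_in_one_hour = link_bytes_per_second * 3600

# Time to push the full image over the same link, for comparison.
full_copy_days = image_bytes / link_bytes_per_second / 86400

print(round(implied_read_rate_mb_s, 1), "MB/s sustained read")
print(max_bytes_in_one_hour / 10**9, "GB at most per hour over the link")
print(round(full_copy_days, 1), "days for a full copy")
```

So the seven-hour estimate corresponds to a sustained read of roughly 80 MB/s, and a sync that finishes in under an hour on that link can have moved at most a couple of gigabytes, a tiny fraction of the image.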
ZFS vs BTRFS by Maow · 2015-12-21 22:47 · Score: 2

Jim Salter writes some great pieces on file systems for Ars Technica.
At the linked article are Related Links. Of particular note is "Atomic Cows and Bit Rot" -- read that if you're interested in modern file systems.
1. Re:ZFS vs BTRFS by phayes · 2015-12-22 02:07 · Score: 2
  
  Whereas /. is filled with people such as yourself...
  I've been on /. & ars for close to 2 decades & the level of idiot posts is unfortunately much higher here.
  
  --
  Democracy is a sheep and two wolves deciding what to have for lunch. Freedom is a well armed sheep contesting the issue
Re:BTRFS is the future by rl117 · 2015-12-22 00:03 · Score: 4, Interesting

Er, no. Btrfs may one day reach feature parity with ZFS, and it may also achieve the reliability of ZFS, but it has a long, long way to go in both areas to get to those points.
The on-disc structures might have been declared "stable", but what does that mean, really? That you'll be able to mount current filesystems on future kernels, yes. That the frozen design was correct and contains no design flaws? No. Personally, I think they froze it way too early. There are a number of fairly fundamental issues with the Btrfs design which compromise its performance (fsync) and integrity (unbalancing, data loss on recovery), and in some cases place arbitrary limits upon things (e.g. the hardlink issue). Some can be mitigated, while others can not. These and other issues are easily found and researched.
Seriously, I've been using Btrfs since very near the beginning for a variety of tasks. But I've been objective about it, rather than a blinkered fanboi. It's an interesting filesystem with some good ideas. But it has /always/ been a case of "next year it will be stable", and the performance is dire. Progress has been painfully slow, and the bugs I've encountered along the way have been numerous and show-stopping. Maybe it will "get there", but I think your assertion that "once BTFS userland side gets stable" that it will replace ZFS is incredibly naive. It assumes that there are no major issues remaining on the kernel side, and it also assumes that the only thing needing doing on the user side is stability. Based on its history to date, the likelihood of the kernel side being bug-free is close to zero. On the user side the tools are primitive, feature-incomplete and almost completely undocumented, containing little information and no examples. On the ZFS side, the tools are feature complete and are properly documented, with examples, and with whole sets of training material on top of that.
If you needed to make a decision on which to use for a serious deployment, or even just for a smaller scale home NAS, and you objectively compare the two right now, the choice is quite clear, and it's not Btrfs. Based upon the development history of the two, it's unlikely that this will change much in the next few years. Remember also that ZFS development is very active, perhaps even more so than Btrfs. But who knows, maybe by 2020 Btrfs will surpass it.
Re:BTRFS is the future by Bengie · 2015-12-22 01:23 · Score: 2

+9001 Funny! I needed that. BTRFS, the FS designed by devs for cool new features! ZFS, the FS designed by sysadmins for sysadmins.
Re:Altogether now: "replication is not backup" by BitZtream · 2015-12-22 02:03 · Score: 2

Without some kind of incremental snapshot, with read-only privileges after the snapshot, straight replication is next to useless if someone does "rm -rf /". And it happens *all the time*.
So ... zfs covers that ... since it does exactly what you suggest.

Sure, if you can afford to buy 3 times as much disk
What? If you want mirroring or RAID-like qualities, yes, you need to duplicate data; that's true of any mechanism like this... you do realize that's what things like NetApp do too ... right, just mirroring or RAID?

and roughly 10 times as much network bandwidth as you ever really process with,
... this makes no sense? How does the network come into play here? You're just making random shit up?

ZFS is nice if you can afford one sys-admin/Terabyte of data to try to keep it up to date, but it's just not been stable.
The company I work at rolls over roughly 50tb of data PER DAY, several petabytes worth ... in ZFS ...
You'll have to pardon me if I doubt some random Anonymous Coward spewing clear ignorance has any idea what 'stable' is after making such stupid statements.

--
Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
Re:BTRFS is the future by rl117 · 2015-12-22 02:50 · Score: 4, Insightful

Are you for real AC, or just trolling?
Your Synology "reference" is a classic "appeal to authority", only it's a really bad choice of authority due to its complete lack of any technical detail or substance of any kind. That link is to a marketing page for a company which makes money selling hardware. It's just a few bullet points (snapshotting, checksumming in essence), without any discussion of the actual tradeoffs or comparison with other systems. It's worthless. Its only purpose is to tick a feature box to act as an incentive to purchase their systems; as for the actual performance and reliability of those features--that's the customer's problem. Caveat emptor.
I've done more than casual work and development with Btrfs. For example, from back when I was a Debian developer, here's the original initial support for Btrfs snapshotting in schroot. This lets you create virtual environments from Btrfs snapshots, as well as other types such as LVM and overlays. You can then plug this into other tools such as sbuild, and then build the whole of Debian using snapshotted clean build environments. Doing this, Btrfs fails hard around every 18 hours, going read-only. Why? Creating and deleting 18000 snapshots for 8 parallel builds quickly unbalances the filesystem, requiring a manual rebalance. You don't see that unfortunate detail in the Synology fluff page, do you?
You can also get snapshots and decent recovery (albeit without block-level checksums) from LVM and mdraid. In my experience, its recovery behaviour after real hardware failure is vastly more reliable than Btrfs. Simply put, it has always resynched the data without problem, while Btrfs has caused irrecoverable data loss, despite it theoretically being much better. LVM snapshots have very different tradeoffs as well. And on modern Linux with udev, we had to abandon using them due to races in udev/systemd making them randomly fail.
The point I'm making is that the reality of the chosen tradeoffs between performance, reliability and featureset of the different filesystems is a subtle one. You can't reduce it down to "Btrfs is better" or "ZFS is better". That's marketing. But I have spent over seven years pushing Btrfs to its limits, and have found it sorely lacking. It's unacceptable that it unbalances itself to the point of unusability. It's unacceptable that it has led to irrecoverable dataloss on several occasions. It's also unacceptable that in its eight years of existence, none of the developers could be bothered to write any decent documentation. The dataloss was down to bugs, some of which are fixed, but it does leave you in a position of lacking trust in it in the face of such problems. If you compare this with ZFS, while it's not fair to say it has been totally bug free, it has been almost bug free, and the number of dataloss incidents is small. I've yet to encounter any problems with ZFS myself, but I've encountered many serious issues with Btrfs.
Anyone who uses Btrfs or ZFS on a NAS system does so at their own risk after researching the various options and their tradeoffs. Just because a vendor decides to make and market a system using Btrfs does not make that system the best choice. It just means they thought they could make some profit from it.