ZFS Replication To the Cloud Is Finally Here and It's Fast (arstechnica.com)
New submitter kozubik writes: Jim Salter at Ars Technica provides a detailed, technical rundown of ZFS send and receive, and compares it to traditional remote syncing and backup tools such as rsync. He writes: "In mid-August, the first commercially available ZFS cloud replication target became available at rsync.net. Who cares, right? As the service itself states, If you're not sure what this means, our product is Not For You. ... after 15 years of daily use, I knew exactly what rsync's weaknesses were, and I targeted them ruthlessly."
rsync synchronises files. ZFS synchronises a file system. Of course it is better to work that way because you can transfer just the changed components of a file. Moving a file just changes a pointer, so send the pointer. That sort of thing.
http://michaelsmith.id.au
I was a little unexcited by (although interested in) the article, even by the general speedups until I got to the part about VM replication. This really makes an enormous difference.
ZFS licensing has kept this as a grey area for me, so I've largely kept away from deployment (save for an emergency FreeNAS box I needed in a hurry), but I'd clearly benefit from looking here again. Thanks for the reminder.
Oh, I also appreciate the rsync.net advertisement. Good guys, good service ;-)
Oh arse
Who cares, right? As the service itself states, If you're not sure what this means, our product is Not For You.
Ah, there's that welcoming open-source community spirit.
systemd is Roko's Basilisk.
It will make you kill your wife
Reading this article, it seems that this "ZFS replication" is very similar to rsync, with one straightforward addition:
Rsync works on an individual file level. It knows how to synchronize each modified file separately, and does this very efficiently. But if a file was renamed, without any further changes, it doesn't notice this fact, and instead notices the new file and sends it in its entirety. "ZFS replication", on the other hand, works on the filesystem level so it knows about renamed files and can send just the "rename" event instead of the entire content of the file.
So if rsync ran through all the files to try to recognize renamed files (e.g., by file sizes and dates, confirming with a hash), it could basically do the same thing. This wouldn't catch the event of renaming *and also* modifying the same file, but this is rarer than simple movements of files and directories. The benefit would have been that this would work on *any* filesystem, not just ZFS. Since 99.9% of the users out there do not use ZFS, it makes sense to have this feature in rsync, not ZFS.
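A rough sketch of that rename-detection idea, for illustration only. It is not how rsync works; the trees are plain in-memory dicts of path to content, and `fingerprint` and `detect_renames` are hypothetical helper names. Files that vanished from the old tree are matched to files that appeared in the new one by size plus SHA-256:

```python
import hashlib

def fingerprint(content: bytes) -> tuple[int, str]:
    # Size is the cheap first check; the hash confirms the match.
    return len(content), hashlib.sha256(content).hexdigest()

def detect_renames(old: dict[str, bytes], new: dict[str, bytes]) -> dict[str, str]:
    """Map old path -> new path for files whose content moved unchanged."""
    # Index files that disappeared from the old tree by fingerprint.
    vanished: dict[tuple[int, str], list[str]] = {}
    for path, content in old.items():
        if path not in new:
            vanished.setdefault(fingerprint(content), []).append(path)

    renames: dict[str, str] = {}
    for path, content in new.items():
        if path in old:
            continue  # same path on both sides: an ordinary update, not a rename
        candidates = vanished.get(fingerprint(content))
        if candidates:
            renames[candidates.pop()] = path
    return renames

old_tree = {"a/report.txt": b"quarterly numbers", "b/notes.txt": b"todo"}
new_tree = {
    "archive/report-2015.txt": b"quarterly numbers",  # moved, content unchanged
    "b/notes.txt": b"todo",
    "c/new.txt": b"fresh",                            # genuinely new file
}
renames = detect_renames(old_tree, new_tree)
print(renames)
```

Note that this still has to read and hash every vanished and every new file, and as the comment says, it misses a file that was renamed and modified in the same interval.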
For those who already understand rsync and zfs the article adds nothing new that is of value. 1/3 of the article is telling you what rsync is, which you could replace with lorem ipsum without lowering the next-to-none quality of the article. We already fucking know what rsync is. It's been in the man pages for, like, 10+ years. And why do you need a Jedi picture just for that?
Then the useless benchmark, taking another 1/3. No repeatable experiments. No statistics. Only one-shot timings. And the worst thing is that the result is completely expected. The first pseudo-benchmark is network-limited so the results would be the same. The second is completely expected. The incremental payload is so small that any difference is just overhead, and you expect the rsync with overhead NOT to perform worse than native FS? The third is pointless. It's apples and oranges. Anyone who knows the difference between syncing and replicating already knows about this. We don't need a whole article telling us that.
And finally, the obligatory shitty script dump (nobody cares about the verbatim copy of your perl script, kiddie), the apple-to-orange pros-cons comparison, clueless stock picture taking half screen space, and the non-conclusion.
The whole thing can be condensed into a 500-word tech briefing and you wrote that?
Sheesh.
BTRFS is the future
ZFS is just here today because of stability, but once the BTRFS userland side gets stable (on-disk structures are stable currently), kiss ZFS bye bye.
Synology is already moving to BTRFS on DSM 6.0 in 2016.
Jim Salter writes some great pieces on file systems for Ars Technica.
At the linked article are Related Links. Of particular note is "Atomic Cows and Bit Rot" -- read that if you're interested in modern file systems.
Without some kind of incremental snapshot, with read-only privileges after the snapshot, straight replication is next to useless if someone does "rm -rf /". And it happens *all the time*. The cost for a few Terabytes of critical business data is a day or so reading up on the ancient perl script "rsnapshot" and some remote disk, or a NetApp with local snapshotting. Sure, if you can afford to buy 3 times as much disk as you need locally and roughly 10 times as much network bandwidth as you ever really process with, ZFS is nice if you can afford one sys-admin/Terabyte of data to try to keep it up to date, but it's just not been stable.
there is one further and pretty massive difference.. if you sync a big file with only a few bits changed rsync needs to read the file on both ends to find out where differences are (then only transmit the diff).. this becomes a massive issue if you have big files with small changes (like.. vm images).
zfs sync simply syncs changed blocks near real time so no need to read the whole file again
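To make the contrast concrete, here is a toy model of change tracking at write time. It assumes a design in which every block is stamped with the generation in which it was last written, loosely in the spirit of how a copy-on-write filesystem can tell which blocks are newer than a snapshot. `BlockStore` and its methods are invented for this sketch and are not ZFS's API:

```python
class BlockStore:
    """Toy block device that stamps every written block with a generation."""

    def __init__(self, num_blocks: int, block_size: int = 4):
        self.block_size = block_size
        self.blocks = [bytes(block_size)] * num_blocks
        self.birth = [0] * num_blocks  # generation in which each block was last written
        self.generation = 1

    def write_block(self, index: int, data: bytes) -> None:
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block")
        self.blocks[index] = data
        self.birth[index] = self.generation

    def snapshot(self) -> int:
        """Freeze the current generation and return its number."""
        snap = self.generation
        self.generation += 1
        return snap

    def incremental(self, since: int) -> dict[int, bytes]:
        """Blocks written after snapshot `since`.

        Only the birth stamps are inspected; block contents are never
        compared, so unchanged data is not re-read.
        """
        return {i: self.blocks[i] for i, born in enumerate(self.birth) if born > since}

# A 1000-block "VM image", fully written, then snapshotted.
store = BlockStore(1000)
for i in range(1000):
    store.write_block(i, b"base")
first = store.snapshot()

# One small change afterwards.
store.write_block(42, b"edit")
delta = store.incremental(first)
print(delta)
```

The incremental stream contains one block out of a thousand, and nothing had to checksum the other 999 on either end, which is the point made above about large files with small changes.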
So it is "better" than rsync, but their website is rsync.net? Methinks they doth protest too much. I'll stick with Amazon S3 and rsync. It is cheaper and faster since they have many many many more datacenters throughout the world.
Anyone else getting tired of this term? All it means is "someone else's computer". All you're doing is renting server space and replicating your data there. There's nothing special about it.
Actually no - it's not better unless you are already using ZFS in which case you probably already know about the feature.
Sorry, Oracle owns ZFS. I wouldn't touch it with a 100 foot pole even if it came with a winning lottery ticket. Oracle is only a step better than SCO.
What if I told you that real "sys admins" have been able to keep billions of files over 100's of TB in sync with an RPO of 15 minutes without your precious, slow and crappy rsync for the past 10 years?
file system snapshot replication is old news kids.
ZFS is cute and all, but in typical Oracle fashion, playing catch up.
If btrfs has so many issues, I wonder why Docker doesn't have a deployment on Illumos or SmartOS.
I would think that Docker enthusiasm would be damped by a beta filesystem and (the lack of) verifiable security in package content.
Don't let the licensing FUD scare you. Linus has publicly stated that licensing in a case that's a very near equivalent to ZFS' licensing is fine.
The anticipated problem with the license has always been on the Linux side. The license ZFS is released under doesn't in any way prohibit the ZFS code from being used in other places with other licenses (like the *BSD's). There has never been a concern that using ZFS with Linux violates the ZFS license (and thus could bring Oracle's well-fed lawyers down upon you). The contention has been that combining CDDL code with GPL-2 in a derivative work violates the GPL and thus places you in trouble with Linux's license. The core problem is that CDDL places additional restrictions on binary code resulting from derivative works, which GPL-2 prohibits.
Linus has weighed in specifically on the AFS filesystem module here: http://yarchive.net/comp/linux...
Given that ZFS was originally written for Solaris and the core code works essentially unmodified (with a porting layer in some cases) on Solaris, *BSD, Linux, possibly other systems, there are lots of indications that it should fall into the same category as the AFS code: The ZFS modules are not derivative works of Linux and thus may be used with Linux even though their license prevents them from being incorporated into Linux.
We've had a very significant discount for HN readers for years and we'd be happy to extend that to /. readers. Just email and ask.
Really happy to be here - I am not sure why I am labeled as "new submitter" since I have been a slashdot user for ... 15 years ?
Happy to answer any questions about our service here as well.
$60/month, special pricing, for 1tb....
no thanks.
If I'm reading this right, ZFS sync opens up one other huge, huge possibility. I had this idea nearly 15 years ago (shortly after Napster), but didn't have the technical expertise to implement it: A distributed redundant filesystem.
ZFS doesn't think in terms of files. It thinks in terms of blocks, and in a redundant z-volume (similar to a RAID array) it distributes those blocks over multiple virtual devices (vdevs) - you can think of them as disks, but they don't have to be. These vdevs can be a disk, a partition, a file on a disk, or more crucially a SAN or iSCSI - disks which aren't connected directly to the computer but are accessed over a network. Til now, those last two have been disks on the same premises, just not in the same computer. ZFS sync could open it up to any networked vdev anywhere in the world.
So what's the big deal? The big deal is that in a redundant filesystem, you cannot reconstruct the original data from any single vdev. If you have 4 drives in RAID 5, no single drive has a complete file. You need all of the data off of at least 3 drives to reconstruct a file. The same goes for ZFS - if you're using 2-drive redundancy and you have 6 vdevs, you need the data off of at least 4 vdevs to reconstruct the file.
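The 4-drive RAID 5 case can be sketched with plain XOR parity. This is a simplification under stated assumptions: one stripe, parity always on the last "drive" (real RAID 5 rotates it, and RAIDZ lays data out differently), and `make_stripe` and `rebuild` are hypothetical helper names:

```python
from functools import reduce
from typing import Optional

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def make_stripe(data: bytes, data_chunks: int = 3) -> list[bytes]:
    """Split data into equal chunks and append one XOR parity chunk."""
    chunk_len = -(-len(data) // data_chunks)  # ceiling division
    padded = data.ljust(chunk_len * data_chunks, b"\0")
    chunks = [padded[i * chunk_len:(i + 1) * chunk_len] for i in range(data_chunks)]
    parity = reduce(xor_bytes, chunks)
    return chunks + [parity]

def rebuild(stripe: list[Optional[bytes]]) -> list[bytes]:
    """Recover a single missing chunk (marked None) by XOR-ing the survivors."""
    missing = [i for i, chunk in enumerate(stripe) if chunk is None]
    if len(missing) > 1:
        raise ValueError("single parity can only recover one lost chunk")
    repaired = list(stripe)
    if missing:
        survivors = [chunk for chunk in stripe if chunk is not None]
        repaired[missing[0]] = reduce(xor_bytes, survivors)
    return repaired

message = b"twelve bytes"
stripe = make_stripe(message)      # 3 data chunks + 1 parity chunk = 4 "drives"

damaged = list(stripe)
damaged[1] = None                  # lose any one drive
restored = rebuild(damaged)
recovered = b"".join(restored[:3])[:len(message)]
print(recovered)
```

Any three of the four chunks are enough to get the message back, while any one chunk alone holds at most a third of it.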
Now what if each of those vdevs were located in different places around the world? One could be Google Drive, another Dropbox, another Microsoft OneDrive, etc. Your data could be on the cloud, and it would still be accessible even if one service went down or even shut down completely. ZFS would just treat it like a drive failure. It would re-verify and recover after the service came back online. Or you could simply replace it with a vdev on a different cloud service. (ZFS redundancy is on a block level, so a block failure doesn't mean it drops the entire vdev from the array like RAID does with a disk which generates an error. It simply marks the block as bad and tries to reconstruct it from redundant info on other vdevs. Other blocks stored on that vdev are assumed to still be good, until you access it and the checksum says it's bad.)
Also, no single cloud service provider would have a complete copy of your data. Hackers could manage to break into a service and get all your data stored at that service. But unless they managed to get data from (n-r) services (n = number of cloud vdevs you're using, r = redundancy level), they couldn't reconstruct your data. More to the point, if said service notified you of the breach in a timely manner, you could respond by creating new vdevs with different encryption, copying your data from the old vdevs to the new, then erasing the old vdevs. Unless the hackers managed to simultaneously hack (n-r) cloud services, your data cannot be compromised. (Or if you're on the dark side, Hollywood could get the feds to raid a cloud storage service and get all your data there, but unless they did it simultaneously with (n-r) services, they wouldn't be able to see that you have copies of pirated movies stored on those services.)
I've been trying to set up something similar between my sister's, my parents', and my house, with our NASes backing up each other so we won't lose our data if one house burns down. But it's been a PITA with rsync. Because rsync thinks in terms of files, each house has to have a complete copy of the other houses' data. If I were able to do it with ZFS vdevs, it would represent a 50% space savings. More if I had more homes to work with.
HAMMER2 from the DragonflyBSD project is going to offer some pretty nice gains when it becomes production ready. Multi-master replication and clustering and a BSD-license - a really nice combination.
I think Panzura is already doing this or something like it.
Not sure what you mean. Jails have been around for a long time, but LXC/LXD containers have almost identical functionality. [...] Only difference I can see really is that LXC doesn't support nested containers...
Key one that's missing IMHO: security.
You're able to run as-root / Set-UID binaries within them? Nope. LXC emulates this by mapping UID-0 in the container to UID-x on the host via namespaces. Neither BSD jails nor Solaris zones have such a workaround: root is root internally.
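The UID mapping described above can be sketched as a simple range lookup. The three-field entries mirror the columns of a Linux user-namespace `uid_map` (ID inside, ID outside, length); the `map_uid` helper and the 100000 starting offset are assumptions chosen for the example:

```python
from typing import Optional

def map_uid(container_uid: int, uid_map: list[tuple[int, int, int]]) -> Optional[int]:
    """Translate a UID inside the container to the UID the host sees.

    Each entry is (inside_start, outside_start, length). Returns None
    when the UID falls outside every mapped range.
    """
    for inside_start, outside_start, length in uid_map:
        if inside_start <= container_uid < inside_start + length:
            return outside_start + (container_uid - inside_start)
    return None

# Hypothetical mapping: container UIDs 0..65535 -> host UIDs 100000..165535.
example_map = [(0, 100000, 65536)]

print(map_uid(0, example_map))      # container "root" as seen by the host
print(map_uid(1000, example_map))   # an ordinary container user
print(map_uid(70000, example_map))  # not covered by the mapping
```

Under this mapping, root inside the container is an unprivileged UID from the host's point of view.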
From one of the maintainers of Docker (as of June 2014):
Please remember that at this time, we don't claim Docker out-of-the-box is suitable for containing untrusted programs with root privileges.
* https://news.ycombinator.com/item?id=7909622
IIRC, there has been only one advisory about escaping out of a jail (and it was because of a devfsd bug, not jails itself), and no advisories about breaking out of Solaris zones. (Heck there have been issues about escaping Xen and QEMU-KVM.)
LXC is a useful technology, but if security is a concern, I would avoid it.
Yeah, he writes okay pieces, but it kind of annoys me when he throws up blanket advice and then practically trips over himself extolling the opposite.
ZFS: You should use mirror vdevs, not RAIDZ
Guess what? The entire rsync.net service is built on top of RAID-Z3, if I read their promotional portal correctly.
One use case I can see for this is using ZFS to back up Postgres databases. I'm not the only person to think this might be a good idea. A while back, I listened to this talk, which I really enjoyed:
Keith Paskett: PostgreSQL on ZFS
On hard experience, he's particularly wary about the "drop table" oops disaster scenario.
Keith Paskett bio
* infrared radiometric calibration chambers Space Dynamics Laboratory
* helped develop Utah State University's Climate data server
* National Climate Data Center validated climate data
* all stored in PostgreSQL of course
ZFS: You should use mirror vdevs, not RAIDZ
Guess what? The entire rsync.net service is built on top of RAID-Z3, if I read their promotional portal correctly.
That sounds bad, but I think it actually makes sense. From the article you linked to, slightly edited:
* don’t be greedy. 50% storage efficiency is plenty
[because]
* a pool of mirrors is easier to manage, maintain, live with, and upgrade than a RAIDZ stripe.
From that, it actually makes sense that when the major cost is storage, it might be worth it to trade away the "easiness" of mirrors for the lower cost of RAIDZ.
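A back-of-the-envelope version of that trade-off, assuming usable capacity is simply data disks divided by total disks (this ignores padding, metadata, and reserved space). The 12-disk width is a made-up figure for the example, not rsync.net's actual layout:

```python
def mirror_efficiency(copies: int = 2) -> float:
    """Fraction of raw capacity that is usable in an n-way mirror."""
    if copies < 2:
        raise ValueError("a mirror needs at least two copies")
    return 1 / copies

def raidz_efficiency(disks: int, parity: int) -> float:
    """Rough usable fraction of a RAIDZ vdev with 1, 2, or 3 parity disks."""
    if parity not in (1, 2, 3):
        raise ValueError("RAIDZ parity level must be 1, 2, or 3")
    if disks <= parity:
        raise ValueError("need more disks than parity devices")
    return (disks - parity) / disks

mirror = mirror_efficiency()       # two-way mirrors
raidz3 = raidz_efficiency(12, 3)   # hypothetical 12-disk RAID-Z3 vdev
print(f"mirrors: {mirror:.0%}, 12-wide RAID-Z3: {raidz3:.0%}")
```

At that width the same raw disks hold half again as much data as mirrors would, which is the kind of margin that matters when storage is the main cost.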