ZFS Replication To the Cloud Is Finally Here and It's Fast (arstechnica.com)

rsync and zfs do different things by MichaelSmith · 2015-12-21 21:55 · Score: 5, Informative

rsync synchronises files. ZFS synchronises a file system. Of course it is better to work that way because you can transfer just the changed components of a file. Moving a file just changes a pointer, so send the pointer. That sort of thing.

--
http://michaelsmith.id.au

Re:rsync and zfs do different things by x_t0ken_407 · 2015-12-22 06:57 · Score: 1

It's more than that - ZFS is basically taking fast snapshots and syncing just the deltas between the latest snapshot and the previous snapshot, which are blocks. Files and pointers don't matter - it's syncing individual changed blocks. You change one letter in a file, it's not syncing the whole file - just the changed block. It's substantially more efficient.
Exactly, and that's why ZFS's transfer time is so much shorter and does not grow with the size of the file (as rsync's does), as shown in the article.
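A toy model of the point above. It assumes a copy-on-write file split into fixed 128 KiB records (ZFS's default recordsize), where each record remembers the transaction group (txg) that last wrote it. The class and method names are hypothetical, and real ZFS keeps the birth txg in each block pointer rather than in a list. Changing one byte marks one record, so the incremental delta is one record whether the file is 1 MiB or 1 GiB.

```python
from dataclasses import dataclass

RECORDSIZE = 128 * 1024  # ZFS's default recordsize; used here as the toy block size


@dataclass
class CowFile:
    """Toy copy-on-write file: every record remembers the txg that last wrote it."""
    records: list[bytes]
    births: list[int]

    @classmethod
    def create(cls, size: int, txg: int) -> "CowFile":
        n = (size + RECORDSIZE - 1) // RECORDSIZE
        return cls([bytes(RECORDSIZE)] * n, [txg] * n)

    def write(self, offset: int, data: bytes, txg: int) -> None:
        # Simplification: assumes the write fits inside a single record.
        i, off = divmod(offset, RECORDSIZE)
        rec = bytearray(self.records[i])
        rec[off:off + len(data)] = data
        self.records[i] = bytes(rec)
        self.births[i] = txg  # the rewritten record is "born" in the new txg

    def delta_since(self, snap_txg: int) -> list[int]:
        """Records born after the snapshot: the only ones an incremental send ships."""
        return [i for i, born in enumerate(self.births) if born > snap_txg]


big = CowFile.create(1 << 30, txg=1)    # 1 GiB
small = CowFile.create(1 << 20, txg=1)  # 1 MiB
big.write(5_000_000, b"!", txg=2)       # change one byte in each
small.write(5, b"!", txg=2)
print(len(big.delta_since(1)), len(small.delta_since(1)))  # one record each
```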
Re:rsync and zfs do different things by Cramer · 2015-12-22 10:48 · Score: 1

rsync does the same thing (block-level transfers). ZFS wins this race because it is the filesystem and keeps track of which blocks are changing. rsync has to read every block, compute a checksum, and communicate that checksum to determine which block(s) need to be transferred. That's an expensive process, which is why rsync defaults to "whole-file" on local storage. (You should disable that on an SSD.)
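For reference, rsync's "compute a checksum" step relies on a cheap rolling (weak) checksum that can slide along the file one byte at a time, with a strong hash to confirm candidate matches. The sketch below follows the weak-checksum definition from the rsync technical report; SHA-256 stands in for rsync's actual strong checksum, and the function names are made up for illustration. Both the receiver's table and the sender's scan read every byte, which is the cost described above.

```python
import hashlib

M = 1 << 16  # the weak checksum components are kept modulo 2**16


def weak_checksum(block: bytes) -> tuple[int, int]:
    """(a, b) components of the rsync-style weak checksum for one block."""
    n = len(block)
    a = sum(block) % M
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, n: int) -> tuple[int, int]:
    """Slide an n-byte window one byte to the right in O(1)."""
    a = (a - out_byte + in_byte) % M
    b = (b - n * out_byte + a) % M
    return a, b


def find_matches(old: bytes, new: bytes, n: int) -> list[tuple[int, int]]:
    """Return (offset in new, block index in old) for every block of old found in new."""
    # Receiver side: checksum every fixed-size block of its copy.
    table: dict[tuple[int, int], list[int]] = {}
    for idx in range(len(old) // n):
        table.setdefault(weak_checksum(old[idx * n:(idx + 1) * n]), []).append(idx)

    # Sender side: try every byte offset, confirming weak hits with a strong hash.
    matches = []
    k, ab = 0, None
    while k + n <= len(new):
        window = new[k:k + n]
        if ab is None:
            ab = weak_checksum(window)
        hit = next(
            (idx for idx in table.get(ab, [])
             if hashlib.sha256(old[idx * n:(idx + 1) * n]).digest()
             == hashlib.sha256(window).digest()),
            None,
        )
        if hit is not None:
            matches.append((k, hit))
            k, ab = k + n, None  # jump past the matched block, recompute there
        else:
            if k + n < len(new):
                ab = roll(*ab, new[k], new[k + n], n)
            k += 1
    return matches
```

Inserting three bytes at the front of a file shifts every block, yet the rolling checksum still finds all of them at their new offsets.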
Re: rsync and zfs do different things by cthulhu11 · 2015-12-23 15:06 · Score: 1

I found zfs send to be bursty -- periods of little or no output interspersed with blasts saturating the channel. I ended up running it through mbuffer compiled with a larger buffer, which smoothed out the traffic nicely.

VM Replication by tomknight · 2015-12-21 21:58 · Score: 3, Interesting

I was a little unexcited by (although interested in) the article, even by the general speedups, until I got to the part about VM replication. This really makes an enormous difference.

ZFS licensing has kept this as a grey area for me, so I've largely kept away from deployment (save for an emergency FreeNAS box I needed in a hurry), but I'd clearly benefit from looking here again. Thanks for the reminder.

Oh, I also appreciate the rsync.net advertisement. Good guys, good service ;-)

--
Oh arse

Re:VM Replication by Lennie · 2015-12-21 22:44 · Score: 2

The article did feel like an advertisement.
They offer a VM with lots of disk space; is that really that special?
I know of at least one that offers something similar:
https://www.vultr.com/pricing/...
I guess not at the same scale and with a bandwidth limit.
What I think is kind of funny is how people are surprised that ZFS works well for VM-images.
rsync is meant/optimized for transferring files, not blocks.
ZFS is meant for transferring filesystem blocks, and VM images are blocks too.
So ZFS works better than rsync for that. That isn't so surprising.
Anyway, the whole VM thing has been a big distraction; containers/zones were already in widespread use before VMs were.
I'm glad containers are getting more attention now. Partly because of things like storage. Who wants to deal with VM-images if you can have files ?

--
New things are always on the horizon
Re:VM Replication by rl117 · 2015-12-21 23:33 · Score: 1

There is no grey area with respect to the licensing. It's CDDL, a free software licence. It's 100% Free.
It might be incompatible with the GPL, but that's a non-issue. The userland tools are fine under this licence. The kernel modules are fine under this licence. Now, it means that the kernel modules aren't going to appear in a kernel release anytime soon, but that in no way makes for any legal problems in using them as loadable modules, today. It works fine from a technical point of view, and it's also fine from a legal point of view.
Re:VM Replication by Bengie · 2015-12-22 01:12 · Score: 2

Depends on what you're calling "containers". BSD Jails have been around for a long time, but what Linux calls "containers" are crappy attempts to containerize. The Linux community has this unhealthy "not invented here" syndrome that results in a lot of square wheels.
Re:VM Replication by tbuddy · 2015-12-22 03:41 · Score: 1

A little bit. As stated in the article, the author also wrote the sync tool Syncoid and does consulting. He doesn't really have a horse in the race for the company he's reviewing, and his tools are looking to support Btrfs, so he's not necessarily married to ZFS or to the company the review is mostly about. Actually a pretty nice article to read.

From some of the benchmarks in the article, it didn't seem like rsync had any strength over Syncoid, other than that his tool requires ZFS on both ends while rsync is more flexible.
Re:VM Replication by Rutulian · 2015-12-22 04:02 · Score: 2

but what Linux calls "containers" are crappy attempts to containerize.

Not sure what you mean. Jails have been around for a long time, but LXC/LXD containers have almost identical functionality.
container templates...check
filesystem snapshot integration (ZFS, btrfs) with cloning operations...check
resource limits...check
unprivileged containers...check
network isolation...more flexible under LXC than Jails, in my opinion
bind mounts in containers...check
nice management utilities...check
live migration...in development
Only difference I can see really is that LXC doesn't support nested containers...
Re:VM Replication by Cyberax · 2015-12-22 06:25 · Score: 1

Only difference I can see really is that LXC doesn't support nested containers...
It most certainly does. Linux can nest user namespaces to almost any depth.
Re:VM Replication by Bengie · 2015-12-23 08:36 · Score: 1

The difference is that BSD Jails are entirely separate environments with their own unshared kernel datastructures, and the jail communicates with the host via an API. Linux namespaces are just metadata added to shared environments. Not only does that mean reduced isolation, allowing for some annoying "leaking", but it is also more bug-prone due to increased complexity, and the shared datastructures are naturally more prone to major issues when they arise.

Security can't be bolted on after the fact, it must be baked into the design.
Re:VM Replication by Rutulian · 2015-12-23 09:51 · Score: 1

The difference is BSD Jails are entirely separate environments with their own unshared kernel datastructures, and the jail communicates with the host via an API. Linux namespaces is just metadata added to shared environments.
I'm sorry, but this notion is completely wrong. A BSD Jail is a forked process (the "jail process"), which calls the "jail" kernel system call and then executes a chroot. The jail syscall serves to attach the "prison" data structure to the "proc" data structure of the jail process, allowing the kernel to identify the process as "jailed" and treat it accordingly. The isolation of the environments is dependent entirely on the kernel recognizing that the process is jailed and putting the appropriate restrictions on it.
https://www.freebsd.org/doc/en...
An LXC container is a forked process with a chroot that is assigned a set of namespaces, which are then restricted by kernel cgroups and kernel capabilities. The end result is that the forked process is only able to communicate with other processes within its namespace and is only allowed to access resources that cgroups allow that namespace to access. The only way for a containerized process to communicate with the host is through a vnet bridge device, which is similar to the way it works with Jails.
https://libvirt.org/drvlxc.htm...
So in other words, the specific implementation details are different, but the result is nearly identical. A forked process executes a chroot and then is assigned a special status by the kernel which determines which resources it can access and which processes it can communicate with.
Re:VM Replication by Bengie · 2015-12-24 07:22 · Score: 1

BSD Jails do not use anything like chroot. "chroot" is being used as a verb that describes the intention, but not the implementation.

This is how the FreeBSD kernel devs describe BSD Jails. Each jail gets its own kernel network stack, kernel memory allocator, and almost every other kernel datastructure. They said this is nearly identical to paravirtualization. Breaking out of a jail requires a kernel flaw in both a system call and the paravirtualization layer.

Think KVM+QEMU, with most of the benefit and virtually no overhead from a performance standpoint. Of course, you can't run just any kernel, only the host kernel, but that's why it's so low-overhead. If you do need another kernel, they have some basic Linux kernel emulation that can wrap BSD system calls in a jail to mimic Linux. Not perfect, but 80/20. If you need a full-on VM, you can use bhyve, a hybrid type 1+2 hypervisor that is only about 1,300 LOC (this is small: fewer lines, fewer bugs).

A cool thing you can do with jails+ZFS is create a snapshot that you clone for your compilation. Some company that does a lot of compiling, integration tests, and other things, has a basic ZFS snapshot that they clone, then spin up a jail, clone their target source code, feed that ZFS clone into the jail, then compile, test, etc. Once everything is done, they have the jail write out the binaries to a shared location, shutdown the jail, and delete the ZFS volume. During a large compilation, they are creating and destroying about 10,000 jails+snapshots per second, and at 95% the speed of compiling and testing directly on the host, but a lot easier to manage.

The way PC-BSD 11 upgrades your system is it snapshots your boot volume, loads a jail, boots PC-BSD within the jail, runs the upgrade, then points your next reboot at the upgraded volume. And yes, you can not only snapshot your entire FreeBSD system and boot it in a jail, but you can actually boot your host system from a jail's volume. This also means you can keep around old snapshots of your host OS, and "boot them" in a jail. This doesn't entirely apply to kernel updates. Because jails are just copies of the host kernel, you only have one kernel running at a time. But ZFS is bootable, so your snapshots do include the old kernels. If you need to boot an old kernel snapshot, you can just as easily boot that volume in bhyve.
Re:VM Replication by Rutulian · 2015-12-25 05:39 · Score: 1

This is how the FreeBSD kernel devs describe BSD Jails. Each jail gets its own kernel network stack, kernel memory allocator, and almost every other kernel datastructure.
What you are describing is VPS (Virtual Private System), not Jails. VPS is the successor to Jails, written to address some of the shortcomings of Jails and make them more useful in situations where you want true virtual environments, rather than just the extra security that Jails has to offer. Incidentally, the mechanisms used to implement VPS in FreeBSD are nearly identical to the mechanisms for implementing containers on linux. Here is the relevant description from the whitepaper (http://2010.eurobsdcon.org/fileadmin/fe_user/klaus/37R5uB.pdf):

3.4 Multiplexing global variables
A FreeBSD kernel without VPS maintains global variables like the process table, the hostname, number of currently existing processes, and much more.
In a VPS enabled kernel, global variables are replaced by variables private to a VPS instance. Therefore even if no VPS instance is explicitly created, the system knows the instance “vps0”, which is the “main system”. This instance is created very early at kernel boot and has all privileges. VPS instances can be created in a hierarchical way, allowing one VPS instance to manage its child instances and pass on part of their resource quotas. Each “struct ucred” keeps a pointer to the real VPS instance, and each “struct thread” keeps the pointer to the effective VPS instance. A “struct ucred” contains user credentials and is referenced by threads, processes, sockets, some devices as well as some other resources.
Sounds familiar? That is basically how linux namespaces work. The primary difference is that on linux you have several independent namespaces (cpu, mem, ipc, net, etc) that must be used together to create a container, whereas with VPS they are integrated more-or-less. On linux you have the flexibility to use namespaces for other purposes besides containers, on BSD you have VPS and only VPS.

BSD Jails do not use anything like chroot. "chroot" is being used as a verb that describes the intention, but not the implementation.
For VPS, you are correct. For Jails it is a chroot, albeit a hardened version of the chroot system call, not chroot the userspace program. Read about it here in the whitepaper written by the author of Jails,
http://phk.freebsd.dk/pubs/san...

They said this is nearly identical to paravirtualization.
This is not true at all. For Jails there are two principal isolation features. Once a process is jailed, the "struct proc" has a copy of the "prison" data structure and a reference counter. Every child process that it creates references this data structure in its own "struct proc" and increments the reference counter. In this way, the processes that belong to a jail are tracked by their references to the "prison" data structure owned by the root jailed process. Various kernel systems are then altered to block messages between processes not in the same jail, to limit process listing to the tree described by references to the root jailed process (via the sysctl interface, for example), to limit pty access to processes in the same jail, etc. Among the things recorded in the "prison" data structure are the ip address and hostname, and the kernel blocks attempts to bind an ip address or hostname that does not fall in this specified range.
Despite these mechanisms to isolate processes, there is still one global process table and one network stack. The partitioning is effected by hiding parts of the process table and making every attempt to block communication outside of the allowed process list, but it is a fairly blunt instrument. For example, IPC is completely blocked by default in a Jail because this is th
Re:VM Replication by schitso · 2015-12-31 02:21 · Score: 1

This was an extremely informative conversation to read. Kudos to you both.

Charming by wonkey_monkey · 2015-12-21 21:58 · Score: 2, Insightful

Who cares, right? As the service itself states, "If you're not sure what this means, our product is Not For You."

Ah, there's that welcoming open-source community spirit.

--
systemd is Roko's Basilisk.

Re:Charming by greenfruitsalad · 2015-12-21 23:05 · Score: 3, Informative

there are things in this world that simply aren't meant for participation award winners. so go get offended somewhere else.
if somebody doesn't know what ZFS replication is, their product clearly isn't meant for them. why bother with explanation to a visitor that has no use for the product/service?
the attitude of these ZFS people is still quite welcoming compared to some connectivity providers i've dealt with. e.g. bogons.net will just politely tell you to f*ck off if you don't fully understand what you're purchasing from them (dwdm/cwdm rings).
Re: Charming by greenfruitsalad · 2015-12-21 23:50 · Score: 2, Informative

their howtos hold newbies' hands sufficiently. they simply don't provide a free "Oracle ZFS Storage Appliance Administration course", which is what some people seem to expect. it seems i am discussing this with people who haven't even visited their website, so i'll stop here.

Rsync could have done this too! by urdak · 2015-12-21 22:01 · Score: 4, Informative

Reading this article, it seems that this "ZFS replication" is very similar to rsync, with one straightforward addition:

Rsync works on an individual file level. It knows how to synchronize each modified file separately, and does this very efficiently. But if a file was renamed, without any further changes, it doesn't notice this fact, and instead notices the new file and sends it in its entirety. "ZFS replication", on the other hand, works on the filesystem level, so it knows about renamed files and can send just the "rename" event instead of the entire content of the file.

So if rsync ran through all the files to try to recognize renamed files (e.g., by file sizes and dates, confirming with a hash), it could basically do the same thing. This wouldn't catch the event of renaming *and also* modifying the same file, but this is rarer than simple movements of files and directories. The benefit would have been that this would work on *any* filesystem, not just on ZFS. Since 99.9% of the users out there do not use ZFS, it makes sense to have this feature in rsync, not ZFS.
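A minimal sketch of the heuristic proposed above, assuming a hypothetical in-memory listing of path -> (size, mtime, contents) for the old and new trees. Vanished paths are indexed by (size, mtime), and a new path counts as a rename only if its hash matches. This is an illustration of the idea, not something rsync does by default, and the confirming hash still means reading the file contents.

```python
import hashlib
from typing import NamedTuple


class Entry(NamedTuple):
    size: int
    mtime: int
    data: bytes


def detect_renames(old: dict[str, Entry], new: dict[str, Entry]) -> list[tuple[str, str]]:
    """Pair each vanished path with an appeared path that has identical contents."""
    gone = {p: e for p, e in old.items() if p not in new}
    added = {p: e for p, e in new.items() if p not in old}

    # Cheap first pass: group vanished files by (size, mtime).
    index: dict[tuple[int, int], list[str]] = {}
    for p, e in gone.items():
        index.setdefault((e.size, e.mtime), []).append(p)

    renames = []
    for p, e in sorted(added.items()):
        candidates = index.get((e.size, e.mtime), [])
        for cand in candidates:
            # Expensive confirmation: the contents must hash identically.
            if hashlib.sha256(gone[cand].data).digest() == hashlib.sha256(e.data).digest():
                renames.append((cand, p))
                candidates.remove(cand)  # safe: we stop iterating right away
                break
    return renames
```

As noted above, a file that is renamed and also modified fails the hash check and would still be sent in full.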

Re:Rsync could have done this too! by brambus · 2015-12-21 22:16 · Score: 5, Insightful

The crucial difference is ZFS send is unidirectional and as such is not affected by link latency. rsync needs to go back-and-forth, comparing notes with the other end all the time. ZFS send is also a lot faster and more efficient, eliminating entire large portions of the filesystem tree structure that haven't changed without having to read them in. This is not to say that rsync's authors were any less competent coders. ZFS simply has more information available about the filesystem than rsync, so it can make smarter decisions.
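To illustrate how ZFS can skip unchanged parts of the tree, here is a simplified model, not the actual ZFS code. Every block records the txg it was born in, and because writes are copy-on-write, a changed leaf forces its parent blocks to be rewritten with a newer birth txg. A traversal can then stop at any block born at or before the snapshot's txg without reading anything below it. The Block class and function name are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    birth_txg: int                                        # txg in which this block was written
    children: list["Block"] = field(default_factory=list)  # empty for leaf (data) blocks
    payload: str = ""                                     # leaf data in this toy model


def changed_leaves(node: Block, from_txg: int) -> tuple[list[str], int]:
    """Return (payloads of leaves born after from_txg, number of blocks visited)."""
    if node.birth_txg <= from_txg:
        return [], 1  # nothing below here changed since the snapshot: prune
    if not node.children:
        return [node.payload], 1
    leaves, visited = [], 1
    for child in node.children:
        sub, count = changed_leaves(child, from_txg)
        leaves += sub
        visited += count
    return leaves, visited


# 4 indirect blocks x 256 leaves, all written in txg 1.
root = Block(1, [Block(1, [Block(1, payload=f"{i}-{j}") for j in range(256)])
                 for i in range(4)])
# Simulate a copy-on-write update in txg 5: new leaf, new parent, new root.
root.children[2].children[7] = Block(5, payload="2-7 v2")
root.children[2].birth_txg = 5
root.birth_txg = 5
```

In this model, changed_leaves(root, 1) visits 261 of the 1,029 blocks and returns only the rewritten leaf; the three untouched indirect blocks are never descended into.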
Re:Rsync could have done this too! by Anonymous Coward · 2015-12-21 22:19 · Score: 1

Not exactly.
rsync will always have to go through the files and check. Trying to identify stuff like renames will obviously make a difference, but as it's only really going to have any sizeable impact when you happen to have lots of renames, but not actual data changes, it's probably not even worth the effort of implementing it.
ZFS send/recv works at a very low level using the fundamental infrastructure in ZFS that makes snapshots work. When you send an incremental ZFS snapshot it doesn't have to check anything, it immediately starts streaming changes since the last snapshot, as fast as the system/network will let it.
I love rsync, and yes a major benefit is that it works on any filesystem but it's just not possible for it to get anywhere near the performance of something like ZFS send/recv.
Re:Rsync could have done this too! by geggo98 · 2015-12-21 22:20 · Score: 2

In principle true, but with one exception: if you already use ZFS for other reasons (e.g. checksums in the file system or transparent compression), it's really nice that you can make backups at the filesystem level with rsync-like performance. A filesystem-level backup keeps all filesystem-specific features intact (e.g. the checksums and the compression). So you can have really fast backups and be sure that when you restore the backup, the filesystem will look exactly as it does now. So use rsync when you want to back up the content of the files, and ZFS snapshots when you want to back up the layout of the filesystem (including the files' content, of course).
Re:Rsync could have done this too! by Maow · 2015-12-21 22:54 · Score: 2

I was wondering what this offers over a (theoretical?) inotify+rsync app.
In the comments at the linked-to Ars article, Jim discusses just this approach.
Basically, and from memory, he determined that it would just be too much work to re-implement something that already works solidly (ZFS) and comes with a huge amount of other features out of the box.
Re:Rsync could have done this too! by drinkypoo · 2015-12-21 23:31 · Score: 2

So if rsync ran through all the files to try to recognize renamed files (e.g., by file sizes and dates, confirming with a hash), it could basically do the same thing.
As a sibling comment points out, rsync does have a mode which handles this. As they don't point out, it is horrendously costly. Making this the default would be a pure idiot move. ZFS has metadata that permits detecting these sort of files, so it is possible to do it cheaply with ZFS.
What is really wanted IMO is for rsync to detect this stuff and use it when ZFS is present.

--
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
Re:Rsync could have done this too! by grumbel · 2015-12-21 23:51 · Score: 2

The biggest difference is that ZFS has full knowledge of the state of the file system. rsync, on the other hand, doesn't: it's stateless, so it has to start from zero each time and regather the information on each and every run on both sides, which is a really slow and potentially error-prone process (e.g. when files change while rsync runs). ZFS knows what's going on in the filesystem and it snapshots the filesystem at a single point in time, so it can be far quicker and won't produce inconsistencies in the transmitted data. Tracking renames would speed some things up a little, but to match ZFS, rsync would need a way to get information about what changed from the filesystem itself, and at the moment such functionality isn't available.
Re:Rsync could have done this too! by urdak · 2015-12-22 00:50 · Score: 1

The crucial difference is ZFS send is unidirectional and as such is not affected by link latency. rsync needs to go back-and-forth, comparing notes with the other end all the time.

But this is *not* what the article appears to be measuring. He measured that the times to synchronize changes were nearly identical for rsync and "ZFS replication", except when it comes to renames.
Re:Rsync could have done this too! by urdak · 2015-12-22 00:52 · Score: 2

Not exactly.
rsync will always have to go through the files and check. Trying to identify stuff like renames will obviously make a difference, but as it's only really going to have any sizeable impact when you happen to have lots of renames, but not actual data changes, it's probably not even worth the effort of implementing it.
The rename issue is actually *very* important. It's not likely that you'll have a lot of independent renames, but it's very likely that you'll rename one directory containing a lot of files, and at that point rsync will send the entire content of that directory again. In the past I have actually stopped myself from renaming a directory just because I knew it would incur a huge slowdown the next time I did a backup (using rsync).
Re:Rsync could have done this too! by Bengie · 2015-12-22 01:16 · Score: 1

The difference between rsync and ZFS is O(N) versus, roughly, O(number of changed blocks). ZFS can find the difference between datasets of any size almost instantly, while rsync has to scan them first. Try rsyncing petabytes of files where many files are constantly being touched but few changes are being made.
Re:Rsync could have done this too! by BitZtream · 2015-12-22 01:58 · Score: 1

ZFS replication is for synchronizing file system snapshots. rsync is for syncing some files.
Entirely different purposes even if they seem the same.
ZFS encapsulates the entire storage channel. It is your volume manager all the way to your file system. It knows of every single change that occurs, when and where it occurs and what it changed. Sending a ZFS snapshot gets not only the snapshot being sent, but every one in between. ZFS does deduplication, compression, and checksumming, and the snapshots store every file system attribute that exists, be it simple permissions, full ACLs or other extended attributes.
ZFS replication is a disaster recovery feature.
Rsync is a way to keep a couple directories synced up, and in doing so requires reading the data in question multiple times for every time you want to sync unless you cheat and only look at file sizes and times, which will miss plenty of things.
Rsync hashes files to compare them and then copies the file.
ZFS replication makes block for block copies of the file system data based on changes that it KNOWS existed. It doesn't have to find changes, it KNOWS the changes.
You can make a script that will wrap rsync in some of these features, but you will never get anywhere near ZFS replication.
Do a ZFS snapshot send of a 25TB drive with millions of images and files and only a handful of tiny changes ... it'll finish in minutes.
Rsync will take weeks.
You cannot 'build these features' into rsync for all sorts of reasons, some of which revolve around essentially recreating something like the work ZFS does ... but doing it every time you run rsync, which would be painfully slow. rsync cannot accomplish what ZFS does because it does not know the information required to do so, and it cannot know that information unless it happens to be running on top of ZFS, where it can query for it, and ... at that point, it's painfully inefficient compared to zfs send.
If you rm -rf /mntpoint on the source dir:
with rsync: On next sync, if you have delete enabled, you've lost all your files
with zfs: the next snapshot transferred will remove the files ... of course, they'll still be on both the source and destination hosts in any previous snapshots so rollbacks are trivial.

--
Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
Re:Rsync could have done this too! by brambus · 2015-12-22 02:00 · Score: 2

If you read on a bit in the article, you'll come across the example of daily syncing of VM images across to a backup node. While ZFS send is done in less than an hour, rsync would take north of 7 hours just to read in the local state of the VM image, much less figure out what has changed and send the diffs. This is based entirely on ZFS send's unidirectionality. The critical difference is that rsync needs to trawl the entire local dataset state completely and compare notes with the other box (which also needs to read it all in) in order to figure out what's changed. ZFS send doesn't need to do that.
Re:Rsync could have done this too! by MightyYar · 2015-12-22 02:13 · Score: 1

Renames and changes to large files (VM images were the author's example).

--
W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
Re:Rsync could have done this too! by fluffernutter · 2015-12-22 02:17 · Score: 1

I'm pretty sure there are people using rsync successfully for more than "a couple directories".

--
Laws are rules for the court, but merely a bottom bar to hit for life. Think beyond laws in your actions always.
Re: Rsync could have done this too! by guruevi · 2015-12-22 03:08 · Score: 1

Another problem is that rsync has to scan the entire file system, calculate hashes and transfer them and then do the same on the other side before it can transfer the difference.
If you have millions of files and directories, that can take a significant amount of time. I used to have rsync take a weekend to back up. With ZFS I can do hourly backups.

--
Custom electronics and digital signage for your business: www.evcircuits.com
Re:Rsync could have done this too! by brambus · 2015-12-22 03:30 · Score: 1

ZFS has metadata that permits detecting these sort of files
Side note for your entertainment in case it interests you, the way ZFS actually handles the rename case has nothing to do with trying to follow file name changes. In fact, in order to handle a rename, we don't need to look at the file being renamed at all. The trick is in the fact that directories are files too (albeit special ones) with a defined hash-table structure. ZFS send simply picks up the changes to the respective directories as if they were regular files and transfers those. The changed blocks then contain the updated name-to-inode# mappings, which is what a rename really is. From ZFS send's point of view, a filesystem is just a flat collection of objects and all it does is transfer the changes to these objects that happened between two transaction groups.
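A toy model of the rename case described above, assuming a flat table of numbered objects in which a directory is just an object mapping names to object numbers. The helper names are hypothetical, and the diff here compares contents only for illustration (ZFS finds changed blocks from metadata rather than by comparison). The point is that renaming a large file changes only directory objects, never the file's data.

```python
import copy


def make_fs() -> dict:
    """Hypothetical object table: object number -> contents."""
    return {
        1: {"docs": 2, "old": 4},    # root directory: name -> object number
        2: {"report.pdf": 3},        # /docs
        3: b"x" * 1_000_000,         # the large file's data object
        4: {},                       # /old (empty directory)
    }


def rename(fs: dict, src_dir: int, dst_dir: int, old_name: str, new_name: str) -> None:
    # A rename only edits the name -> object mappings in directory objects.
    fs[dst_dir][new_name] = fs[src_dir].pop(old_name)


def changed_objects(before: dict, after: dict) -> list[int]:
    ids = before.keys() | after.keys()
    return sorted(oid for oid in ids if before.get(oid) != after.get(oid))


snap = make_fs()
live = copy.deepcopy(snap)
rename(live, 2, 2, "report.pdf", "final.pdf")
print(changed_objects(snap, live))  # only the /docs directory object
```

Object 3, the 1 MB of file data, never shows up as changed; moving the file into another directory would change the two directory objects involved and still leave the data alone.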
Re:Rsync could have done this too! by rl117 · 2015-12-22 04:15 · Score: 1

They definitely are. But it doesn't scale well. The time taken to scan the files and their contents on the source and destination system becomes overwhelming. The largest I've taken it to is a few terabytes, consisting of many thousands of directories each containing thousands of files (scientific imaging data). It ends up taking hours, where with ZFS it would take a few seconds. It also thrashes the discs on both systems as it scans everything, and uses a lot of memory. ZFS does none of these things--the send/recv is a "simple" streaming operation.
That said, with tools based upon rsync, like unison, it becomes possible to do two-way synchronisation which is pretty powerful. ZFS send/recv only works one-way. But again, the scan time with unison becomes prohibitive.
Re:Rsync could have done this too! by greenfruitsalad · 2015-12-22 06:04 · Score: 1

the scopes of what "zfs send" and "rsync" do are so profoundly different, it's almost silly to compare them. they're at completely different layers of storage stack. when i sync my local filesystem with a remote site (every hour), i sync snapshots, clones, (sub)filesystems while things are mounted and heavily in use. there's also compression and deduplication to consider.
the rsync feature you suggested isn't possible without a complete zfs rewrite or another layer of abstraction. too costly in either case.
Re:Rsync could have done this too! by DRJlaw · 2015-12-22 06:22 · Score: 2

But this is *not* what the article appears to be measuring. He measured that the times to synchronize changes were nearly identical for rsync and "ZFS replication", except when it comes to renames.
Yet this is what the article says. Does he really have to measure read time to the millisecond instead of providing an estimate? How fast can your disk system read off 2TB of information, anyway?
"Virtualization keeps getting more and more prevalent, and VMs mean gigantic single files. rsync has a lot of trouble with these. The tool can save you network bandwidth when synchronizing a huge file with only a few changes, but it can't save you disk bandwidth, since rsync needs to read through and tokenize the entire file on both ends before it can even begin moving data across the wire. This was enough to be painful, even on our little 8GB test file. On a two terabyte VM image, it turns into a complete non-starter. I can (and do!) sync a two terabyte VM image daily (across a 5mbps Internet connection) usually in well under an hour. Rsync would need about seven hours just to tokenize those files before it even began actually synchronizing them... and it would render the entire system practically unusable while it did, since it would be greedily reading from the disks at maximum speed in order to do so." (emphasis mine)
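A quick back-of-the-envelope check of the figures in that quote. It assumes decimal units (1 TB = 10^12 bytes, 5 Mbit/s = 5 x 10^6 bits/s) and treats "well under an hour" as at most one hour.

```python
# Assumptions: decimal units, and the daily ZFS sync taking at most one hour.
image_bytes = 2 * 10**12      # 2 TB VM image
tokenize_seconds = 7 * 3600   # the article's ~7 hours for rsync to read/tokenize it
link_bits_per_s = 5 * 10**6   # 5 Mbit/s Internet connection

# Sustained read rate rsync would need just to get through the image once.
read_rate = image_bytes / tokenize_seconds
# Most bytes the link can carry in one hour: an upper bound on the daily ZFS delta.
max_sent = link_bits_per_s * 3600 / 8
changed_fraction = max_sent / image_bytes

print(f"rsync read rate ~ {read_rate / 1e6:.0f} MB/s")
print(f"daily ZFS delta <= {max_sent / 1e9:.2f} GB ({changed_fraction:.2%} of the image)")
```

The seven-hour estimate implies sustained reads of roughly 79 MB/s, and the one-hour ceiling means the daily incremental stream is at most about 2.25 GB, around 0.1% of the image.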
Re:Rsync could have done this too! by mcrbids · 2015-12-22 06:35 · Score: 1

Well, sort of....
We switched from rsync to ZFS replication for our production environments, and the difference in performance is rather extreme (which is why we made this change).
Medium-sized file system, 12 TB and a few hundred million files. Doing a backup with rsync took days, and it was all just tied up in IOPS, even if the number of changed files was rather small. At this scale, it takes more than 24 hours just to get a listing of files.
Switching to ZFS with nightly snapshots and replication dropped backup times from days down to minutes. Add other features like clones, compression, hot error checking (scrub), hot swapping and RAIDZ, and it becomes pretty obvious pretty quickly that if you're serious about data you should seriously consider ZFS.
ZFS on Linux is pretty easy to install and it's been rock-solid stable for our use in a 24x7 heavy-use environment.

--
I have no problem with your religion until you decide it's reason to deprive others of the truth.
Re:Rsync could have done this too! by MachineShedFred · 2015-12-22 06:44 · Score: 1

In addition, when it comes to hosting VMs on the filesystem, ZFS deduplication can offer significant space savings by deduplicating the blocks the VM images have in common (mostly operating system files).
If you are hosting Windows VMs, this effectively nullifies many gigabytes of storage bloat. This is, of course, a separate feature of ZFS and has nothing to do with snapshotting, other than the fact that your snapshots will be smaller.
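ZFS deduplication works on blocks (records) rather than whole files, keyed by a checksum of each block. The toy Python model below assumes fixed 4 KiB blocks and SHA-256 keys, both chosen only for illustration. Real ZFS uses the pool's dedup checksum and the dataset's record or volume block size. The model shows why identical OS blocks shared by several VM images collapse to a single stored copy.

```python
import hashlib

BLOCK = 4096  # illustrative block size (assumption, not a ZFS default)


def dedup_store(images: list[bytes]) -> tuple[dict[bytes, bytes], int]:
    """Split each image into fixed blocks and keep one copy per unique block.

    Returns (store mapping checksum -> block data, number of logical blocks).
    """
    store: dict[bytes, bytes] = {}
    logical = 0
    for image in images:
        for off in range(0, len(image), BLOCK):
            block = image[off:off + BLOCK]
            # Identical content hashes to the same key, so it is stored once.
            store.setdefault(hashlib.sha256(block).digest(), block)
            logical += 1
    return store, logical
```

Matching happens per aligned block, so the savings depend on the guests laying out identical data at the same block alignment.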

--
Slashdot still doesn't support Unicode after it was added to the HTML standard in 1997.
Re:Rsync could have done this too! by blackpaw · 2015-12-22 09:27 · Score: 1

Not quite - zfs needs to contact the destination zfs fs to compare with the last snapshot, but that is a very quick process. Once that's done, zfs already knows which blocks have changed since the last snapshot, whereas rsync has to scan the contents of each file at *both* ends, which is where all the time goes.
Re:Rsync could have done this too! by blackpaw · 2015-12-22 09:31 · Score: 1

Deduplication takes an insane amount of RAM and is really only useful for static, rarely written datasets; it's strongly recommended against for VM images.
OTOH, enabling lz4 compression is recommended: CPU/RAM usage is minimal, the compression ratios can be quite impressive, and it can actually improve disk I/O, since less data is read from and written to disk. I have many VMs with compression enabled, and it usually reduces the image size by about 30%.
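A back-of-the-envelope estimate makes the RAM point concrete. The sketch assumes the often-quoted rule of thumb of roughly 320 bytes of in-memory dedup table (DDT) per unique block. That figure is an assumption, and actual usage varies by implementation. Because the table grows with the number of blocks, the small block sizes typical of VM volumes hurt most.

```python
DDT_ENTRY_BYTES = 320  # rule-of-thumb size of one in-core DDT entry (assumption)


def ddt_ram_bytes(unique_data_bytes: int, block_size: int) -> int:
    """Approximate RAM needed to keep the whole dedup table in memory."""
    blocks = -(-unique_data_bytes // block_size)  # ceiling division
    return blocks * DDT_ENTRY_BYTES


TiB = 1 << 40
GiB = 1 << 30
print(ddt_ram_bytes(TiB, 128 * 1024) / GiB)  # 1 TiB in 128 KiB records: prints 2.5
print(ddt_ram_bytes(TiB, 8 * 1024) / GiB)    # 1 TiB in 8 KiB volume blocks: prints 40.0
```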
Re:Rsync could have done this too! by brambus · 2015-12-22 10:53 · Score: 1

Not quite zfs needs to contact the destination zfs fs to compare with the last snapshot

Ehm, no, sorry. No communication with the destination machine is required while generating an incremental send stream. How can I claim this? Well besides being quite intimate with the ZFS source base (and I can point you to the relevant source files if you so desire), just a quick read through the zfs(1M) manpage will mention this example:
# zfs send pool/fs@a | ssh host zfs receive poolB/received/fs@a
As you are no doubt aware, pipes are by definition unidirectional. There is no way the zfs receive can talk to the zfs send at all. Another way to verify this is to check out ZFS backup systems such as Zetaback, which by default store the ZFS send streams as files on a central server (which may or may not actually support ZFS - it's not actually required). Now if an incremental send stream is stored as a file and then at some later point restored, this clearly tells you that there can't be any bidirectional exchange of information going on.
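The one-way nature of an incremental stream can be modeled in a few lines. In this toy Python model, the names and the JSON encoding are hypothetical, and freed blocks are ignored. Real ZFS compares each block's birth transaction group (txg) with the base snapshot's and emits its own binary stream format. The sender alone decides which blocks were born after the base snapshot and serializes them. The receiver then applies the bytes without ever replying, so the stream could just as well sit in a file and be applied days later.

```python
import json


def make_incremental(blocks: dict[int, tuple[int, str]], base_txg: int) -> bytes:
    """Serialize every block born after base_txg.

    `blocks` maps block id -> (birth_txg, data) on the sending side.
    """
    changed = {bid: data for bid, (birth, data) in blocks.items() if birth > base_txg}
    return json.dumps({"base_txg": base_txg, "blocks": changed}).encode()


def apply_incremental(replica: dict[int, str], stream: bytes) -> None:
    """Apply a stream from make_incremental; needs only the bytes, no reply channel."""
    for bid, data in json.loads(stream)["blocks"].items():
        replica[int(bid)] = data  # JSON object keys come back as strings
```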
Re:Rsync could have done this too! by brambus · 2015-12-22 11:47 · Score: 1

My pleasure.
Re:Rsync could have done this too! by MachineShedFred · 2015-12-22 11:48 · Score: 1

Depending on your setup and requirements, it's entirely feasible to have a 'storage server' where all its RAM is handed over to ZFS for caching and dedup, and you export via NFS to your VM hosting systems over 10GbE. It adds a touch of latency, but if you can host a hundred machines that don't require super-low latency and save 90% of the disk space by having only one copy of your server OS (for the most part), then you're probably doing better.
It's a viable config depending on what the needs are, and it can save far more disk than compressing each image on its own.

--
Slashdot still doesn't support Unicode after it was added to the HTML standard in 1997.
Re:Rsync could have done this too! by drinkypoo · 2015-12-23 02:43 · Score: 1

Side note for your entertainment in case it interests you
It does

The trick is in the fact that directories are files too (albeit special ones) with a defined hash-table structure. ZFS send simply picks up the changes to the respective directories as if they were regular files and transfers those.
That does seem like functionality which rsync could be enhanced to use. At least, it could be used to more rapidly find duplicates when both ends are using ZFS. rsync ain't going away anytime soon.
I am interested in ZFS but will probably wait until a Linux distribution makes it trivial to implement. I am past the point where messing around with filesystems seems fun.

--
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
Re:Rsync could have done this too! by StayFrosty · 2015-12-23 03:06 · Score: 1

The other advantage is that ZFS replication, unlike rsync, doesn't need to calculate diffs, because ZFS already keeps track of which blocks have changed since the last snapshot. This makes the entire process much faster and less resource-intensive.
Imagine the following scenario:
You are the sysadmin at a 24x7 company. You have a few hundred users' home directories (shared over NFS or SMB) on a file server that needs to be upgraded or replaced for some reason. You are tasked with migrating these home directories to a new file server with a minimal outage window.
ZFS replication procedure:
1. Snapshot, send.
2. Repeat step 1 until the entire process takes less than a couple of minutes.
3. Shut down the NFS and SMB processes on the old server. Snapshot and send one last time.
4. Bring up NFS and SMB on the new server. Make the appropriate IP and DNS changes.
Total outage for the users: 10 minutes or less.
rsync replication procedure:
1. Sync.
2. Sync again. Wait forever for diffs to be calculated.
3. Realize it's hopeless. Shut down NFS and SMB, and sync again.
4. Bring up the new server.
Total outage for the users: hours.
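Step 2 of the ZFS procedure converges for a simple reason. Suppose changes accumulate at rate c while incremental sends move data at rate R, with c < R. Each pass then only ships what changed during the previous pass, so pass durations shrink geometrically by c/R. The small Python sketch below works through that arithmetic with made-up figures (hypothetical, not from the post).

```python
def pass_durations(initial_bytes: float, change_rate: float, send_rate: float,
                   target_seconds: float, max_passes: int = 50) -> list[float]:
    """Durations of successive snapshot+send passes until one fits in target_seconds.

    Pass 1 ships initial_bytes; every later pass ships only the data changed
    while the previous pass was running (change_rate * previous duration).
    """
    if change_rate >= send_rate:
        raise ValueError("passes never shrink when changes outpace the link")
    durations = [initial_bytes / send_rate]
    while durations[-1] > target_seconds and len(durations) < max_passes:
        durations.append(change_rate * durations[-1] / send_rate)
    return durations


# Hypothetical: 2 TB initial copy at 100 MB/s, 1 MB/s of churn, 2-minute goal.
print(pass_durations(2e12, 1e6, 1e8, 120))  # prints [20000.0, 200.0, 2.0]
```

An rsync pass, by contrast, rescans the whole tree every time. Its pass time therefore has a floor set by the scan rather than by how much data changed.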

--
"Frequently wrong, never in doubt."

Sheesh, the sheer low quality of TFA by Anonymous Coward · 2015-12-21 22:16 · Score: 1

For those who already understand rsync and ZFS, the article adds nothing of value. A third of the article is telling you what rsync is, which you could replace with lorem ipsum without lowering its already near-zero quality. We already fucking know what rsync is. It's been in the man pages for, like, 10+ years. And why do you need a Jedi picture just for that?

Then the useless benchmark, taking up another third. No repeatable experiments. No statistics. Only one-shot timings. And the worst thing is that the results are completely expected. The first pseudo-benchmark is network-limited, so the results would be the same. The second is completely expected: the incremental payload is so small that any difference is just overhead, and you expect rsync, with its overhead, NOT to perform worse than a native FS? The third is pointless. It's apples and oranges. Anyone who knows the difference between syncing and replicating already knows this. We don't need a whole article telling us that.

And finally, the obligatory shitty script dump (nobody cares about a verbatim copy of your perl script, kiddie), the apples-to-oranges pros-and-cons comparison, a clueless stock picture taking up half the screen, and the non-conclusion.

The whole thing could be condensed into a 500-word tech briefing, and you wrote that?

Sheesh.

Re:The filesystem so fast... by Anonymous Coward · 2015-12-21 22:45 · Score: 1

That was ReiserFS, not ZFS.

ZFS vs BTRFS by Maow · 2015-12-21 22:47 · Score: 2

Jim Salter writes some great pieces on file systems for Ars Technica.

At the linked article are Related Links. Of particular note is "Atomic Cows and Bit Rot" -- read that if you're interested in modern file systems.

Re:ZFS vs BTRFS by phayes · 2015-12-22 02:07 · Score: 2

Whereas /. is filled with people such as yourself...
I've been on /. & ars for close to 2 decades & the level of idiot posts is unfortunately much higher here.

--
Democracy is a sheep and two wolves deciding what to have for lunch. Freedom is a well armed sheep contesting the issue

Re:The filesystem so fast... by Anonymous Coward · 2015-12-21 23:35 · Score: 1

Only after the Russian mail-order bride steals the money from your open source "wealth" to fund her new boyfriend's BDSM hobbies. She actually sounded a lot like my ex, the one with the website on breast feeding with nipple rings.

And no, I'm not making *any* of this up.

Re:BTRFS is the future by rl117 · 2015-12-22 00:03 · Score: 4, Interesting

Er, no. Btrfs may one day reach feature parity with ZFS, and it may also achieve the reliability of ZFS, but it has a long, long way to go in both areas to get there.

The on-disc structures might have been declared "stable", but what does that mean, really? That you'll be able to mount current filesystems on future kernels, yes. That the frozen design was correct and contains no design flaws? No. Personally, I think they froze it way too early. There are a number of fairly fundamental issues with the Btrfs design which compromise its performance (fsync) and integrity (unbalancing, data loss on recovery), and in some cases place arbitrary limits upon things (e.g. the hardlink issue). Some can be mitigated, while others can not. These and other issues are easily found and researched.

Seriously, I've been using Btrfs since very near the beginning for a variety of tasks. But I've been objective about it, rather than a blinkered fanboi. It's an interesting filesystem with some good ideas. But it has /always/ been a case of "next year it will be stable", and the performance is dire. Progress has been painfully slow, and the bugs I've encountered along the way have been numerous and show-stopping. Maybe it will "get there", but I think your assertion that "once BTFS userland side gets stable" that it will replace ZFS is incredibly naive. It assumes that there are no major issues remaining on the kernel side, and it also assumes that the only thing needing doing on the user side is stability. Based on its history to date, the likelihood of the kernel side being bug-free is close to zero. On the user side the tools are primitive, feature-incomplete and almost completely undocumented, containing little information and no examples. On the ZFS side, the tools are feature complete and are properly documented, with examples, and with whole sets of training material on top of that.

If you needed to make a decision on which to use for a serious deployment, or even just for a smaller-scale home NAS, right now, if you objectively compare the two, the choice is quite clear, and it's not Btrfs. Based upon the development history of the two, it's unlikely that this will change much in the next few years. Remember also that ZFS development is very active, perhaps even more so than Btrfs. But who knows, maybe by 2020 Btrfs will surpass it.

Re:BTRFS is the future by Gaygirlie · 2015-12-22 00:33 · Score: 1

I am using Btrfs on my NAS/firewall/server quite happily, and in my experience it's been stable and performant, but overall I agree with you. The tools could be better, and there are a lot of idiosyncrasies here and there. Personally, I find the fact that Btrfs is terribly fragmentation-prone somewhat of an issue, as running defrag on any snapshotted or deduped content will break the reflinks and end up duplicating all the blocks needlessly, thereby defeating the whole point of using snapshots in the first place.

Re:BTRFS is the future by Bengie · 2015-12-22 01:23 · Score: 2

+9001 Funny! I needed that. BTRFS, the FS designed by devs for cool new features! ZFS, the FS designed by sysadmins for sysadmins.

"The cloud" by ZorinLynx · 2015-12-22 01:28 · Score: 1, Informative

Anyone else getting tired of this term? All it means is "someone else's computer". All you're doing is renting server space and replicating your data there. There's nothing special about it.

Re:"The cloud" by Anonymous Coward · 2015-12-22 01:58 · Score: 1

Yep. 'The Cloud' is just shifting responsibility to someone else, who may or may not be doing a proper job of security or backups. This seems germane.
Re:"The cloud" by fluffernutter · 2015-12-22 02:20 · Score: 1

"Ignorance is bliss"

--
Laws are rules for the court, but merely a bottom bar to hit for life. Think beyond laws in your actions always.
Re:"The cloud" by The-Ixian · 2015-12-22 03:18 · Score: 1

Hmmm.... good point... perhaps we need "Smart cloud 2.0"

--
My eyes reflect the stars and a smile lights up my face.
Re:"The cloud" by Dragonslicer · 2015-12-22 05:38 · Score: 1

Anyone else getting tired of is term? All it means is "someone else's computer".
To be fair, that's kind of what it has meant for years. I have a networking textbook that's 15 years old that represents unspecified parts of a network in a network diagram as a cloud shape. So "piece of computer network that I don't care much about the details", e.g. the Internet, has been called "cloud" for a while.

Of course, this is not to be confused with "cloud computing", which has a more precise definition (basically distributed processing, but with on-demand virtual machines instead of physical nodes).
Re:"The cloud" by Anonymice · 2015-12-22 05:52 · Score: 1

Never heard of a private cloud then? We run a large virt cluster here & "the cloud" is the most straightforward & friendly way for me to refer to it to the higher ups. "Cloud" is just the same as "cluster", however the former is more widely recognised.

Re:BTRFS is the future by Bengie · 2015-12-22 01:28 · Score: 1

ZFS too resource heavy? Yeah, don't run it on a cell phone. BTRFS balancing seems to have some major issues. I'm not sure if it's fundamental or not, but they're old issues and haven't been fixed for many years.

Re:Altogether now: "replicaton is not backup" by Bengie · 2015-12-22 01:33 · Score: 1

ZFS is nice .. but it's just not been stable

By your definition of stable, nothing is stable. ZFS is not perfect, but it is closer to perfect than anything else.

Re:Altogether now: "replicaton is not backup" by BitZtream · 2015-12-22 02:03 · Score: 2

Without some kind of incremental snapshot, with read-only privileges after the snapshot, straight replication is next to useless if someone does "rm -rf /". And it happens *all the time*.

So ... zfs covers that ... since it does exactly what you suggest.

Sure, if you can afford to buy 3 times as much disk

What? If you want mirroring or RAID-like qualities, yes, you need to duplicate data; that's true of any mechanism like this. You do realize that's what things like NetApp do too, right? Just mirroring or RAID?

and roughly 10 times as much network bandwidth as you ever really process with,

... this makes no sense? How does the network come into play here? You're just making random shit up?

ZFS is nice if you can afford one sys-admin/Terabyte of data to try to keep it up to date, but it's just not been stable.

The company I work at rolls over roughly 50 TB of data PER DAY, several petabytes' worth ... in ZFS ...

You'll have to pardon me if I doubt some random Anonymous Coward spewing clear ignorance has any idea what 'stable' is after making such stupid statements.

--
Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager

It's a subset not usable outside the set by dbIII · 2015-12-22 02:07 · Score: 1

Actually no - it's not better unless you are already using ZFS in which case you probably already know about the feature.

Re:It's a subset not usable outside the set by MachineShedFred · 2015-12-22 06:39 · Score: 1

Snapshotting has been in ZFS from (practically?) the beginning.
This article is about a cloud provider specifically providing a workable service to act as a ZFS snapshot receiver, which before required you to do some serious customization on a general-purpose compute environment like Amazon EC2.
At the prices that rsync.net charges for what it is, this is a pretty compelling off-site solution for my media storage, as it's already on a ZFS pool via FreeNAS.

--
Slashdot still doesn't support Unicode after it was added to the HTML standard in 1997.

Re:Altogether now: "replicaton is not backup" by MightyYar · 2015-12-22 02:19 · Score: 1

Fortunately, zfs also supports snapshots, and those can be sent/received as well.

--
W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.

Re:BTRFS is the future by twokay · 2015-12-22 02:35 · Score: 1

FreeNAS is FreeBSD 9 or 10 with a config layer over the top for a web interface and an idiot-proof CLI (FreeNAS 10). Nothing is changed from the FreeBSD version of ZFS apart from some sysctl variables.

ZFS is an ENTERPRISE file system: it will eat all the RAM you give it and get faster with more RAM, as it can cache more I/O. It is designed to run on a well-spec'ed server with a UPS.

Of course you can run it on anything FreeBSD supports and try your luck, it works well even then for most people.

--
Wannabe nerd.

Re:BTRFS is the future by rl117 · 2015-12-22 02:50 · Score: 4, Insightful

Are you for real AC, or just trolling?

Your Synology "reference" is a classic "appeal to authority", only it's a really bad choice of authority due to its complete lack of any technical detail or substance of any kind. That link is to a marketing page for a company which makes money selling hardware. It's just a few bullet points (in essence, snapshotting and checksumming), without any discussion of the actual tradeoffs or comparison with other systems. It's worthless. Its only purpose is to tick a feature box as an incentive to purchase their systems; as for the actual performance and reliability of those features, that's the customer's problem. Caveat emptor.

I've done more than casual work and development with Btrfs. For example, from back when I was a Debian developer, here's the initial support for Btrfs snapshotting in schroot. This lets you create virtual environments from Btrfs snapshots, as well as other types such as LVM and overlays. You can then plug this into other tools such as sbuild, and build the whole of Debian using snapshotted clean build environments. Doing this, Btrfs fails hard around every 18 hours, going read-only. Why? Creating and deleting 18,000 snapshots for 8 parallel builds quickly unbalances the filesystem, requiring a manual rebalance. You don't see that unfortunate detail in the Synology fluff page, do you?

You can also get snapshots and decent recovery (albeit without block-level checksums) from LVM and mdraid. In my experience, their recovery behaviour after real hardware failure is vastly more reliable than that of Btrfs. Simply put, they have always resynced the data without problem, while Btrfs has caused irrecoverable data loss, despite theoretically being much better. LVM snapshots have very different tradeoffs as well, and on modern Linux with udev, we had to abandon using them due to races in udev/systemd making them randomly fail.

The point I'm making is that the reality of the chosen tradeoffs between performance, reliability and featureset of the different filesystems is a subtle one. You can't reduce it down to "Btrfs is better" or "ZFS is better". That's marketing. But I have spent over seven years pushing Btrfs to its limits, and have found it sorely lacking. It's unacceptable that it unbalances itself to the point of unusability. It's unacceptable that it has led to irrecoverable data loss on several occasions. It's also unacceptable that in its eight years of existence, none of the developers could be bothered to write any decent documentation. The data loss was down to bugs, some of which are fixed, but it does leave you lacking trust in it in the face of such problems. If you compare this with ZFS, while it's not fair to say it has been totally bug-free, it has been almost bug-free, and the number of data-loss incidents is small. I've yet to encounter any problems with ZFS myself, but I've encountered many serious issues with Btrfs.

Anyone who uses Btrfs or ZFS on a NAS system does so at their own risk after researching the various options and their tradeoffs. Just because a vendor decides to make and market a system using Btrfs does not make that system the best choice. It just means they thought they could make some profit from it.

Re: BTRFS is the future by bill_mcgonigle · 2015-12-22 02:50 · Score: 1

ZFS disk structures were stable a decade ago but frankly the userland is still a bit buggy today, and that's with ten times as many people working on it as btrfs and people knowing full well where the problems are and what needs to be done to fix them. btrfs hasn't gone through that discovery process yet.

Don't assume undone work is easy. I'll be delighted to be proven wrong in five years (I said the same thing five years ago).

--
My God, it's Full of Source!
OUTSIDE_IP=$(dig +short my.ip @outsideip.net)

Re:Oracle by rl117 · 2015-12-22 03:01 · Score: 1

Er, OpenZFS...

ZFS originated within Sun, which was bought by Oracle. Oracle then laid off most (all?) of the ZFS developers, who then went to work for other companies. The current ZFS development is no longer inside Oracle, and nor is it owned by them. They own the copyright on the original CDDL releases. Big deal. Not using it because of the historic association with Oracle would be a little... extreme.

Re:Altogether now: "replicaton is not backup" by rl117 · 2015-12-22 03:53 · Score: 1

Without some kind of incremental snapshot, with read-only privileges after the snapshot, straight replication is next to useless if someone does "rm -rf /". And it happens *all the time*.

So, exactly what ZFS provides then... You take periodic snapshots (hourly, daily, weekly, or whatever), then send the deltas between the snapshots to the destination system. You can easily put that in a cron job and have a regular push to a backup system (hey, exactly like what the tool in TFA is doing...). If someone does wipe out all their files, you have the snapshot(s) containing it on both the source and destination system, depending upon your schedule for dropping old snapshots. However you decide to manage things, you can recover the removed files so long as they are present in an older snapshot.
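A cron-driven push like this needs an incremental base that both sides still have. The minimal Python sketch below assumes the snapshot lists were already fetched, for instance from `zfs list -t snapshot` on each side. The dataset and host names are hypothetical. The sketch only builds the `zfs send` / `zfs receive` pipeline and does not run it.

```python
import shlex
from typing import Optional


def newest_common(source_snaps: list[str], dest_snaps: list[str]) -> Optional[str]:
    """Most recent source snapshot that also exists on the destination.

    Both lists hold snapshot names (the part after '@'), oldest first.
    """
    on_dest = set(dest_snaps)
    for name in reversed(source_snaps):
        if name in on_dest:
            return name
    return None


def send_command(dataset: str, source_snaps: list[str], dest_snaps: list[str],
                 host: str, target: str) -> str:
    """Build a pipeline sending the latest snapshot, incrementally when a base exists."""
    latest = source_snaps[-1]
    base = newest_common(source_snaps, dest_snaps)
    if base == latest:
        return ""  # destination is already up to date
    send = ["zfs", "send"]
    if base is not None:
        send += ["-i", f"{dataset}@{base}"]  # delta from the shared snapshot
    send.append(f"{dataset}@{latest}")
    recv = ["ssh", host, "zfs", "receive", target]
    return f"{shlex.join(send)} | {shlex.join(recv)}"
```

Older snapshots kept on both sides are what give you the rollback point after an accidental `rm -rf`. How long to keep them is a matter of your own retention schedule.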

Re:Oracle by rl117 · 2015-12-22 03:57 · Score: 1

You do realise that Btrfs originated within Oracle, right? ZFS was merely acquired by them.

I wonder why Docker doesn't deploy to OpenIndiana by emil · 2015-12-22 04:36 · Score: 1

If btrfs has so many issues, I wonder why Docker doesn't have a deployment on illumos or SmartOS.

I would think that Docker enthusiasm would be damped by a beta filesystem and (the lack of) verifiable security in package content.

ZFS + Linus is not a GPL violation by Aaden42 · 2015-12-22 04:46 · Score: 1

Don't let the licensing FUD scare you. Linus has publicly stated that licensing in a case that's a very near equivalent to ZFS' licensing is fine.

The anticipated problem with the license has always been on the Linux side. The license ZFS is released under doesn't in any way prohibit the ZFS code from being used in other places with other licenses (like the *BSDs). There has never been a concern that using ZFS with Linux violates the ZFS license (and thus could bring Oracle's well-fed lawyers down upon you). The contention has been that combining CDDL code with GPL-2 code in a derivative work violates the GPL and thus puts you in trouble with Linux's license. The core problem is that the CDDL places additional restrictions on binary code resulting from derivative works, which GPL-2 prohibits.

Linus has weighed in specifically on the AFS filesystem module here: http://yarchive.net/comp/linux...

But one gray area in particular is something like a driver that was originally written for another operating system (ie clearly not a derived work of Linux in origin). At exactly what point does it become a derived work of the kernel (and thus fall under the GPL)?
[...]
Historically, there's been things like the original Andrew filesystem module: a standard filesystem that really wasn't written for Linux in the first place, and just implements a UNIX filesystem. Is that derived just because it got ported to Linux that had a reasonably similar VFS interface to what other UNIXes did? Personally, I didn't feel that I could make that judgment call. Maybe it was, maybe it wasn't, but it clearly is a gray area.
Personally, I think that case wasn't a derived work, and I was willing to tell the AFS guys so. [Emphasis added]
- Linus Torvalds on the fa.linux.kernel group, Thu, 4 Dec 2003

Given that ZFS was originally written for Solaris and the core code works essentially unmodified (with a porting layer in some cases) on Solaris, *BSD, Linux, possibly other systems, there are lots of indications that it should fall into the same category as the AFS code: The ZFS modules are not derivative works of Linux and thus may be used with Linux even though their license prevents them from being incorporated into Linux.

Re:rsync.net by Immerman · 2015-12-22 06:05 · Score: 1

Having trouble distinguishing between rsync, the tool, and rsync.net, the online service? Having never used either, the distinction was still perfectly clear to me.

--
--- Most topics have many sides worth arguing, allow me to take one opposite you.

Discount for slashdot folks by kozubik · 2015-12-22 06:53 · Score: 1

We've had a very significant discount for HN readers for years and we'd be happy to extend that to /. readers. Just email and ask.

Really happy to be here - I am not sure why I am labeled as "new submitter" since I have been a slashdot user for ... 15 years ?

Happy to answer any questions about our service here as well.

Opens up another major possibility by Solandri · 2015-12-22 07:55 · Score: 1

If I'm reading this right, ZFS sync opens up one other huge, huge possibility. I had this idea nearly 15 years ago (shortly after Napster), but didn't have the technical expertise to implement it: A distributed redundant filesystem.

ZFS doesn't think in terms of files. It thinks in terms of blocks, and in a redundant z-volume (similar to a RAID array) it distributes those blocks over multiple virtual devices (vdevs). You can think of them as disks, but they don't have to be. These vdevs can be a disk, a partition, a file on a disk, or, more crucially, a SAN or iSCSI target: disks which aren't connected directly to the computer but are accessed over a network. Until now, those last two have been disks on the same premises, just not in the same computer. ZFS sync could open it up to any networked vdev anywhere in the world.

So what's the big deal? The big deal is that in a redundant filesystem, you cannot reconstruct the original data from any single vdev. If you have 4 drives in RAID 5, no single drive has a complete file. You need all of the data off of at least 3 drives to reconstruct a file. The same goes for ZFS - if you're using 2-drive redundancy and you have 6 vdevs, you need the data off of at least 4 vdevs to reconstruct the file.
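The reconstruction property is easiest to see with single parity (RAID 5 style, r = 1). In the Python sketch below, byte-wise XOR parity stands in for RAIDZ1's parity. RAIDZ2 and RAIDZ3 add further Galois-field parity that is not modeled here. The chunk layout is a simplification of how RAIDZ actually places data, and all names are illustrative. Any one missing device can be rebuilt from the survivors, but no single device holds the whole file.

```python
from functools import reduce
from typing import Optional


def xor_bytes(chunks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length chunks."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*chunks))


def stripe(data: bytes, n_data: int) -> list[bytes]:
    """Split data across n_data devices (zero-padded) and append one parity chunk."""
    size = -(-len(data) // n_data)  # ceiling division
    padded = data.ljust(size * n_data, b"\0")
    chunks = [padded[i * size:(i + 1) * size] for i in range(n_data)]
    return chunks + [xor_bytes(chunks)]


def rebuild(devices: list[Optional[bytes]]) -> list[bytes]:
    """Recover the single missing device (marked None) by XOR-ing all survivors."""
    missing = [i for i, d in enumerate(devices) if d is None]
    if len(missing) > 1:
        raise ValueError("single parity survives only one lost device")
    survivors = [d for d in devices if d is not None]
    restored = list(devices)
    if missing:
        restored[missing[0]] = xor_bytes(survivors)
    return restored
```

In this toy layout, each data device still holds readable fragments of the original. The confidentiality argument further down therefore also relies on the per-vdev encryption the comment mentions.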

Now what if each of those vdevs were located in different places around the world? One could be Google Drive, another Dropbox, another Microsoft OneDrive, etc. Your data could be on the cloud, and it would still be accessible even if one service went down or even shut down completely. ZFS would just treat it like a drive failure. It would re-verify and recover after the service came back online. Or you could simply replace it with a vdev on a different cloud service. (ZFS redundancy is on a block level, so a block failure doesn't mean it drops the entire vdev from the array like RAID does with a disk which generates an error. It simply marks the block as bad and tries to reconstruct it from redundant info on other vdevs. Other blocks stored on that vdev are assumed to still be good, until you access it and the checksum says it's bad.)

Also, no single cloud service provider would have a complete copy of your data. Hackers could manage to break into a service and get all your data stored at that service. But unless they managed to get data from (n-r) services (n = number of cloud vdevs you're using, r = redundancy level), they couldn't reconstruct your data. More to the point, if said service notified you of the breach in a timely manner, you could respond by creating new vdevs with different encryption, copying your data from the old vdevs to the new, then erasing the old vdevs. Unless the hackers managed to simultaneously hack (n-r) cloud services, your data cannot be compromised. (Or if you're on the dark side, Hollywood could get the feds to raid a cloud storage service and get all your data there, but unless they did it simultaneously with (n-r) services, they wouldn't be able to see that you have copies of pirated movies stored on those services.)

I've been trying to set up something similar between my sister's, my parents', and my house, with our NASes backing up each other so we won't lose our data if one house burns down. But it's been a PITA with rsync. Because rsync thinks in terms of files, each house has to have a complete copy of the other houses' data. If I were able to do it with ZFS vdevs, it would represent a 50% space savings. More if I had more homes to work with.
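The 50% figure checks out under a simple model in which h homes each hold D bytes of their own data. With full cross-copies, every home stores everyone's data, for a total of h*h*D. Spreading the data with one device's worth of parity across the h homes stores h*D*h/(h-1) instead. The quick Python check below assumes single parity and equal data per home.

```python
from fractions import Fraction


def savings(homes: int, parity: int = 1) -> Fraction:
    """Fraction of raw storage saved by parity striping versus full copies at every home."""
    full_copies = homes * homes                         # in units of D
    striped = Fraction(homes * homes, homes - parity)   # h*D of data, overhead h/(h-parity)
    return 1 - striped / full_copies


print(savings(3))  # prints 1/2
print(savings(4))  # prints 2/3
```

The trade-off is resilience. Full copies survive losing all but one home, while single-parity striping survives losing only one.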

Re:Opens up another major possibility by Muad'Dave · 2015-12-22 09:17 · Score: 1

Look into how HDFS works; it's the filesystem underlying Hadoop.

--
Tiller's Rule: Never use a word in written form that you've only heard and never read. You will end up looking foolish.

Re:BTRFS is the future by sjames · 2015-12-22 08:14 · Score: 1

I guess you missed the RESOLVED tag on that.

Re:BTRFS is the future by rl117 · 2015-12-22 08:41 · Score: 1

To be fair, the race existed in udev prior to the systemd merge as well. When lvremove randomly stops working, it's a bit surprising, and it took a while to pinpoint udev as the culprit keeping the snapshot devices open and preventing their removal. "Helpful" such behaviour is not. We had to move all the debian buildds from using lvm snapshots to unpacking tar files as a result (btrfs being too fragile as mentioned).

never RAIDZ yourself, but run run run to get some by epine · 2015-12-22 11:25 · Score: 1

Yeah, he writes okay pieces, but it kind of annoys me when he throws up blanket advice and then practically trips over himself extolling the opposite.

ZFS: You should use mirror vdevs, not RAIDZ

Guess what? The entire rsync.net service is built on top of RAID-Z3, if I read their promotional portal correctly.

One use case I can see for this is using ZFS to back up Postgres databases. I'm not the only person to think this might be a good idea. A while back, I listened to this talk, which I really enjoyed:

Keith Paskett: PostgreSQL on ZFS

From hard experience, he's particularly wary of the "drop table" oops disaster scenario.

Keith Paskett bio

* infrared radiometric calibration chambers Space Dynamics Laboratory
* helped develop Utah State University's Climate data server
* National Climate Data Center validated climate data
* all stored in PostgreSQL of course

Re:Altogether now: "replicaton is not backup" by MachineShedFred · 2015-12-22 11:36 · Score: 1

1. ZFS Snapshotting is incremental, just like NetApp. In fact, it's so 'just like NetApp' that NetApp sued Sun Microsystems over it.
2. You don't know what the hell you're talking about. See #1.

--
Slashdot still doesn't support Unicode after it was added to the HTML standard in 1997.

Re:Oracle by MachineShedFred · 2015-12-22 11:39 · Score: 1

Oh, so in your hatred of Oracle, you're recommending a filesystem project that was started by... Oracle.

Only reason Oracle isn't still the major contributor to btrfs is because they bought Sun and got a complete version of what they were trying to create with btrfs.

--
Slashdot still doesn't support Unicode after it was added to the HTML standard in 1997.

Re:LXC and security by Rutulian · 2015-12-22 15:42 · Score: 1

You're able to run as-root / Set-UID binaries with-in them? Nope. LXC emulates this by mapping UID-0 in the container to UID-x on the host via namespaces.

No, that is not correct. Root is root in an LXC container, subject to some limitations (e.g., making device entries), just like it is with BSD Jails. The mapping that you are referring to is a security mitigation feature, should an attacker manage to break out of the container. If a root user within the container breaks out of the chroot (containers are essentially chroot with cgroups added in) but is still within the container process (in other words, no buffer overflow or similar vulnerability), they will be subject to unprivileged status on the host (basically, the same as an unprivileged shell user). That is good, and is not something that BSD Jails do, AFAIK. So, one might say LXC is more secure than jails in this respect.
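The UID mapping described above is configured through a user namespace's uid_map. Each line of that file reads `ID-inside-ns ID-outside-ns length`, as documented in user_namespaces(7). The Python sketch below shows how a container UID translates to a host UID. It uses a hypothetical mapping of container UIDs 0-65535 onto host UIDs starting at 100000. The mapping only exists when the container runs in a user namespace (an unprivileged container); a privileged LXC container has no such mapping.

```python
from typing import Optional


def parse_uid_map(text: str) -> list[tuple[int, ...]]:
    """Parse uid_map lines of the form 'inside outside length'."""
    return [tuple(int(f) for f in line.split()) for line in text.splitlines() if line.strip()]


def to_host_uid(container_uid: int, mapping: list[tuple[int, ...]]) -> Optional[int]:
    """Translate a UID inside the namespace to the corresponding host UID."""
    for inside, outside, length in mapping:
        if inside <= container_uid < inside + length:
            return outside + (container_uid - inside)
    return None  # unmapped IDs appear as the overflow UID (usually 65534)


mapping = parse_uid_map("0 100000 65536\n")
print(to_host_uid(0, mapping))  # container root maps to host UID 100000
```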

From one of the maintainers of Docker (as of June 2014):

You do know that Docker and LXC are not the same thing, right? Docker is built on LXC, but they are not synonymous. Also, the quote is more of a "be careful with this" rather than a "containers can't handle this" type of comment. The thing about Docker specifically that makes it different from LXC, is the docker user space process, which is larger and possibly subject to more attack vectors, hence the conservatism about security. Just plain old LXC containers, though, should be as secure as anything else on the system (sans kernel vulnerabilities, etc).

there has been only one advisory about escaping out of a jail (and it was because of a devfsd bug, not jails itself)

Right, so aside from a kernel vulnerability (devfsd) a kernel-provided capability (jails) is perfectly secure? Nice bit of sophistry there. Jails is fundamentally no more or less secure than lxc containers. They are both "operating system-level virtualization" techniques implemented in similar ways (using chroot combined with kernel capabilities to separate userspace processes and resource limits). They are effectively the same.

Re:BTRFS is the future by sjames · 2015-12-23 09:51 · Score: 1

BTRFS is less mature than ZFS, but it has a lot of useful functionality and is in some ways more elegant. For example, the snapshot of a subvolume is a first-class filesystem in itself, without dependency on its parent. It's also a lot better about handling replacement of the physical volumes underneath it if you have mirroring turned on. In particular, you can arbitrarily increase the size of the filesystem by using a larger replacement or just adding more drives.

On the other hand, I'm not touching the RAID5/6 support with a ten-foot pole in its current state.
