Domain: zfsonlinux.org
Stories and comments across the archive that link to zfsonlinux.org.
Comments · 56
-
Re:About time... HFS+ is crap
Since you are already running a NAS, you may be interested in ZFS on Linux or FreeNAS for a BSD system as a way to bring ZFS into your mix.
-
Re: Stone tablet and chisel
For those who do: http://zfsonlinux.org/
-
Re:bit rot
i have a machine running debian squeeze with a raidz4+1 ssd cache pool that has hosted several VMs with heavy load for 3 years now. i have seen 1 error that was found and fixed during a routine scrub.
http://zfsonlinux.org/ has more.
-
Some questions and options
First you need to ask yourself some questions:
1. what are you trying to protect against? Hard drive failure? Multiple hard drive failures? Fire? Theft? Disk/file corruption? Destruction of your whole home/work? Everything?
2. what's your budget?
3. how many copies of data do you want and where?
4. If you're looking at a cloud backup service then what's your bandwidth? How much of your internet usage are you happy to allocate to backups? How much is your data change rate? (i.e. there is no sense using a cloud backup provider if you change your data faster than you can upload it)
Some options:
1. Cloud Backup service (e.g. Backblaze, many here)
2. Cloud Storage provider (e.g. Dropbox, Amazon Glacier)
3. Your own solution (e.g. FreeNAS, external USB drive, eSATA (external SATA) drive, home server, unison, xcopy etc...).
If you do use your own server solution then I'd recommend having a look at the ZFS filesystem (e.g. zfsonlinux)
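The bandwidth question in point 4 of the first list can be turned into a quick back-of-the-envelope check. The sketch below is only an illustration: the function names, the decimal unit conversion (1 GB = 8000 megabits) and the sample numbers are assumptions, not figures from the comment.

```python
def upload_hours(changed_gb_per_day, uplink_mbps, share=1.0):
    """Hours needed to upload one day's worth of changed data.

    changed_gb_per_day: daily change rate in decimal gigabytes (assumed unit)
    uplink_mbps:        upstream bandwidth in megabits per second
    share:              fraction of the uplink you allow backups to use
    """
    usable_mbps = uplink_mbps * share
    megabits = changed_gb_per_day * 8 * 1000  # 1 GB = 8000 megabits
    return megabits / usable_mbps / 3600


def can_keep_up(changed_gb_per_day, uplink_mbps, share=1.0):
    """True if a day's changes can be uploaded in under a day."""
    return upload_hours(changed_gb_per_day, uplink_mbps, share) <= 24


# Hypothetical household: 10 Mbit/s uplink, half of it reserved for backups.
print(round(upload_hours(20, 10, 0.5), 1))   # hours for 20 GB/day of changes
print(can_keep_up(100, 10, 0.5))             # 100 GB/day on the same line
```

If the second check comes back False, the data changes faster than it can be uploaded, which is exactly the case where a cloud backup provider makes no sense.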
-
Not Invented Here Syndrome?
I was hoping Apple would license ZFS
ZFS is under CDDL and would not even need to be "licensed" in the usual sense — it is free for anybody to take. "Too free" for certain zealots, in fact, which is why it was not part of Linux kernel for a while — until the supposed "license incompatibility" myths got debunked.
Even Linux now offers ZFS — Apple would've had a much easier time porting it, because MacOS is already FreeBSD-based and the FreeBSD-project had ZFS available "out of the box" for several major releases spanning many years.
What did Apple find lacking about ZFS, that would justify creating their own, is, indeed, a mystery. Probably, a case of the Not Invented Here Syndrome. Sad...
-
Re:For home users, basically meaningless.
All file systems are approximately the same for most day to day users.
ZFS is not merely yet another way to arrange pieces of your files on the disks (and "disks") — it is a filesystem and a volume-manager in one.
I would be interested in knowing which is fastest at read/writes.
ZFS adds features, which are a rarity among other filesystems: checksumming, options for redundancy and deduplication, snapshots, etc.
We spent decades keeping the underlying storage separate from the filesystem on top of it: neither ufs, nor xfs, nor ext knows what actual hardware is underneath the /dev/foo, and SCSI or (S)ATA protocols are all they can use to talk to this device. In these days of RAIDs and SSDs, the newfs(8) still has notions of sector-sizes and cylinder-groups, for crying out loud.
With ZFS we have a filesystem that is aware of the underlying hardware and can make good use of that knowledge. It is what the Unix filesystem would've been, had we had RAIDs in the seventies... But the above-mentioned checksumming and snapshots, as well as the redundancy and deduplication options, are useful even with a single drive.
For home users, basically meaningless.
Come, come, even FreeNAS users use ZFS on their systems to protect their content from "bit rot" and hardware failures. Smarter folks have been turning to FreeBSD, which has been offering ZFS for years — and Linux developers started working on porting it to Linux long ago — first as a FUSE-module, and now, finally, as part of the kernel.
-
Re:BTRFS is getting there
--You can definitely add more disks if you are using mirrored drives in your pool, instead of RAIDZ. I created a Linux ZFS RAID0 (no redundancy) pool with 2 brand-new drives initially, then bought 2 more drives of the same brand and capacity a month later, and upgraded the pool in-place with no downtime to a zRAID10.
--If I want to expand the size of the pool, I can just add 2 more disks in a mirrored configuration.
# zpool add mirpool mirror ata-ST9500420AS_5VJDN5KL ata-ST9500420AS_5VJDN5KJ
--Note that this syntax is using Linux /dev/disk/by-id devices.
--There are some caveats and best practices that one should read up on, for instance using ashift=12 with 4K-sector drives, and using GPT partition tables on ZFS disks; but ZFS has by far been the most reliable and useful filesystem I've ever used.
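On the ashift=12 caveat: ashift is the base-2 logarithm of the sector size the pool assumes, so 4K (4096-byte) sectors give 12 and classic 512-byte sectors give 9. A minimal sketch of that arithmetic follows; the helper name is hypothetical and not part of any ZFS tool.

```python
def ashift_for(sector_bytes):
    """Return log2 of the sector size, i.e. the matching ashift value."""
    # Sector sizes must be positive powers of two for this to make sense.
    if sector_bytes <= 0 or sector_bytes & (sector_bytes - 1):
        raise ValueError("sector size must be a positive power of two")
    return sector_bytes.bit_length() - 1


print(ashift_for(4096))  # 4K-sector drive
print(ashift_for(512))   # legacy 512-byte sectors
```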
REF:
https://blogs.oracle.com/partn...
http://zfsonlinux.org/faq.html
http://jrs-s.net/2015/02/06/zf...
https://jsosic.wordpress.com/2...
-
Re:Why? What advantages does this have over ZFS?
Disclaimer: I <3 ZFS.
We had a problem that ext* just couldn't handle. We have a medium sized filesystem with about 250 million data files that we needed to back up. Every day. Rsync completely failed at the job, taking between 1 and 2 days to do the job.
Desperate to find a solution, we tried ZFS and snapshot replication. Our time to replicate to DR dropped from days to a few hours, backup storage requirements dropped through the floor, and server load dropped at the same time! This is on a reasonably priced set of systems, Xeon-based intel systems with just 32 GB of RAM and 6x 4 TB drives.
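For readers who have not seen snapshot replication: the usual shape is an incremental `zfs send -i` between two snapshots, piped over ssh into `zfs receive` on the DR side. The sketch below only assembles such a pipeline as a string and does not run it. The dataset, snapshot and host names are made up for illustration, and the commenter's actual setup is not described in the post.

```python
import shlex


def replication_pipeline(dataset, prev_snap, new_snap, dr_host, dr_dataset):
    """Build an incremental send/receive pipeline as a shell string.

    Only the blocks that changed between prev_snap and new_snap are sent,
    which is why this scales better than walking millions of files.
    """
    send = ["zfs", "send", "-i",
            f"{dataset}@{prev_snap}", f"{dataset}@{new_snap}"]
    recv = ["ssh", dr_host, "zfs", "receive", dr_dataset]
    return f"{shlex.join(send)} | {shlex.join(recv)}"


# Hypothetical names, for illustration only.
print(replication_pipeline("tank/data", "monday", "tuesday",
                           "dr.example.com", "backup/data"))
```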
ZFS is pretty decent, and has proven to be more reliable for our use than ext*. However, its licensing presents a developmental pitfall. On Linux, it won't ever be a "first rate citizen" even though the ZoL project has done a great job making it very available. ZFS also has a number of pretty terrible problems:
1) You can't remove a vdev from a ZFS pool without destroying the pool.
2) You can't upgrade a vdev's redundancy level once you've added it to a pool.
This means that, if you're careful, ZFS is wonderful. But it's easy to make a mistake that you can't easily back out of. See the section hating your data to see what I mean.
BTRFS has been "only a few years away now" for quite a few years now. I'm not convinced it will ever reach production ready status. Apparently it has some architectural problems that have been criticized pretty soundly. I'm no longer convinced about the future inevitability of BTRFS.
I sincerely hope that BCacheFS really delivers on these promises, I'd love it!
-
Re:FreeNAS
-
Re:Why ext4
Name one that actually boots the Linux kernel, and doesn't just run in user space. (Yes, I am a fan of ZFS, but not the Linux implementation.)
You really should get out more. ZFS on Linux is not to be confused with the ZFS Fuse project. You can boot from a ZoL filesystem. In general ZoL is about as stable, complete, and reliable as any ZFS.
-
Re:Btrfs?
You can use the real deal http://zfsonlinux.org/ too. Works great, especially if you have IEEE-1394 or other odd-bus devices, support for which are, uhm, unreliable or needing a maintainer in FreeBSD 11 current.
-
Re:btrfs?
It would really be nice to have a stable next-gen file system that can scale. ZFS is for the most part FreeBSD only and I'm just not ready to switch to FreeBSD.
There's a Linux version of ZFS. As far as I know, it works quite well, though I can't make any guarantees.
-
Re:Not for new users of FreeBSD
Are you switching to BSD just for ZFS?
Learning BSD is probably a good investment, but ZFS on Linux is production/stable and is excellent. I've been using it on CentOS 6 for over a year and it has been even more stable than EXT4 in a production environment.
-
Re: What To Expect With Windows 9
In the "Linux" world you can indeed run zfs, but you have to roll your own since it uses an incompatible license
Luckily, if you're using a common distro it's already been rolled for you.
-
Re:What is BSD good for?
Maybe I am just missing your /humor tag, but I thought the ZFS on linux thing had been taken care of years ago.
-
Re:What is BSD good for?
Thou shall ask and The Internet shall provide: zfsonlinux.org and wiki.ubuntu.com/ZFS.
Note however the license problem en.wikipedia.org/wiki/ZFS#Linux (the article even talks about potential patent problems in case a re-implementation is attempted...)
-
Re:ZFS, Apple!
As I understand it, ZFS is BSD licensed.
Nope. It's under the CDDL, which isn't GPL-compatible and prevents ZFS from being distributed as part of the Linux kernel. If it could, it probably would've been adopted by the masses years ago.
-
Re:ZFS, Apple!
They would also be sued pretty quickly by Oracle. Clearly not an option.
Your conclusion is a bit hasty and unwarranted. I am not going to tell you that Oracle CANNOT sue anyone for any trumped-up reason, but ZFS is licensed under the Common Development and Distribution License (CDDL) and is open source. For linux, there is an issue with how CDDL plays with GPL, so no distro has yet bundled ZFS with linux. Linux users, however, can themselves pick up "ZFS on Linux" and install it themselves without violating either the CDDL or GPL.
But OSX is not GPL. Other systems that are not GPL bundle ZFS, and are not sued. For example, FreeBSD comes with ZFS, and there are a number of other systems, such as FreeNAS, PC-BSD, illumos and nexenta.
See OpenZFS.
-
Re:Trial by fire...
ZoL is very active and very up-to-date. All the versions and compatibility is in sync with Illumos (the main source of OpenZFS) and FreeBSD. You can create and move zpools between these 3 platforms seamlessly.
It's close to that, but not exactly that. There are new feature flags in illumos-gate that are live in the field in "distros" like smartos and omnios (and in freebsd head) and which are not even on the plan for zfsonlinux 0.6.3.
The most important is:
feature@spacemap_histogram
which provides an enormous performance win on pools which are pretty full.
zfsonlinux also has some divergences peculiar to it (e.g. dataset property xattr=sa) that have not been pulled into illumos-gate yet, although several of them are planned, with a bit of redesign.
Additionally, zfsonlinux has numerous issues that can lead to kernel panics because of Linux kernel memory management and the kernel ABI (in particular Linux hates the sort of large stack allocations that zfs grew up with) -- the way to avoid these in 0.6.2? Keep your ARC tiny, and eat the performance hit! There are also numerous locking issues. :/
Finally, there was this tragedy http://zfsonlinux.org/msg/ZFS-...
That and an earlier xattr bug shows that unfortunately even at the same pool and dataset versions and with the same feature flags, a pool from one
Additionally, mounting a pool does not mean you can necessarily use the datasets reliably (zvols with arbitrary blocksizes are a current headache; case dependence and normalization have been the source of others; ZFS/NFSv4 ACLs vs older or different ACLs on some systems have caused problems discovered when using datasets that appear to be fine). Finally, the share properties are a source of cross-platform annoyance as well. Some of these dataset-compatibility problems persist across zfs send | ... | zfs recv as well, or are only noticed when the received dataset is put to use. :/
The bright side is that the openzfs project looks like it will reduce incompatibilities and increase new feature adoption going forward, which is to everyone's benefit.
-
Re:Trial by fire...
> Personally I would be *EXTREMELY* wary of running ZFS on Linux.
So basically you are making a decision based on emotion instead of actual facts??
Try reading the FAQ next time:
-
And facebook will be burnt
Not that anybody'll really notice, but I have a feeling that Facebook's backup and recovery system is queuing up for a stress test.
Having lost data with BTRFS multiple times on my disk array (as recently as last month), I have no confidence in it. The best thing I can say about btrfs is that it was able to tell me that it had lost data. Not many filesystems do that; but ZFS on Linux has been rock solid for years, and not only tells me if data has been lost, but actually preserves the data as well.
-
Re:Trial by fire...
Realistically, it would be nice to see the native (not FUSE based) code from OpenZFS be included as an alternative, but the CDDL/GPL conflicts likely will make this a no-go.
Well, isn't this your lucky day, then? ZFS on Linux works now, today, without the use of FUSE. Nothing about the license conflicts prohibits use or distribution, just distribution together. I have ZFS/Linux servers in production right now, and they are quite stable. Starting with a vanilla install of CentOS, the instructions are roughly:
1) Install the yum repo file.
2) yum install kernel-devel zfs
3) Start the ZFS service.
4) Start creating ZFS volumes....
A reboot isn't typically necessary... (though not a bad idea)
-
Re:An inspiring decision
Red Hat can't get the stability and/or performance out of it - they are going with XFS.
Quite possibly Red Hat intends to GA RHEL7 before November and it doesn't give them as much time for testing as OpenSuse has. Other than the simple fact that RHEL and Suse are enterprise, and OpenSuse is not.
I sure as hell won't be using Btrfs for any data I care about in the foreseeable future (not through 2015 at least). I consider Zfs On Linux to be much more reliable and well-tested as well as far superior.
Putting BTRFS as a default file system is just idiotic (even moreso coming from a distribution such as Suse). Right now I'd say the best file systems on linux are XFS followed by EXT 4 in terms of reliability and performance. BTRFS is new and shiny, but it's a 2 legged horse. Not something that speaks well for reliability. I'd have expected such a move from Fedora not from Suse.
-
Re:An inspiring decision
Red Hat can't get the stability and/or performance out of it - they are going with XFS.
Quite possibly Red Hat intends to GA RHEL7 before November and it doesn't give them as much time for testing as OpenSuse has. Other than the simple fact that RHEL and Suse are enterprise, and OpenSuse is not.
I sure as hell won't be using Btrfs for any data I care about in the foreseeable future (not through 2015 at least). I consider Zfs On Linux to be much more reliable and well-tested as well as far superior.
-
Re:RAID != Operating System
--It will prolly never be in the main kernel tree due to licensing issues, but:
-
ZFS on Linux
You can create a pool with 1 disk, or mirrors, or RAIDz1, RAIDz2, or RAIDz3. Optionally you can add a SLOG (ZIL on an SSD partition/disk) (make sure to create a mirror of two SLOG partitions/disks) and/or gain extra performance by adding an L2ARC cache on an SSD. Creating backups of a pool/dataset is easy by using zfs send, ssh, and zfs receive.
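As a rough illustration of how those pieces fit together on one command line, the sketch below assembles the argument list for a mirrored pool with a mirrored SLOG (the "log" vdev) and an L2ARC device (the "cache" vdev). It builds the arguments only and runs nothing. The pool and device names are placeholders, and the helper itself is hypothetical.

```python
def zpool_create_args(pool, mirror_disks, slog_disks=(), cache_disks=()):
    """Assemble a `zpool create` argument list without executing it."""
    args = ["zpool", "create", pool, "mirror", *mirror_disks]
    if slog_disks:
        # Mirror the SLOG when two or more devices are given, as advised above.
        if len(slog_disks) > 1:
            args += ["log", "mirror", *slog_disks]
        else:
            args += ["log", *slog_disks]
    if cache_disks:
        # L2ARC devices are listed after the "cache" keyword.
        args += ["cache", *cache_disks]
    return args


# Placeholder device names, for illustration only.
print(" ".join(zpool_create_args(
    "tank",
    ["disk-a", "disk-b"],
    slog_disks=["ssd-a-part1", "ssd-b-part1"],
    cache_disks=["ssd-a-part2"],
)))
```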
-
Re:Of course! And you never need more than 640K RA
> I currently have a 2 TB WD Black system drive - what do I replace it with?
You don't. You _augment_ it with an SSD.
OS + Critical (most often used) apps on the SSD. Everything else on the spindles.
The elephant in the room is that SSDs are unreliable so of course everything is backed up on a NAS (Network Attached Storage) which you should be doing anyways, right?! I suggest FreeNAS http://www.freenas.org/ which is based on BSD and supports ZFS. Even has a GUI if you don't want to mess around with the command line. Or if you use Linux you can use ZFSonLinux http://zfsonlinux.org/
If you just want to buy an off the shelf solution that just works Drobo is OK.
http://www.amazon.com/Drobo-Storage-Gigabit-Ethernet-DRDS4A21/
For SSDs I can personally recommend
* Samsung 840 PRO Series http://www.newegg.com/Product/Product.aspx?Item=N82E16820147193
* Intel 320 or 520 Series http://www.newegg.com/Product/ProductList.aspx?Submit=ENE&Description=intel+ssd
Cheapest SSD prices are < $0.75 / GB. Just wait for them to go on sale (Black Friday, etc.)
-
Re:ZFS - incremental/snapshot?
Very nice suggestion about using two pools !
>of course you will need a system that can use ZFS
Actually I was surprised how well "ZFS on Linux" works if you don't have a FreeNAS/BSD system.
* http://zfsonlinux.org/
It is too bad the ZFSonLinux documentation is total garbage but at least it was relatively painless to get it to work on a spare Ubuntu box. IIRC, ZFS on Linux setup was ...
sudo apt-get update
sudo apt-get install uuid-dev
wget http://archive.zfsonlinux.org/downloads/zfsonlinux/spl/spl-0.6.1.tar.gz
wget http://archive.zfsonlinux.org/downloads/zfsonlinux/zfs/zfs-0.6.1.tar.gz
# unpack first; the directory name is assumed to match the tarball name
tar -xzf spl-0.6.1.tar.gz
cd spl-0.6.1
./configure
make
sudo make install
cd ..
-
Re:ZFS on Linux?
http://zfsonlinux.org/faq.html
google can be your friend
-
Re:Fatal flaw: Filesystems = 4TB only.
> if they could port ZFS from FreeBSD they'd have a winner on their hands
What are you talking about?
* http://wiki.netbsd.org/users/haad/porting_zfs/
* http://netbsd-soc.sourceforge.net/projects/zfs-port/
Considering FreeNAS is based on TinyBSD, and ZFS is already available for Linux,
http://zfsonlinux.org/
Not sure what issues you are having with NetBSD & ZFS.
ZFS for Linux was dead easy to get up and running ...
1. Download spl
2. Download zfs
3. ./configure ; make
4. zpool import /dev/...
Just pulled in 4x 1.5 TB drives in a 2.3 TB Raid-Z2 pool with ZFSonLinux that had already been set up in FreeNAS.
-
Re:What is "GNU/Linux?"
ZFS-on-Linux
Is slow and will always be slow while running from user land.
ZFS-on-Linux is not the userland FUSE version of ZFS. it's a kernel module port of ZFS and a portability layer ("spl" or solaris portability layer) by a team led by Brian Behlendorf at LLNL (mostly because LLNL want to run Lustre on top of it).
It's not slow at all. It's fast.
and it's almost 10 years ahead of btrfs in terms of features, development time, and real-world testing and use in production servers.
ZFS's license is CDDL - free software but incompatible with the GPL. This means that distros can't distribute linux+zfs as a combined work (or should at least be very wary of doing so), but there's no legal problem at all with distributing the linux kernel and a separate zfs-dkms package or similar that automates the compilation and installation of ZFS on the end-user's own system.
installing it is slightly more hassle than using ext4 or xfs or btrfs or some other in-mainline-kernel FS, but not significantly more difficult. no harder than, say, installing the non-free nvidia or fglrx kernel modules on most distros.
-
Re:ZFS
debian already distributes zfs-fuse in the main archive.
there's no legal impediment to debian also distributing zfsonlinux as zfs-dkms and spl-dkms kernel module packages (i.e. compiles and installs the .ko modules when you install it) and zfs tools packages.
in fact, that's how the zfsonlinux project distributes for debian - as an apt-gettable repository, so installing it is as easy as adding another repo and running apt-get
http://zfsonlinux.org/debian.html
(BTW, that repo works on Wheezy and on Sid)
In both cases, zfs-fuse and zfsonlinux, they're not distributing a derived work because it is the USER who installs it on their own system who is combining the GPL code and the CDDL code. As long as they don't distribute the result themselves, the GPL is fine with that.
You can do whatever you like (including link incompatibly-licensed, even proprietary, code) with GPLed code on your own system. The GPL's restrictions only come into effect when you want to distribute the combined work (or, in the case of GPLv3, if you want to offer it as SaaS to third-parties)
-
ZFS
Meanwhile ZFS announced that it was ready for production last month.
-
Re:Google should have bought Sun
ZFS is open source. Turns out it sucks on inexpensive (x86 quality) hardware, in ways that are unfixable even by smart people. Apple lost years proving that.
But Oracle's BTRFS plays the same role and is even better.
-
Re:FreeBSD 9.1 Is Unix Heaven
FreeBSD kernel: perhaps. It's userland, though... What I remember about IRIX was nicer to use than current BSD, and that was aeons ago. I have no need for BSD at the moment, but if I did, it'd be a toss-up between Debian/kFreeBSD and unstable hacks.
-
Re:non-Oracle ZFS FTW
Unless you're playing tricks with shims and wrappers, such as by running ZFS in userspace somehow, or forcing end users to do all the work of setting up ZFS rather than making it quick and easy to set up, you're probably violating the CDDL and GPL by distributing ZFS with a Linux distribution.
The official position is that the license conflict just means you can't compile it into the kernel, not that you can't publish it as a kernel module.
I acknowledge that there is some controversy over whether kernel modules are considered derivative works, but the fact that proprietary drivers do exist and are often available in the non-free sections of repositories contradicts the idea that the licensing issue alone is enough to stop it. Furthermore, Linus' opinion on the matter seems to be that modules developed for other OSes which are then ported to Linux should not be considered derivative works.
But one gray area in particular is something like a driver that was originally written for another operating system (ie clearly not a derived work of Linux in origin). At exactly what point does it become a derived work of the kernel (and thus fall under the GPL)?
THAT is a gray area, and _that_ is the area where I personally believe that some modules may be considered to not be derived works simply because they weren't designed for Linux and don't depend on any special Linux behaviour.
Basically:
- anything that was written with Linux in mind (whether it then _also_ works on other operating systems or not) is clearly partially a derived work.
- anything that has knowledge of and plays with fundamental internal Linux behaviour is clearly a derived work. If you need to muck around with core code, you're derived, no question about it.
Historically, there's been things like the original Andrew filesystem module: a standard filesystem that really wasn't written for Linux in the first place, and just implements a UNIX filesystem. Is that derived just because it got ported to Linux that had a reasonably similar VFS interface to what other UNIXes did? Personally, I didn't feel that I could make that judgment call. Maybe it was, maybe it wasn't, but it clearly is a gray area.
Personally, I think that case wasn't a derived work, and I was willing to tell the AFS guys so.
Does that mean that any kernel module is automatically not a derived work? HELL NO! It has nothing to do with modules per se, except that non-modules clearly are derived works (if they are so central to the kernel that you can't load them as a module, they are clearly derived works just by virtue of being very intimate - and because the GPL expressly mentions linking).
So being a module is not a sign of not being a derived work. It's just one sign that _maybe_ it might have other arguments for why it isn't derived.
Linus
---http://kerneltrap.org/node/1735
So legally, there aren't any issues with running ZFS under Linux, or even distributing binary kernel modules for it. Legally there's no distinction based on the relative difficulty of installation, it's merely a question of whether it's compiled into the kernel or not.
-
Re:Can we get a real Linux filesystem, please?
I have seen the userlevel ZFS crash multiple times, it's also slow as hell. It's still worth it if you are short on storage and want to reduce the size of your backup, but I wouldn't exactly call it ready for production.
I think parent is talking about this, not the userlevel FUSE-based ZFS:
http://zfsonlinux.org/
-
Re:Can we get a real Linux filesystem, please?
ZFS on Linux does exist as a kernel module that is pretty stable and works well. http://zfsonlinux.org/ -- it was put out by Lawrence Livermore National Lab, but can't be included with the kernel distros due to GPL / CDDL license compatibility issues.
-
Re:Reinventing the wheel
There's a native kernel port of ZFS for Linux: http://zfsonlinux.org/
-
Linux and ZFS licensing
The ZFS filesystem is a robust, modern filesystem originally developed on Sun Solaris that contains many advanced features and is being used (among other things) on the largest computer in the world, LLNL Sequoia, which is running Linux.
ZFS is licensed under the Sun CDDL, which is an OSS-approved license. As ZFS was originally developed for Solaris, it is not a derived work of Linux or other GPL software. There is little hope of getting the ZFS copyright owner (Oracle) to relicense it under GPL. Since open source software is intended to increase users' freedom instead of restrict it, there is still a broader community of users who would like to make ZFS on Linux available to the masses as part of easily-used Linux distributions.
It seems relatively clear that combining ZFS and Linux source code and compiling it on your own is permissible under the GPL if one does not redistribute the combined work, but there is uncertainty about whether it is legally safe to distribute ZFS and the Linux kernel together in either source or binary form.
Unlike issues with binary kernel modules that have proprietary licenses and/or are closed source, in this case the ZFS code is open source and has none of the objections that traditionally surround binary kernel modules, and it is in fact the GPL license that prevents distributing two open source components together if they do not both use the GPL license.
Under some interpretation of the GPL, ZFS is an independent work and can function on its own without the Linux kernel (there is a userspace component that can be used to run regression tests on the code independent of any kernel), but the Linux-compiled ZFS kernel module itself is not useful to users without the kernel.
Would you consider distributing ZFS binary Linux kernel modules (together or separately from the kernel) a violation of the GPL? Would the binary ZFS kernel module be considered "not based" on the Linux kernel per the GPLv2 section 2, the last paragraph that allows "mere aggregation" of another work packaged independently on the same media or download site? Would it be permissible if the ZFS code were distributed as a source package together with the binary kernel and compiled on the end-user system at installation time? Failing that, would the FSF be willing to make a special exemption to the GPL to allow ZFS to be bundled with Linux?
-
Re:ZFS Support?
no, but you can load it into your Ubuntu or Mint, or compile it yourself:
-
Re:Huh?
Yes. Yes. Yes. Yes. Yes. Yes. Yes. Yes. Yes. It performs fairly well in my testing so far. Yes. Yes. Yes, if the pool version is below the currently supported Linux port's version (28). Yes. Yes.
Granted, we haven't been using it long, but so far it's been fairly stable and capable.
-
Re:OK Howto article, but missing key points
I just looked at this article as my employer uses Debian and Ubuntu heavily and I've been pushing for ZFS on our file servers. There is no mention of ZFS version, the feature set available, or even a link to the source material.
ZoL is based on ZFS version 28 from the last open Solaris release, and is currently integrating Illumos as its upstream.
There isn't much mention of how to use ZFS. I happen to know most commands, but I think this article would be difficult for a beginner even though it seems to be targeted at that demographic.
It looks like the Slashdot editors are doing this blogger a favor by linking to a mostly empty article.
At a minimum, this article should link to the ZoL home page, the ZoL Launchpad page for packages, and maybe the ZFS introduction or another tutorial.
-
Re:Then what file system should we all use?
Granted, you have to run FreeBSD or Solaris to access a ZFS pool locally.
I've been setting up servers using native ZFS on Linux for almost a year now. Previously I've used Nexenta (Debianized OpenSolaris) to get ZFS.
-
Re:Btrfs
1. btrfs is experimental. corruption and data-loss is not surprising (or even a valid cause of complaint) for a fs tagged as being experimental.
2. btrfs is far from the only fs available for linux. the default ext4 is quite reliable. as is XFS.
3. Native (in-kernel, not FUSE) ZFS is available for linux. Will probably never be in mainline kernel due to licensing issues (CDDL vs GPL) unless Oracle relicenses ZFS as BSD or similar (making it GPL would make it incompatible with *BSD and opensolaris etc)
works well.
The Ubuntu PPA packages are easily re-compiled for debian (just change the Depends line from zfs-grub to grub)...builds nice dkms module packages, so maintenance is almost as hassle-free as if it were in the mainline kernel.
-
Re:OpenSolaris but not FreeBSD?
For example, imagine I had a block size of two bytes, and had the following two files:
02468A
012468A
These two pieces of data are identical except for one inserted byte, but deduplication will be forced to store both files in their entirety.
What you are getting at is dictionary-based compression. You search for repeating sequences of bytes (or bits), then substitute their byte-representation with a shorter one. The longest sequences which show up most frequently get substituted to the shortest byte-representations, thus resulting in space savings (compression). Short but infrequent sequences (e.g. if the letter 'Z' only shows up once) get mapped to a longer byte-representation; so there's data expansion going on there. But the net effect of the two is compression (except possibly in data which has already been compressed).
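The quoted two-byte-block example is easy to check by hand or in a few lines of code. The sketch below treats each character as one "byte" and cuts both files into fixed-size blocks the way a block-level deduplicator would; it is a toy model of fixed-block dedup, not how ZFS is implemented internally.

```python
def blocks(data, size=2):
    """Cut data into consecutive fixed-size blocks (last one may be short)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


a = blocks("02468A")
b = blocks("012468A")

# One inserted character shifts every later block boundary,
# so the two files end up with no block in common.
shared = set(a) & set(b)

print(a)
print(b)
print(shared)
```

With no shared blocks, fixed-block deduplication has nothing to collapse, which is the parent poster's point.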
Since that's just compression, you can achieve it by running your tar file through lzip or compress (bzip2 uses Huffman encoding, which is similar but different). In that respect, the difference between compression and deduplication is not that there are some things deduplication cannot do which compression can. It's that deduplication happens at a different level in the filesystem. Deduplication happens between different files (or blocks); compression happens within a single file, and compressed archive formats like zip and tgz do both.
If you were able to store your entire filesystem in a single compressed tar file, and run disk I/O by dynamically decompressing just the portions of the tar file which had the data you needed, you'd be getting the best of deduplication and compression. Viewed this way, deduplication is just a form of run length encoding compression between files rather than within a file. Instead of compressing the sequence AAA down to 3A, you compress 3 identical copies of File down to 3File.
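To make the analogy concrete, here is a small sketch of both halves: run-length encoding inside one string, and the same idea applied across whole files by storing identical content once and keeping references to it. The function names and the use of SHA-256 as the content key are choices made for this illustration only.

```python
import hashlib
from itertools import groupby


def rle(text):
    """Run-length encode a string: 'AAA' becomes '3A'."""
    return "".join(f"{len(list(group))}{char}" for char, group in groupby(text))


def dedup_files(files):
    """Store each distinct file content once, keyed by its hash.

    files: mapping of file name -> bytes
    Returns (store, refs): store maps digest -> content,
    refs maps file name -> digest.
    """
    store, refs = {}, {}
    for name, content in files.items():
        digest = hashlib.sha256(content).hexdigest()
        store.setdefault(digest, content)  # keep only the first copy
        refs[name] = digest
    return store, refs


print(rle("AAA"))

store, refs = dedup_files({"a.txt": b"File", "b.txt": b"File", "c.txt": b"File"})
print(len(refs), "names,", len(store), "stored copy")
```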
Getting back on topic, I have the Linux version of ZFS running on my file server (which gets its own backups so I'm not too worried about data loss). I had to turn deduplication off though because of performance problems. Reads were fine, but writes to my RAID-Z went from about 120 MB/s down to below 20 MB/s. And more importantly, it didn't mix well with Samba causing writes to the file server from Windows boxes to frequently time out. This was on an i5 w/ 8GB of RAM, so the hardware was more than up to the task. The files I store don't deduplicate much anyway, so it was easier just to turn it off than to try to figure out the cause. (I tried FreeNAS too, but I wanted to be able to do other stuff with the fileserver, not just have it serve files.)
-
some ideas for you
you can buy a 20 or 24 bay case for around $300-$400 US, e.g. Norco RPC-4020 or RPC-4224. Takes up to a full size EEB 12"x13" motherboard and 20 or 24 3.5" hot-swap SAS/SATA drives. Can take a standard power supply or there are redundant dual power supplies available.
http://www.norcotek.com/item_detail.php?categoryid=1&modelno=RPC-4020
http://www.norcotek.com/item_detail.php?categoryid=1&modelno=RPC-4224
The 24 port version has a nice option to replace the internal fan bracket (which supports 4 x 80mm fans) with a bracket that supports 3 x 120mm fans. Much quieter for a home environment. Dunno if the RPC-4020 has a similar option. You *WILL* want to replace all of the supplied fans with third-party silent fans. http://www.silentpcreview.com/ is a good place to start researching this.
Even if you're only planning to have 10 or fewer drives right now, the extra bays are useful if/when you need to replace or upgrade existing drives. You won't have to juggle drives in and out of bays just to replace them, or have a drive hanging outside the case for a few hours while the data is copied.
For extra SATA ports, there are several models of LSI 9211 and similar HBA adaptors providing SAS/SATA 6Gbps, PCI-e 8x slot. RRP is around $350 for 8 port models but you can find them cheaper on ebay, and several manufacturers (e.g. the IBM M1015) have significantly cheaper rebadged models. A SAS card allows you to use either or both SAS and SATA drives, and also allows you to use SAS expanders (to attach more drives to the one card - SATA has something similar called "port multipliers" but it's a crappy substitute only good for destroying your data). Unless you don't have enough PCI-e 8x slots in your m/b, though, you're better off just buying more 8 port cards.
They're just "dumb" HBAs offering only RAID-0, RAID-1, and JBOD....but that's exactly what you want for software raid or btrfs or ZFS so why pay extra for RAID-5 in the card that you're never going to use.
The LSI 1068 based cards are even cheaper, but they only support SAS/SATA 3Gbps. Doesn't matter much for current hard disks, but you'll need a few 6Gbps ports on the motherboard if you want to use SSD drives (e.g. for caching.)
here's a good starting point: http://blog.zorinaq.com/?e=10
see also http://forums.servethehome.com/showthread.php?19-LSI-RAID-Controller-HBA-Equivalency-Mapping
For the file system, I very strongly recommend ZFS On Linux (the native kernel implementation, not the ZFS-Fuse module). http://zfsonlinux.org/ - gives you raid-like features, disk/volume management, compression, de-duping, snapshots, ssd caching and more. all data is checksummed too so it can detect errors (and automatically repair them from redundant info on the RAID1/5-like volumes).
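A rough illustration of the "detect errors and automatically repair them" part, as a Python sketch. This is not how ZFS is implemented internally; the helper names (`mirrored_write`, `self_healing_read`) are hypothetical, and two dicts stand in for the two disks of a mirror. The idea is only that a checksum kept separately from the data lets a read tell a good copy from a bad one and overwrite the bad one.

```python
import hashlib


def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def mirrored_write(mirrors, key, data):
    """Write the same block to every mirror and return its checksum."""
    for disk in mirrors:
        disk[key] = data
    return checksum(data)


def self_healing_read(mirrors, key, expected):
    """Return a copy that matches the checksum and repair any copy that doesn't."""
    good = None
    for disk in mirrors:
        if checksum(disk[key]) == expected:
            good = disk[key]
            break
    if good is None:
        raise IOError("all copies failed the checksum")
    repaired = 0
    for disk in mirrors:
        if checksum(disk[key]) != expected:
            disk[key] = good  # overwrite the damaged copy with the good one
            repaired += 1
    return good, repaired


# Two dicts stand in for two mirrored disks (an assumption for the demo).
disk0, disk1 = {}, {}
expected = mirrored_write([disk0, disk1], "blk0", b"important data")
disk0["blk0"] = b"important dat\x00"  # simulate silent corruption on one disk
data, repaired = self_healing_read([disk0, disk1], "blk0", expected)
print(data, "copies repaired:", repaired)
```

A plain RAID-1 without checksums can see that two copies differ but has no way of knowing which one is right, which is the point of the comparison.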
The Ubuntu PPA compiles easily on debian (you only have to change one dependency from zfs-grub to grub in the debian/control file) - it's about 10 minutes work, and most of that is waiting for the packages to compile.
ZFS will give you software-raid like capabilities - superior equivs to RAID-0, RAID-1, and RAID-5/6 and combinations of them, plus multiple optional hot and cold spares. "superior" because the redundancy is on the file/data level, not at the block level, and each block of each file is checksummed. Plus you can use one or more fast devices like an SSD for automatic read caching of frequently accessed data (ZFS cache or L2ARC), and for a write-intent log (ZFS ZIL) for buffering random-writes to an SSD before writing them to the main drives. This ZIL eliminates the final advantage that hardware raid cards had
-
Re:File System
since there's no fast ZFS implementation for Linux yet
Ha Ha! http://zfsonlinux.org/
There you go. Fully native ZFS on Linux. No more FUSE.
-
Re:When do we get compression?
The one thing that baffles my mind is that Linux filesystems still don't offer compression of specific folders or files. Seriously, Windows has had this for over a decade.
It sounds like you want ZFS. ZFS has supported compression for a long time: LZJB compression since early on, GZIP compression since pool version 5, ZLE compression since pool version 20...
The only problem is.... well, on Linux it's mostly available only using FUSE. There is the ZFS on Linux port mentioned, but I suppose that's really not "production quality" yet.
Compression seems to be one of those things that Solaris excels at.
Of course this is useless if your storage hardware is low-end or you lack CPU, since while compression increases capacity, it doesn't increase how much of that storage you can have live data on at average activity
-
Re:No ZFS?
If you want the free version you can still use v28 on FreeBSD and Solaris Express (no upgrades in over 1 year).
or linux.