Ask Slashdot: Free/Open Deduplication Software?
First time accepted submitter ltjohhed writes "We've been using deduplication products, for backup purposes, at my company for a couple of years now (DataDomain, NetApp etc). Although they've fully satisfied the customer needs in terms of functionality, they don't come across cheap — whatever the brand. So we went looking for some free dedup software. OpenSolaris, using ZFS dedup, was there first that came to mind, but OpenSolaris' future doesn't look all that bright. Another possibility might be utilizing LessFS, if it's fully ready. What are the slashdotters favourite dedup flavour? Is there any free dedup software out there that is ready for customer deployment?" Possibly helpful is this article about SDFS, which seems to be along the right lines; the changelog appears stagnant, though, although there's some active discussion.
...includes dedupe.
There was a blog entry a while ago where on a 256MB RAM machine someone was able to dedupe 600GB down to 400GB and the performance was fine. This is much unlike ZFS which wants the entire dedupe tree in memory and requires gigs and gigs of RAM.
Check out BackupPC. Been using it for about 5 years at our company, admittedly a mostly Linux shop, with great results. Deduplication on a per-file basis, block-based transfers via the rsync protocol, and a good web-based UI (at least in terms of function). Thanks to deduplication we are getting about a 10:1 storage compression backing up servers and workstations: a total of 1.28 TB of backups in 130.88 GB of used space.
Also, perhaps the reason that Linux does not support file-system compression on the fly is because it's a horrid idea, and should never actually be used?
Ah, the "Terrible idea" objection.
This is a common objection to implementing ideas on Linux - so common, in fact, that it's successfully held Linux back for at least ten years.
Multi-master LDAP replication? Terrible idea. Remained terrible for several years after literally every commercial LDAP server on the planet supported multi-master replication, only became non-harmful when OpenLDAP started to support it in version 2.4.
Active Directory support? Such a terrible idea that it's held Samba development back by at least five years. Even now, where Windows Vista deprecates NT4-style policies and 7 deprecates NT 4 domain support altogether (which is about all you get from Samba 3); Samba 4 is considered alpha software.
Some sort of centralised work-together system that integrates email, address book, calendars, task-list? Terrible idea. So much so that Exchange (despite being way too complicated for its own good) is still an extremely popular email solution and the closest you can get to a viable F/OSS alternative either requires your users to completely re-think how they collaborate (yuck) or buy the commercial version simply because the free version lacks vital features.
Free clue to all naysayers who work on F/OSS projects: If you spent as long trying to think of ways to make something work as you do thinking of objections to existing implementations and explaining how you're right and everyone else is wrong, you wouldn't be ten years behind the times.