ZFS Gets Built-In Deduplication

← Back to Stories (view on slashdot.org)

ZFS Gets Built-In Deduplication

Posted by ScuttleMonkey on Monday November 2, 2009 @11:21AM from the sounds-like-a-resource-hog-waiting-to-happen dept.

elREG writes to mention that Sun's ZFS now has built-in deduplication utilizing a master hash function to map duplicate blocks of data to a single block instead of storing multiples. "File-level deduplication has the lowest processing overhead but is the least efficient method. Block-level dedupe requires more processing power, and is said to be good for virtual machine images. Byte-range dedupe uses the most processing power and is ideal for small pieces of data that may be replicated and are not block-aligned, such as e-mail attachments. Sun reckons such deduplication is best done at the application level since an app would know about the data. ZFS provides block-level deduplication, using SHA256 hashing, and it maps naturally to ZFS's 256-bit block checksums. The deduplication is done inline, with ZFS assuming it's running with a multi-threaded operating system and on a server with lots of processing power. A multi-core server, in other words."

4 of 386 comments (clear)

Min score:

Reason:

Sort:

More reason to be a ZFS fanboy by BitZtream · 2009-11-02 11:32 · Score: 3, Insightful

I'm wondering how long its going to take for them to do something with ZFS that actually makes me slow down my overwhelming ZFS fanboyism.
I just love these guys.
My virtual machine NFS server is going to have to get this as soon as FBSD imports it, and I'll no longer have to worry about having backup software (like BackupPC, good stuff btw) that does this.
I don't use high end SANs but it would seem to me that they are rapidly losing any particular advantage to a Solaris or FBSD file server.

--
Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
Re:This is good news... by bcmm · 2009-11-02 11:35 · Score: 4, Insightful

...and would normally make me happy; except I'm a Mac user. Still good news, but could've been better for a certain sub-set of the population, darn it.
Use open source, get cutting edge things.

--
# cat /dev/mem | strings | grep -i llama
Damn, my RAM is full of llamas.
Re:Does that mean... by ezzzD55J · 2009-11-02 11:46 · Score: 3, Insightful

The single block is still stored redundantly, of course. Just not redundantly more than once.
Re:There are three types of files. by greg1104 · 2009-11-02 17:14 · Score: 3, Insightful

The main corner case in your suggested "unit file" implementation is where someone is overwriting a file too large for the filesystem to contain two copies of it. You have to truncate when this happens to fit the new one, you can't just keep the old one around until it's replaced. This makes it impossible to meet the spec you're asking for in all cases. The best you can do is try to keep the original around until disk space runs out, and only truncate it when forced to. However, if that's how the implementation works, then applications can't just blindly rely on the filesystem to always do the right thing and "give you that for free". They've still got to create the new file and confirm it got written out before they touch the original if they want to guarantee never losing the original good copy, so that they bomb with a disk space error rather than risk truncating the original. That's why this whole path doesn't go anywhere useful; better to work on poplarizing an API for atomic rewrites or something.
As for your "managed files" case, that won't work for all database approaches. For example, in PostgreSQL, only writes to the database write-ahead log are done with O_SYNC/O_DIRECT. The main data block updates (and writes that are creating new data blocks) are written out asynchronously, and then when internal checkpoints reach their end any unwritten blocks are forced to disk with fsync if they're still in the OS cache. You'd be hard pressed to detect which of your suggested modes was the appropriate one for just the obvious behavior there, and there's still more weird corner cases to worry about buried in there too (like what the database does with the data blocks and the WAL to repair corruption after a crash).
Both these highlight that it's hard to make improvements here at just the filesystem level. Some of the really desirable behavior is hard to do unless applications are modified to do something different too. That hasn't really been going well for ext4 this year, and how that played out highlights how hard an issue this is to crack.