ZFS Gets Built-In Deduplication



Posted by ScuttleMonkey on Monday November 2, 2009 @11:21AM from the sounds-like-a-resource-hog-waiting-to-happen dept.

elREG writes to mention that Sun's ZFS now has built-in deduplication utilizing a master hash function to map duplicate blocks of data to a single block instead of storing multiples. "File-level deduplication has the lowest processing overhead but is the least efficient method. Block-level dedupe requires more processing power, and is said to be good for virtual machine images. Byte-range dedupe uses the most processing power and is ideal for small pieces of data that may be replicated and are not block-aligned, such as e-mail attachments. Sun reckons such deduplication is best done at the application level since an app would know about the data. ZFS provides block-level deduplication, using SHA256 hashing, and it maps naturally to ZFS's 256-bit block checksums. The deduplication is done inline, with ZFS assuming it's running with a multi-threaded operating system and on a server with lots of processing power. A multi-core server, in other words."
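To make the summary's description of block-level inline dedupe concrete, here is a toy in-memory sketch: each block is hashed with SHA-256 as it is written, and a duplicate digest only bumps a reference count instead of storing a second copy. The fixed 4 KiB block size, the `DedupStore` class and its method names are hypothetical illustrations, not ZFS's on-disk structures, and unlike a careful implementation this sketch trusts digest equality without comparing bytes.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical fixed block size; real ZFS record sizes vary


class DedupStore:
    """Toy block store: one physical copy per unique SHA-256 digest."""

    def __init__(self):
        self.blocks = {}    # digest -> the single stored copy of that block
        self.refcount = {}  # digest -> number of logical references to it

    def write_block(self, data: bytes) -> bytes:
        """Dedupe one block inline and return its digest as a 'block pointer'."""
        digest = hashlib.sha256(data).digest()
        if digest in self.blocks:
            self.refcount[digest] += 1   # duplicate: add a reference only
        else:
            self.blocks[digest] = data   # new content: store it once
            self.refcount[digest] = 1
        return digest

    def write_file(self, payload: bytes) -> list:
        """Split a payload into fixed-size blocks and dedupe each one."""
        return [self.write_block(payload[i:i + BLOCK_SIZE])
                for i in range(0, len(payload), BLOCK_SIZE)]

    def read_file(self, pointers: list) -> bytes:
        return b"".join(self.blocks[d] for d in pointers)

    def physical_bytes(self) -> int:
        return sum(len(b) for b in self.blocks.values())


store = DedupStore()
image = bytes(BLOCK_SIZE) * 3 + b"unique tail"  # three zero blocks plus a short tail
a = store.write_file(image)
b = store.write_file(image)  # a second copy of the same "VM image"
print(store.physical_bytes())  # → 4107 (one zero block + the 11-byte tail)
```

Writing the same image twice costs nothing extra: both files point at the same two physical blocks, which is why block-level dedupe suits many near-identical VM images.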

6 of 386 comments

Wake me when they build it into the hard disk by icebike · 2009-11-02 11:38 · Score: 4, Interesting

Imagine the amount of stuff you could (unreliably) store on a hard disk if massive de-duplication was built into the drive electronics. It could even do this quietly in the background.
I say unreliably, because years ago we had a Novell server that used an automated compression scheme. Eventually, the drive got full anyway, and we had to migrate to a larger disk.
But since the copy operation decompressed files on the fly, we couldn't copy: any attempt to reference several large compressed files instantly consumed all remaining space on the drive. What ensued was a nightmare of copying and deleting files, beginning with the smallest and working our way up to the largest. It took over a day of manual effort before we freed up enough space to mass-move the remaining files.
De-duplication is pretty much the same thing: compression by recording and eliminating duplicates. But any minor automated update of some files runs the risk of changing them such that what was a duplicate must now be stored separately.
This could trigger a similar situation where there was suddenly not enough room to store the same amount of data that was already on the device. (For some values of "suddenly" and "already").
For archival stuff or OS components (executables, source code, etc.) which virtually never change, this would be great.
But there is hell to pay somewhere down the road.

--
Sig Battery depleted. Reverting to safe mode.
1. Re:Wake me when they build it into the hard disk by icebike · 2009-11-02 12:15 · Score: 3, Interesting
  
  Bad design on Novell's part, but the problem persists in the de-duplicated world, where de-duplicating to memory only is not a solution.
  Imagine a hundred very large files containing largely the same content. Now imagine CHANGING just a few characters in each file via some automated process. The 100 files which were actually stored as ONE file balloon to 100 large files.
  On a drive that was already full, changing just a few characters (not adding any total content) could cause a disk full error.
  You really can't fake what you don't have. You either have enough disk to store all of your data, or you run the risk of hindsight telling you it was a really bad design.
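  The scenario above can be measured with a toy model. This sketch assumes fixed 4 KiB blocks, 100 identical 256 KiB files whose blocks are all distinct from one another, and an 8-byte edit in the first block of each file; the file sizes and the `block_level_usage`/`file_level_usage` helpers are hypothetical. It shows that no logical bytes are added, yet physical usage grows, and how much it grows depends on the dedupe granularity: file-level dedupe balloons to 100 full copies, while block-level dedupe (what the summary says ZFS does) only un-shares the blocks that were touched.

```python
import hashlib

BLOCK = 4096           # hypothetical fixed block size
N_FILES = 100
BLOCKS_PER_FILE = 64   # 256 KiB per file in this toy example


def block_level_usage(files):
    """Bytes stored if each unique block (by SHA-256) is kept once."""
    unique = {}
    for f in files:
        for i in range(0, len(f), BLOCK):
            chunk = bytes(f[i:i + BLOCK])
            unique[hashlib.sha256(chunk).digest()] = len(chunk)
    return sum(unique.values())


def file_level_usage(files):
    """Bytes stored if each unique whole file is kept once."""
    unique = {hashlib.sha256(bytes(f)).digest(): len(f) for f in files}
    return sum(unique.values())


# Each block of the template is filled with its own index, so blocks within
# one file never match each other; only cross-file copies dedupe.
template = b"".join(i.to_bytes(4, "big") * (BLOCK // 4)
                    for i in range(BLOCKS_PER_FILE))
files = [bytearray(template) for _ in range(N_FILES)]
before = (block_level_usage(files), file_level_usage(files))

# "Change just a few characters" in each file; total logical size is unchanged.
for n, f in enumerate(files):
    f[0:8] = b"EDIT" + n.to_bytes(4, "big")
after = (block_level_usage(files), file_level_usage(files))

print("block-level:", before[0], "->", after[0])  # 262144 -> 667648
print("file-level: ", before[1], "->", after[1])  # 262144 -> 26214400
```

  In both cases a drive that was full before the edits would overflow, which is the commenter's point; the granularity only changes how badly.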
  
Par for the course.. by Junta · 2009-11-02 13:08 · Score: 4, Interesting

Any filesystem implementing copy-on-write at all, data dedupe, and/or compression already carries the risk of exhausting oversubscribed storage due to unanticipated compression ratios or data uniqueness. That's why NetApp filers implementing these features make you explicitly accept the risk of exhausting allocations if you rely on these features to advertise more storage capacity than you actually have.
You don't even need a fancy filesystem to expose yourself to this today:
$ dd if=/dev/zero of=bigfile bs=1M seek=8191 count=1
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.00426769 s, 246 MB/s
$ ls -lh bigfile
8.0G 2009-11-02 20:06 bigfile
$ du -sh bigfile
1.0M bigfile
This possibility has been around a long time and the world hasn't melted. Essentially, anyone using these features should be well aware of the risks incurred.
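The dd transcript above creates a sparse file: its apparent size (what `ls` shows) far exceeds the blocks actually allocated (what `du` shows). Here is a rough Python analogue, scaled down to 64 MiB so it runs quickly. It assumes a Unix system, since `st_blocks` (counted in 512-byte units) is not available on Windows; on a filesystem without sparse-file support the two numbers would come out roughly equal.

```python
import os
import tempfile

MIB = 1024 * 1024


def make_sparse(path: str, total_mib: int) -> None:
    """Roughly what `dd if=/dev/zero of=path bs=1M seek=N-1 count=1` does:
    seek past N-1 MiB without writing anything, then write one real 1 MiB block."""
    with open(path, "wb") as f:
        f.seek((total_mib - 1) * MIB)
        f.write(bytes(MIB))


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "bigfile")
    make_sparse(path, 64)
    st = os.stat(path)
    apparent = st.st_size             # the size `ls -l` reports
    allocated = st.st_blocks * 512    # space actually allocated, as `du` reports
    print(apparent, allocated)
```

The gap between the two numbers is exactly the "advertised versus actual" capacity the comment warns about: filling in the hole later consumes real space that nobody reserved.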

--
XML is like violence. If it doesn't solve the problem, use more.
There are three types of files. by Animats · 2009-11-02 13:44 · Score: 5, Interesting
I'd argue that file systems should know about and support three types of files:
- Unit files. Unit files are written once, and change only by being replaced. Most common files are unit files: program executables, HTML files, etc. The file system should guarantee that if you open a unit file, you will always read a consistent version; it will never change underneath a read. Unit files are replaced by opening for write, writing a new version, and closing; upon close, the new version replaces the old. In the event of a system crash during writing, the old version of the file remains. If the writing program crashes before an explicit close, the old file remains. Unit files are good candidates for deduplication via hashing. While the file is open for writing, attempts to open it for reading open the old version. This should be the default mode. (This would be a big convenience; you always read a good version. Good programs try to fake this by writing a new file, then renaming it to replace the old file, but most operating systems and file systems don't support atomic multiple rename, so there's a window of vulnerability. The file system should give you that for free.)
- Log files. Log files can only be appended to. UNIX supports this with an open mode of O_APPEND, but it doesn't enforce it (you can still seek), and NFS doesn't implement it properly. Nor does Windows. Opens of a log file for reading should be guaranteed to always read exactly up to the last write. In the event of a system crash during writing, log files may be truncated, but must be truncated at an exact write boundary; trailing off into junk is unacceptable. Deduplication via hashing probably isn't worth the trouble.
- Managed files. Managed files are random-access files managed by a database or archive program. Random access is supported. The use of open modes O_SYNC, O_EXCL, or O_DIRECT during file creation indicates a managed file. Seeks while open for write are permitted, multiple opens access the same file, and O_SYNC and O_EXCL must work as documented. Deduplication via hashing probably isn't worth the trouble and is bad for database integrity.
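The "write a new file, then rename it" workaround mentioned under unit files looks roughly like this for a single file. It is a sketch assuming POSIX semantics, where `os.replace` maps to one atomic `rename()` when both paths are on the same filesystem; the helper name `replace_unit_file` is hypothetical. It does nothing about the atomic multi-file rename the comment says is missing.

```python
import os
import tempfile


def replace_unit_file(path: str, data: bytes) -> None:
    """Write a complete new version next to `path`, then swap it in with one rename.
    Readers opening `path` see either the whole old file or the whole new one."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must be on the same filesystem as the target for an atomic rename.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # new contents reach disk before the name points at them
        os.replace(tmp, path)     # single atomic rename on POSIX; old version is replaced
    except BaseException:
        os.unlink(tmp)            # on failure, drop the partial copy; the old file is untouched
        raise


with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "index.html")
    replace_unit_file(target, b"<p>v1</p>")
    replace_unit_file(target, b"<p>v2</p>")
    with open(target, "rb") as f:
        content = f.read()
    leftovers = sorted(os.listdir(d))
    print(content)  # b'<p>v2</p>'
```

For full crash durability on Linux, the containing directory is usually fsynced after the rename as well, so the new directory entry itself survives a power loss.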
That's a useful way to look at files. Almost all files are "unit" files; they're written once and are never changed; they're only replaced. A relatively small number of programs and libraries use "managed" files, and they're mostly databases of one kind or another. Those are the programs that have to manage files very carefully, and those programs are usually written to be aware of concurrency and caching issues.
Unix and Linux have the right modes defined. File systems just need to use them properly.
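The proposal above is not an existing filesystem feature, but its rules can be illustrated by mapping standard open(2) flags onto the three file types. The function below is purely hypothetical: the name `classify_at_creation` is invented, and giving "managed" precedence when both managed and append flags are present is an assumption not stated in the comment. `O_DIRECT` is not defined on every platform (macOS lacks it), hence the fallback.

```python
import os

# Hypothetical mapping of the three-file-type rules onto flags used at file creation.
# O_DIRECT is missing on some platforms, so it contributes no bits there.
MANAGED_FLAGS = os.O_SYNC | os.O_EXCL | getattr(os, "O_DIRECT", 0)


def classify_at_creation(flags: int) -> str:
    """Return 'managed', 'log', or 'unit' for the open flags used to create a file."""
    if flags & MANAGED_FLAGS:
        return "managed"  # database/archiver random access; skip dedupe
    if flags & os.O_APPEND:
        return "log"      # append-only; dedupe "probably isn't worth the trouble"
    return "unit"         # the default: replaced whole, a good dedupe candidate


print(classify_at_creation(os.O_WRONLY | os.O_CREAT | os.O_TRUNC))   # unit
print(classify_at_creation(os.O_WRONLY | os.O_CREAT | os.O_APPEND))  # log
print(classify_at_creation(os.O_RDWR | os.O_CREAT | os.O_EXCL))      # managed
```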
Re:Any other file systems with that feature? by binaryspiral · 2009-11-02 14:27 · Score: 4, Interesting

Microsoft's SIS (Single Instance Storage) is a joke. A few folks have dedupe down to a science: Data Domain and NetApp.
We virtualized our filers into an ESX 3.5 cluster and dropped the VMDK files onto a NetApp 3140... deduped them to 18% of their original size. No performance impact, actually faster than our original servers and much more efficient.
ROI - three months.
Difficulty to implement dedup? A checkmark and the OK button.
Re:Open Source Cures Cancer by sjames · 2009-11-03 02:34 · Score: 3, Interesting

The same people you call when your proprietary system breaks and you discover that the official tech support people can't find their posterior with both hands and a map. Most cities have a number of grief counselors ready to support you in your time of need. If it was really critical, try the suicide hotline.