Data Deduplication Comparative Review

Posted by samzenpus on Wednesday September 15, 2010 @11:10AM from the a-little-order-please dept.

snydeq writes "InfoWorld's Keith Schultz provides an in-depth comparative review of four data deduplication appliances to vet how well the technology stacks up against the rising glut of information in today's datacenters. 'Data deduplication is the process of analyzing blocks or segments of data on a storage medium and finding duplicate patterns. By removing the duplicate patterns and replacing them with much smaller placeholders, overall storage needs can be greatly reduced. This becomes very important when IT has to plan for backup and disaster recovery needs or when simply determining online storage requirements for the coming year,' Schultz writes. 'If admins can increase storage usage 20, 40, or 60 percent by removing duplicate data, that allows current storage investments to go that much further.' Under review are dedupe boxes from FalconStor, NetApp, and SpectraLogic."
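To make the quoted description concrete, here is a minimal Python sketch of block-level deduplication. It is only an illustration of the general idea (fixed-size blocks, with a hash standing in as the "much smaller placeholder"), not how any of the reviewed appliances work. The names `dedupe` and `restore`, the 4096-byte block size, and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib


def dedupe(data: bytes, block_size: int = 4096):
    """Split data into fixed-size blocks and keep each unique block once.

    Returns (store, recipe): store maps a SHA-256 digest to the block bytes,
    and recipe is the ordered list of digests needed to rebuild the data.
    Hypothetical helper for illustration only.
    """
    store = {}
    recipe = []
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        digest = hashlib.sha256(block).digest()
        # Store the block only the first time this digest is seen.
        store.setdefault(digest, block)
        # The 32-byte digest is the placeholder for the full block.
        recipe.append(digest)
    return store, recipe


def restore(store, recipe) -> bytes:
    """Rebuild the original data by looking up each placeholder in order."""
    return b"".join(store[digest] for digest in recipe)


if __name__ == "__main__":
    # Five blocks, but only two distinct patterns.
    data = b"A" * 4096 * 3 + b"B" * 4096 + b"A" * 4096
    store, recipe = dedupe(data)
    print(len(recipe), "blocks referenced,", len(store), "blocks stored")
    assert restore(store, recipe) == data
```

Note that, unlike storing only a checksum, the unique blocks themselves are kept, so the original data can always be rebuilt from the store and the recipe.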

Second post by Anonymous Coward · 2010-09-15 11:13 · Score: 2, Funny

Same as the first.

Re:Um.. by igny · 2010-09-15 13:33 · Score: 2, Funny

Yeah! To fight dupes I compute a CRC checksum for each file and store it (and only it) on my backup drive. That method removes dupes almost automatically, and there is a side effect of a huge compression ratio too. I have been downloading high-def videos from the Internet for quite a while now, and with my compression method I have used less than 10 percent of a 1GB flash drive! I strongly recommend this method to everyone!

--
In theory there is no difference between theory and practice. In practice there is. - Yogi Berra

I already do this by MyLongNickName · 2010-09-15 13:57 · Score: 3, Funny

After an analysis of a 1TB drive, I noticed that roughly 95% were 0's with only 5% being 1's.
I was then able to compress this dramatically. I just record that there are 950M 0's and 50M 1's. The space taken up drops to around 37 bits. Throw in a few checksum bits, and I am still under eight bytes.
I am not sure what is so hard about this disaster recovery planning. Heck, I figure I am up for a promotion after I implement this.

--
See my journal for slashdot ID's by year. Mine created in 2005. http://slashdot.org/journal/289875/slashdot-ids-by-year

Re:Don't forget to weigh in the cost by zooblethorpe · 2010-09-15 14:31 · Score: 2, Funny

...so reducing sinning platters can be a bad thing.
Satan, is that you?
Cheers,

--
"What in the name of Fats Waller is that?"
"A four-foot prune."

Re:Wrong layer by StikyPad · 2010-09-16 09:20 · Score: 2, Funny

Sounds like what we need is a giant table of all possible byte values up to 2^n length, then we can just provide the index to this master table instead of the data itself. I call this proposal the storage-storage tradeoff where, in exchange for requiring large amounts of storage, we require even more storage. I'll even throw in the extra time requirements for free.

--
https://www.eff.org/https-everywhere