Data Deduplication Comparative Review
snydeq writes "InfoWorld's Keith Schultz provides an in-depth comparative review of four data deduplication appliances to vet how well the technology stacks up against the rising glut of information in today's datacenters. 'Data deduplication is the process of analyzing blocks or segments of data on a storage medium and finding duplicate patterns. By removing the duplicate patterns and replacing them with much smaller placeholders, overall storage needs can be greatly reduced. This becomes very important when IT has to plan for backup and disaster recovery needs or when simply determining online storage requirements for the coming year,' Schultz writes. 'If admins can increase storage usage 20, 40, or 60 percent by removing duplicate data, that allows current storage investments to go that much further.' Under review are dedupe boxes from FalconStor, NetApp, and SpectraLogic."
The concept of "deduplicating" is nothing new; it's one of the basic concepts that data compression is built on. Yet not only are people touting this as a "new development," there are even questions about its compatibility with compression. Sheesh!
One of the primary compression methods in Zip (deflate) uses a 32K sliding window and replaces duplicated data with a length/distance pair pointing back to an earlier occurrence of that data within the window. This operation is computationally lightweight and can be done on the fly by any modern computer. There are open source Zip libraries and code available on the net. Building a cheap Linux box that runs Zip's deflate in real time isn't an invention; it's a cheap trick that's being touted as the solution to storage problems.
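For anyone who wants to see the deflate behavior described above, here is a minimal sketch using Python's standard zlib module. The 8 KB block size, the fixed seed, and the helper name deflate_size are assumptions made for illustration; nothing here is taken from the reviewed appliances. It compresses a block of pseudo-random (essentially incompressible) bytes once, then the same block written twice back to back, so the second copy sits well inside deflate's 32K window.

```python
import random
import zlib

def deflate_size(data: bytes) -> int:
    """Return the size in bytes of `data` after zlib (deflate) compression."""
    return len(zlib.compress(data, 6))

# Hypothetical test data: 8 KB of pseudo-random bytes, which deflate
# cannot shrink on its own because it contains no repeated patterns.
rng = random.Random(0)
block = bytes(rng.getrandbits(8) for _ in range(8192))

# The same block twice in a row. The second copy starts 8192 bytes after
# the first, so it falls inside the 32K window and can be encoded as
# back-references to the first copy instead of as literal bytes.
doubled = block + block

single_size = deflate_size(block)
doubled_size = deflate_size(doubled)

print("one copy  :", len(block), "bytes raw ->", single_size, "bytes deflated")
print("two copies:", len(doubled), "bytes raw ->", doubled_size, "bytes deflated")

# Sanity check: compression is lossless.
assert zlib.decompress(zlib.compress(doubled, 6)) == doubled
```

If the sketch behaves as expected, the two-copy output should be only slightly larger than the one-copy output, since the duplicate is stored as pointers rather than data.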