Data Deduplication Comparative Review
snydeq writes "InfoWorld's Keith Schultz provides an in-depth comparative review of four data deduplication appliances to vet how well the technology stacks up against the rising glut of information in today's datacenters. 'Data deduplication is the process of analyzing blocks or segments of data on a storage medium and finding duplicate patterns. By removing the duplicate patterns and replacing them with much smaller placeholders, overall storage needs can be greatly reduced. This becomes very important when IT has to plan for backup and disaster recovery needs or when simply determining online storage requirements for the coming year,' Schultz writes. 'If admins can increase storage usage 20, 40, or 60 percent by removing duplicate data, that allows current storage investments to go that much further.' Under review are dedupe boxes from FalconStor, NetApp, and SpectraLogic."
Filesystems should be doing this.
Give me Classic Slashdot or give me death!
Filesystems should be doing this.
The one on your desktop machine, or the primary NAS storage that you access shared data from, or the backup server that ends up getting it all anyway? You see, this is a shared database problem. If your local filesystem does this, then it has to 'share' knowledge of all the unique blocklets with every other server/filesystem that wishes to share in this compressed file space. De-duplication is a means of compression that works across many filesystems - or at least it can be, if it is properly implemented.
"Every time I see an adult on a bicycle, I no longer despair for the future of the human race." - H. G. Wells
More disk is still so much cheaper it really cannot be justified on that front. More disks also mean more IOPS, so reducing sinning platters can be a bad thing.
There are some reasons to go for it, but even with thousands of clients it may or may not be suitable for what you are doing.
Something you start to appreciate when you are called on to do a really high availability, high reliability system is to have features like this. For one thing it reduces the time it takes to get a replacement. Unless a drive fails late at night, you get one the next day. You don't have to rely on someone to notice the alert, place the order, etc. It just happens. Also, like most high end support companies, their shipping time is fairly late so even late in the day it is next day service. What arrives is the drive you need, in its caddy, ready to go.
Then there's just the fact of having someone else help monitor things. It's easy to say "Oh ya I'll watch everything important and deal with it right away," but harder to do it. I've known more than a few people who are not nearly as good at monitoring their critical system as they ought to be. A backup is not a bad thing.
You have to remember that the kind of stuff you are talking about for things like NetApps is when no downtime is ok, when no data loss is ok. You can't say "Ya a disk died and before we got a new on in another died so sorry, stuff is gone."
Not saying that your situation needs it, but there are those that do. They offer other features along those lines like redundant units, so if one fails the other continues no problem.
Basically they are for when data (and performance) is very important and you are willing to spend money for that. You put aside the tech-tough guy attitude of "I can manage it all myself," and accept that the data is that important.
Sinning platters cause original spin.
No, good department managers don't know that. Department managers in companies with bad senior management know that. Companies with competent senior management are willing to increase the budgets for departments that have shown that they are fiscally responsible, and cut the budgets or fire the department heads of others.
I am TheRaven on Soylent News