Aside from the stated benefit of 'reducing duplicate data to increase available storage space', are there any other benefits to de-duplicating your storage?
An intelligent caching layer will only store the deduped data once, allowing it to cache more data, get more cache hits, reduce physical disk I/O and improve response times.
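Here is a minimal sketch of that idea in Python. It assumes a simple LRU read cache keyed by a SHA-256 fingerprint of each block's contents. The class and its names are hypothetical and do not describe any vendor's implementation. Four logical blocks with identical contents share a single cache slot, so a 2-slot cache ends up serving all five logical blocks. A conventional cache of the same size could hold only two of them.

```python
from __future__ import annotations

import hashlib
from collections import OrderedDict
from typing import Optional


class DedupCache:
    """Hypothetical read cache that keeps each unique block's contents only once."""

    def __init__(self, capacity_blocks: int) -> None:
        self.capacity = capacity_blocks
        self.lba_to_fp: dict[int, bytes] = {}  # logical block address -> fingerprint
        self.blocks: OrderedDict[bytes, bytes] = OrderedDict()  # fingerprint -> data, LRU order
        self.hits = 0
        self.misses = 0

    def put(self, lba: int, data: bytes) -> None:
        fp = hashlib.sha256(data).digest()
        self.lba_to_fp[lba] = fp
        self.blocks[fp] = data  # identical contents land in the same slot
        self.blocks.move_to_end(fp)
        while len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block

    def get(self, lba: int) -> Optional[bytes]:
        fp = self.lba_to_fp.get(lba)
        if fp is not None and fp in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(fp)
            return self.blocks[fp]
        self.misses += 1
        return None  # a real system would now fall through to physical disk I/O


cache = DedupCache(capacity_blocks=2)
for lba in range(4):
    cache.put(lba, b"A" * 4096)  # four logical blocks, one unique payload
cache.put(4, b"B" * 4096)
print(len(cache.blocks), [cache.get(lba) is not None for lba in range(5)])
# prints: 2 [True, True, True, True, True]
```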
Personally, I'd be _terrified_ of using dedup for primary storage. What this does is exactly the opposite of RAID: it squeezes every last bit of redundancy out of your data, and makes everything dependent upon the integrity of your blockpool database. Lose a single blocklet and you stand to lose _all_ of your data.
If you're striving for availability by keeping multiple copies of the same data on the same physical device(s), You're Doing It (Very) Wrong.
On a non-deduplicated system, if one block goes bad you lose one file; on a deduplicated system, you can lose any number of files to a single bad block.
That's why you have RAID and block-level checksumming.
What scenario are you envisaging where this can happen?
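To illustrate what "RAID and block-level checksumming" buy you here, here is a minimal Python sketch. It assumes a two-way mirror and a SHA-256 checksum stored in the pointer to each block. ZFS does keep checksums in the parent block pointer, though its default algorithm is fletcher4; the class names here are hypothetical. A silently corrupted copy is detected on read and repaired from the good copy, so a heavily shared block is only lost if every copy of it goes bad at once.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class BlockPointer:
    # The checksum lives in the pointer, not next to the data, so a block that
    # has silently rotted on disk cannot vouch for itself.
    addr: int
    checksum: bytes


class MirroredStore:
    """Hypothetical two-way mirror that verifies every read against a checksum."""

    def __init__(self) -> None:
        self.sides = [{}, {}]  # each side maps block address -> bytes
        self.next_addr = 0

    def write(self, data: bytes) -> BlockPointer:
        addr = self.next_addr
        self.next_addr += 1
        for side in self.sides:
            side[addr] = data
        return BlockPointer(addr, hashlib.sha256(data).digest())

    def read(self, ptr: BlockPointer) -> bytes:
        for side in self.sides:
            data = side.get(ptr.addr)
            if data is not None and hashlib.sha256(data).digest() == ptr.checksum:
                # Self-heal: overwrite any copy that failed verification.
                for other in self.sides:
                    if other.get(ptr.addr) != data:
                        other[ptr.addr] = data
                return data
        raise OSError(f"block {ptr.addr}: no copy matches its checksum")


store = MirroredStore()
ptr = store.write(b"block shared by many files")
store.sides[0][ptr.addr] = b"bit rot!"  # silently corrupt one side of the mirror
print(store.read(ptr))  # the good copy is returned and the bad side is repaired
```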
More disk is still so much cheaper that dedupe really cannot be justified on that front.
Sure it can, easily.
If your primary concern is up-front cost, you shouldn't be buying equipment in an enterprise environment. The up-front cost is the _least_ of your concerns.
You could just have Nagios monitoring for errors and even order a drive off Amazon if you really wanted to.
Not even touching on all the things that could go wrong with this (and there are many), the best response time you're going to be looking at for this is ~12 hours, and that's only in ideal circumstances.
NetApp will have a replacement drive on your doorstep in 4-8 hours, often less.
The shiny new NetApp appliance that my PHB decided to blow the last of our budget on saves around 30% by using de-dupe; however, we could have had three times as much conventional storage for the same cost.
Where are you going to get three times as much storage for the same cost (well, actually it'd need to cost a lot less, to pay for all the additional physical and logical infrastructure) that has redundant controllers, FC, iSCSI, NFS, SMB, no-impact snapshotting, dedupe, replication and 24x7x4 support?
But I understood de-duplication not to be concerned with files at all, only with blocks of data on the device.
It depends.
Simplistic dedupe schemes only operate at the file level. More advanced schemes operate at the block/cluster level.
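Here is a rough Python sketch of the difference. It assumes fixed 4 KiB blocks and SHA-256 fingerprints; both are assumptions, and many products use variable-size, content-defined chunks instead. The example uses two identical files plus a copy with only its last block changed. File-level dedupe can only share the two identical files, while block-level dedupe also shares the nine unchanged blocks of the edited copy.

```python
from __future__ import annotations

import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size; real systems vary


def file_level_stored_bytes(files: dict[str, bytes]) -> int:
    """Whole-file dedupe: only byte-for-byte identical files are stored once."""
    unique = {hashlib.sha256(data).digest(): len(data) for data in files.values()}
    return sum(unique.values())


def block_level_stored_bytes(files: dict[str, bytes], block_size: int = BLOCK_SIZE) -> int:
    """Fixed-size block dedupe: identical blocks are shared, even across different files."""
    unique = {}
    for data in files.values():
        for off in range(0, len(data), block_size):
            chunk = data[off:off + block_size]
            unique[hashlib.sha256(chunk).digest()] = len(chunk)
    return sum(unique.values())


blocks = [bytes([i]) * BLOCK_SIZE for i in range(10)]
original = b"".join(blocks)
edited = original[:-BLOCK_SIZE] + b"\xff" * BLOCK_SIZE  # only the last block differs
files = {"a.img": original, "b.img": original, "c.img": edited}  # hypothetical names
print(file_level_stored_bytes(files), block_level_stored_bytes(files))
# prints: 81920 45056
```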
Wouldn't a compressed filesystem already do this? You don't just get compression from nowhere: it comes from eliminating duplicate blocks and empty space.
No, because compression is limited to a single dataset. Deduplication can act across multiple datasets (assuming they're all on the same underlying block device).
Consider an example with four identical 10MB files in four different locations on a drive, each of which can be compressed by 50%.
"Logical" space used is 40MB.
Using compression, they will fit into 20MB.
Using dedupe, they will fit into somewhere between 5MB and 10MB.
Using dedupe and compression, they will fit into ~5MB (probably a bit less).
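The arithmetic can be reproduced at a smaller scale with a quick Python sketch. It rests on these assumptions:
- KiB stand in for MB.
- Dedupe uses 1 KiB blocks.
- zlib is the compressor.
- The synthetic files are built so that every block is half random bytes and half zeros, which means each compresses by roughly 50%.

No block repeats within a file, so dedupe alone lands at the top of the 5 to 10 range; files with internal repetition would land lower. In this sketch the combined figure comes out slightly above 5 KiB, because compressing each small block separately adds some per-block overhead.

```python
import hashlib
import random
import zlib

BLOCK = 1024   # assumed dedupe block size
N_BLOCKS = 10  # 10 x 1 KiB per file stands in for "10MB"


def make_file(seed: int = 0) -> bytes:
    """Each block is half random bytes, half zeros, so it compresses by roughly 50%."""
    rng = random.Random(seed)
    return b"".join(rng.randbytes(BLOCK // 2) + bytes(BLOCK // 2) for _ in range(N_BLOCKS))


def unique_blocks(files):
    """Return each distinct block once, identified by its SHA-256 fingerprint."""
    seen = {}
    for data in files:
        for off in range(0, len(data), BLOCK):
            chunk = data[off:off + BLOCK]
            seen.setdefault(hashlib.sha256(chunk).digest(), chunk)
    return list(seen.values())


files = [make_file()] * 4  # four identical copies in "different locations"

logical = sum(len(f) for f in files)
compressed_only = sum(len(zlib.compress(f)) for f in files)
dedupe_only = sum(len(b) for b in unique_blocks(files))
dedupe_and_compressed = sum(len(zlib.compress(b)) for b in unique_blocks(files))

for label, size in [("logical", logical), ("compression", compressed_only),
                    ("dedupe", dedupe_only), ("dedupe + compression", dedupe_and_compressed)]:
    print(f"{label:>22}: {size / 1024:5.1f} KiB")
```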
It doesn't really negate the need for good housekeeping routines, nor good programming. Do you really want 100 copies of record X, or would one suffice?
Far better to let the computer do the heavy lifting, than trying to impose partial order on an inherently chaotic situation.
Not to mention that the three textbook scenarios where dedupe really shines are backups, email and virtual machines, none of which can really be helped by "better housekeeping".
Filesystems should be doing this.
No, block devices should be doing this. Then you get the benefits regardless of which filesystem you want to layer on top.
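Here is a minimal Python sketch of dedupe done at the block layer. It assumes SHA-256 fingerprints that are trusted without a byte-for-byte comparison. Real implementations either rely on a strong hash like this or verify on a match, as ZFS does with its dedup=verify setting. The layer only ever sees logical block addresses and data, so it neither knows nor cares which filesystem sits on top. Reference counts let it free a physical block once nothing points at it. The class is hypothetical.

```python
import hashlib


class DedupBlockDevice:
    """Hypothetical dedupe layer below the filesystem, keyed by logical block address."""

    def __init__(self, block_size: int = 4096) -> None:
        self.block_size = block_size
        self.lba_map = {}   # logical block address -> fingerprint
        self.store = {}     # fingerprint -> data (one physical copy)
        self.refcount = {}  # fingerprint -> number of LBAs pointing at it

    def write(self, lba: int, data: bytes) -> None:
        if len(data) != self.block_size:
            raise ValueError("writes must be exactly one block")
        fp = hashlib.sha256(data).digest()
        old = self.lba_map.get(lba)
        if old == fp:
            return  # rewriting identical content changes nothing
        if old is not None:
            self._release(old)
        if fp not in self.store:
            self.store[fp] = data  # first copy of this content: allocate it
        self.refcount[fp] = self.refcount.get(fp, 0) + 1
        self.lba_map[lba] = fp

    def read(self, lba: int) -> bytes:
        fp = self.lba_map.get(lba)
        return self.store[fp] if fp is not None else bytes(self.block_size)  # unwritten reads as zeros

    def _release(self, fp: bytes) -> None:
        self.refcount[fp] -= 1
        if self.refcount[fp] == 0:
            del self.refcount[fp]
            del self.store[fp]  # last reference gone: free the physical block


dev = DedupBlockDevice(block_size=8)
dev.write(0, b"AAAAAAAA")
dev.write(1, b"AAAAAAAA")  # same contents: no new physical block
dev.write(2, b"BBBBBBBB")
print(len(dev.lba_map), len(dev.store))  # prints: 3 2
```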
Sweet, thanks for the pointer. I was also concerned about the death of OpenSolaris, but it sounds like Nexenta may be just what I want.
Nexenta is built on OpenSolaris and is therefore also dead, though it may take longer for the thrashing to stop.
Haven't you heard of Sun Unified Storage yet? Controller redundancy is available out of the box on certain NASes; otherwise, you can add it.
I'm well aware of Sun's offering, as I nearly bought one. They're very nice machines. However, they have about as much relevance to this discussion as an EMC Celerra, as they're a solution playing in the same ballpark as NetApp, with a price tag to match.
They are most certainly _not_ a low-end OpenSolaris-based appliance, which was the comparison being made.
All you need is an external JBOD array your NAS appliance attaches to using a pair of SAS HBAs, with dual-ported SAS drives. Controller redundancy is a function of the attachment bits, not your management software.
This will not help you when your NAS head (i.e., the controller) fails, or when you need to patch it without downtime.
Further, proper controller redundancy *is* in part a function of the management software. Keeping configurations synchronised between the controllers is just one example.
And this way your head node also doesn't have to be a single point of failure (cluster with two nodes).
Which OpenSolaris-based appliances support this in a way comparable to something like a NetApp FAS or EMC Celerra? Seamless and near-instant failover, both planned and unplanned? Patching and OS updates without downtime?
Many of them have these connectivity options, or they can be easily added. If you haven't seen them available, then you probably aren't looking very hard.
Which OpenSolaris-based appliances can be used as an FCP target? How many of them can saturate a 10Gb Ethernet link?
Hint: "certification" is largely a farce. I can't count the number of times I've seen products that were certified to work with each other melt down, and just because something is 'officially' supported doesn't mean support personnel won't tell you otherwise, and refuse to help, when you finally need the support.
Sorry, I've spent enough 8-hour+ stretches on P1 calls with numerous vendors (including multi-vendor co-ordination) to know this is horseshit. If you have an appropriate support contract, and a supported configuration, you're going to get helped.
Now, if you have some documented cases of a vendor refusing to help even though you had a supported configuration, with ticket numbers and the vendor name, that would be _very_ interesting.
I've also been in the other scenario, where an unsupported configuration has gone awry. In one particular example I can think of, the company lost enough money during the outage, just in revenue (not counting the salaries of all the people working around the problem and trying to fix it, and certainly not counting the lost future business due to customers leaving), to have nearly paid for the properly supported configuration twice over. It was quite an eye-opening experience to see how "saving" $150k or so up-front ended up costing (well) over $300k in the bigger picture.
Still, I must applaud you for not suggesting a Backblaze pod as an enterprise-class storage system like so many others on /. do.
A full size touch keyboard might be different, but that doesn't mean that your statement about the iPhone is right.
The keyboard in the iPad is essentially full size (for the alphanumeric keys), and while it's vastly more tolerable to type on than the smaller ones in touchscreen phones, it's certainly not suitable for any sort of large-scale data entry. Slashdot posts and the like are about the extent of it, and even short stuff like that tends to be error-riddled due to the inexact nature of touchscreen keyboards.
On the other hand, if your application is going to be used by, say, 500 people in a local council, it's going to be about £30k worth of named users.
I think you dropped a zero; it should be £300k.
Also, don't forget the annual support, which is going to run somewhere in the ballpark of £70k/yr.
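Here is a quick sanity check of those figures in Python. The per-user prices and the support ratio are derived only from the numbers quoted above. The multi-year total adds an assumption for illustration that does not come from the quote: support is paid every year, starting in year one.

```python
USERS = 500
QUOTED_TOTAL = 30_000      # £, the original estimate
CORRECTED_TOTAL = 300_000  # £, with the missing zero restored
ANNUAL_SUPPORT = 70_000    # £ per year, the ballpark above


def total_cost(years: int) -> int:
    """Licences up front plus support for each year (assumes support starts in year one)."""
    return CORRECTED_TOTAL + ANNUAL_SUPPORT * years


print(QUOTED_TOTAL / USERS, CORRECTED_TOTAL / USERS)  # £ per named user: 60.0 600.0
print(f"support is {ANNUAL_SUPPORT / CORRECTED_TOTAL:.1%} of the licence cost per year")  # 23.3%
print(total_cost(5))  # 650000
```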
Yeah, I know.
Basically, I'm struggling to see what your whole argument/analogy is supposed to be. NetApp make mid-range to high-end storage systems. I'm not aware of any OpenSolaris-based appliances that would even make it into the low end of the enterprise space (except maybe for dev/test purposes). In particular, they tend to lack things like controller redundancy, FC connectivity, 10G Ethernet, vendor support and certification with third-party products (e.g., VMware).
Or, in other words, it's difficult to see anyone seriously shopping for a solution where both a NetApp Filer and an OpenSolaris-based appliance are options. Which brings me back to my original implication: why would NetApp bother going after any of these OpenSolaris-based appliance vendors, when they're playing in very different markets?
Microsoft got where it is by committing a whole host of illegal, anti-competitive acts [...]
No, it didn't, pretty much by definition, since they couldn't have been committing "illegal, anti-competitive acts" until they were actually "there".
[...] can be legitimately said to have retarded progress over the past thirty-odd years.
How?
They're Former Lord Bill's wet dream come true.
Commodity computing that anyone can afford?
It's hard to see where there would be any crossover between customers for mid-range and high-range Internet browsers like Microsoft's, and tied-to-a-dead-platform, low-end Web browsers, like anything based on Netscape/Mozilla.
On what basis do you judge IE to be "high range" and Mozilla to be "tied to a dead platform"?
ZFS has been around long enough now that the flaws are known. It's not the be-all and end-all of file systems and storage. And some of the flaws are pretty nasty (you can't shrink a zpool, and they've been "working" on a fix for that since 2007).
That's not really a "significant flaw" to anyone outside of the DIY space. Shrinking arrays by removing devices is extremely uncommon to the point of nonexistence in the enterprise world (in fact, I'm not sure that anyone supports it - my IBM DS4800, EMC CX3 and NetApp systems don't).
And the flexibility of layers where RAID is separate from LVM which is separate from the file system offers other advantages.
For example?
Or NetApp realizes it has smaller fish to fry... like the companies building NAS hardware based on ZFS. There's no reason to go after Oracle itself, in a fight you may not win, to protect sales of your aging NAS platform, when you can go after companies using ZFS/Btrfs instead: much smaller targets with much less legal muscle to defend themselves.
It's hard to see where there would be any crossover between customers for mid-range and high-end storage systems like NetApp's, and tied-to-a-dead-platform, low-end storage appliances like anything based on OpenSolaris.
Do you know how Linux and Unix systems work? One of the reasons viruses don't infect them the way they infect Windows is that doing any real damage requires privilege escalation.
Please define "real damage". Because I am well aware of "how Linux and Unix systems work" and I can't think of many things malware might want to do that it requires elevated privileges for.
And if memory serves me correctly, a user sometimes doesn't need to actually download or run something in Windows to execute malicious code. Remember the Windows shortcut exploit that was released less than a month ago? In that case, it didn't matter if the user was clueless or not.
These are called exploits. They happen on all platforms.
Windows 95 was supported for less than 3 years.
False. Windows 95 support ended December 31, 2001.
Windows NT was only supported for 4 years.
False. Windows NT (4.0, I assume you mean) ended June 30, 2004.
Windows 2000 was only supported for 5 years.
False. Windows 2000 support ended July 13, 2010.
Windows XP has only been supported for this long because Microsoft screwed the pooch.
False. Windows XP's support lifecycle was only marginally lengthened from the one it had on the day it was released.
And, knowing this, the solution is simple: Create a separate, non-privileged account for daily use. When UAC prompts for rights escalation, the user is then forced to enter the username and password of a privileged account.
It makes no difference. People will happily type in the password (and user if required). The only difference is that one scenario takes half a second and is marginally annoying, and the other takes 2 seconds and is marginally more annoying.
I know this has been said before, but if your operating system is asking for an admin password often enough that replacing it with a mouseclick significantly improves the user experience, you're solving the wrong problem.
From the common-case scenario of a single-user desktop, the difference in terms of security between clicking a button and typing in a password is so close to zero it's irrelevant.
That would only work if you were logged in as the admin account.
I.e., the default, which the vast, vast majority of people will be using.
Or do you do everything as root?
An "admin" in OSX is not root.
That's why I'd like to see Microsoft forced to assume product liability so long as they market their software to the general public on the basis of "ease of use". Either market it to "technically knowledgeable users only" or pay monetary damages to anyone and everyone who suffers in any way due to security issues.
When you can define "ease of use", "technically knowledgeable" and "security issues" objectively, let us know.