SAN, NAS, Cost and Benefits?
luetin asks: "Our company is at the point where our storage and backup infrastructure is ok, but not for much longer.
We are looking into SAN, NAS, and variations thereof. We are a small IT department, with two sysadmins and two programmers. Right now we have stored/circulating about 2TB of data, and that's going to increase steadily in coming years.
Does Slashdot have experience setting up SANs? Tales of costs and benefits of SANs versus a gaggle of NAS? Can SAN be implemented by reasonably seasoned IT people, or is it too dark an art?"
I'd have to go against the well-funded flow here.
Right now you can get 3TB+ of storage in a single SATA RAID5 unit from www.acnc.com, for about $11,000.
You can get it with a SCSI or FC external interface. Use two of them hooked to two computers in two locations (preferably 300+ miles apart) with rdiff-backup if you want extra redundancy. We use local and remote mirrors for maximum protection. The space is so cheap, it's easy to keep extra mirrors.
We've finally eliminated our last major SCSI and FC arrays, and I couldn't be happier. We're up to about 6 TB total ATA and SATA storage now. Get cheap storage if at all possible, because it will be obsolete before long; pay more only if you need something that a cheaper system can't offer. That isn't much these days, now that 10K rpm SATA drives are out.
As far as single drive reliability goes, the first ATA unit we installed has been in service 2 years this month. We've only replaced two drives out of 48, and even then, the drives passed the manufacturer's factory recertification tests when we ran them. And even if you think that's a higher failure rate than your experience with SCSI/FC, keep in mind that the cost is so much lower that it lets you have more mirroring redundancy, so individual drive failures are much less of an incident.
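For anyone who wants to compare that against their own SCSI/FC numbers, here is a rough sketch of the arithmetic. It assumes all 48 drives were in service for the full 2 years, which the post does not actually state, so treat the result as a ballpark only. The function names are made up for illustration.

```python
HOURS_PER_YEAR = 8766  # average year length in hours, leap years included


def annualized_failure_rate(failures: int, drives: int, years: float) -> float:
    """Failures per drive-year, assuming every drive ran for the whole period."""
    return failures / (drives * years)


def implied_mtbf_hours(failures: int, drives: int, years: float) -> float:
    """Total observed drive-hours divided by the number of observed failures."""
    return drives * years * HOURS_PER_YEAR / failures


# Figures from the post: 2 replacements out of 48 drives over roughly 2 years.
afr = annualized_failure_rate(failures=2, drives=48, years=2)
mtbf = implied_mtbf_hours(failures=2, drives=48, years=2)

print(f"Annualized failure rate: {afr:.1%} per drive-year")
print(f"Implied per-drive MTBF:  {mtbf:,.0f} hours")
```

Under that assumption the rate comes out at around 2% per drive-year. Two failures is a tiny sample, so the real figure could easily be a good deal higher or lower.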
I've had enough abrasive sigs. Kittens are cute and fuzzy.
1. What is the use of this data? Who accesses it? How many concurrent users? What type of transfer rate? Will you need funky network cards (for example 10Gb NICs, which may not work in some solutions)?
2. Can you accept downtime? If not, how much redundancy do you need? How fast can you get replacement parts?
3. Do you need specialized apps running on the machine (such as virus checkers, management tools, etc)?
For a professional installation, I would say you would at least want to ensure some redundancy. For example, an hp Proliant DL-360 G2 or G3 with redundant power supplies, redundant fans and a drive array or two. The server itself is fairly cheap; what will cost you money is all those drives you will need to buy. However, it's a sturdy box.
I don't mean to single out hp, you can look for other alternatives as well. We do run an hp/compaq shop, and I am familiar with them.
All this redundancy helps in ways you don't expect. For example, tonight I was able to move the server from one rack to another without losing service. I disconnected one power supply and connected it to the new rack, then disconnected one network cable (the 2 onboard NICs were teamed) and rerouted it. Then I dropped the other NIC and cable, mounted the server in the rack and connected the remaining cables. The users at the other end had no idea anything happened.
This may not sound like a big deal to many, but for us to schedule a 30-minute shutdown of a critical server requires up to a month's advance notice.
You could of course accomplish the same thing using a cluster setup, but not without some major headaches. Clusters are cool on paper but for most users the bang-to-headache ratio is too low to justify it.
You're on the extremely low end of where a SAN becomes practical...
I can second that. In our company we have 2 EMCs (50km apart, 1 was free, leftover from somewhere), but the switches and FC cards took a surprisingly large amount of money. And of course the disks are expensive too. No way to put in usual SCSI disks. It all still makes sense in our case, because we gain flexibility: adding some GB to that server, taking some away here, changing RAID levels (not a 1-step process though), copying the data to the remote data center automatically. Everything is always mirrored, and we can make snapshots of servers when we do larger upgrades so we can back out quickly. And all done from your desk (in our case: from someone's desk in another city, who manages all those SAN systems centrally).
The price we pay is money and a high level of training and practice. Don't expect to just configure an EMC or an FC fabric without good training and/or lots of practice. You can screw up 100 servers in one go by misconfiguring the FC switches. All at once.
Compare that with a 2TB box acting as an NFS/CIFS server: as a Unix admin it's easy to administrate, and while it does not scale to serve 100 DB servers, you can always get a second box for the price of 2 EMC disks (they are that expensive). Get a second (or third/fourth) one some km away, replicate them either with a RAID-1 in software if performance allows this, or with rsync, and voila, you have a nice, easy to administrate file server. My advice for this type of storage: build that yourself. Don't buy cheap junk though, as lots of people depend on it running 24h a day. Make it redundant in some way.
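The rsync replication mentioned above could be scripted along these lines. This is only a sketch: the host name `mirror.example.com` and the `/srv/share` paths are placeholders, the function name is made up, and it assumes rsync over SSH with key-based login already set up. It defaults to a dry run because `--delete` removes files on the mirror that no longer exist on the source.

```python
import shlex
import subprocess


def rsync_mirror_command(source_dir, remote_host, remote_dir, dry_run=True):
    """Build an rsync command that mirrors source_dir to remote_host:remote_dir."""
    cmd = ["rsync", "-a", "--delete"]  # -a: archive mode, --delete: prune removed files
    if dry_run:
        cmd.append("--dry-run")  # only report what would change
    # A trailing slash makes rsync copy the directory's contents, not the directory itself.
    cmd.append(source_dir.rstrip("/") + "/")
    cmd.append(f"{remote_host}:{remote_dir}")
    return cmd


def run_mirror(source_dir, remote_host, remote_dir, dry_run=True):
    """Run the mirror and raise CalledProcessError if rsync exits non-zero."""
    subprocess.run(
        rsync_mirror_command(source_dir, remote_host, remote_dir, dry_run),
        check=True,
    )


# Placeholder host and paths; print the command rather than running it.
example = rsync_mirror_command("/srv/share", "mirror.example.com", "/srv/share")
print(shlex.join(example))
```

Run from cron, this gives a periodic copy rather than instant replication, so anything written since the last run is lost if the primary box dies.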
If however you need instant replication (like EMC offers with guaranteed max. 1 outstanding I/O operation to a second EMC box), high-throughput (32GB cache to buffer nearly all RAID-5 penalties), databases and shared medium (read: FC, clusters), then companies like EMC & Co will sell you small cut-down stuff too if you cannot afford their large storage systems.
If you don't need the type of flexibility a SAN offers, don't use it. It's overkill, expensive and complicated. But it's interesting technology.
Every machine directly connected to the SAN will need a fibre channel HBA. If you have more than a couple machines, you may need to get a fibre channel switch. The FC gear is pretty expensive.
Also consider diagnostics - you're no longer monitoring an Ethernet network - how are you going to track down problems in the storage network?
Counterexample:
A coworker built a Linux machine, with a simple RAID setup using an "IDE splitter" card to mirror the two disks. It ran Samba, and was used as the CAD/CAE archive for the Electronic Design Automation department.
Two years after he left that company, I asked a friend in the IT department how well that server was working. "Oh, it's great, we just reboot it once every few months". Unlike the proprietary massive RAID box (>8U rack space) from a (fairly) well-known company, which had 27 or so disks, requiring the rebuilding or replacement of one (expensive, provided "free" with the maintenance fee) disk every few months.
10000 hours MTBF, with 27 disks, works out to one failure every few months on average. I wish I could find that website that explained the math to determine the MTBF of multiple critical items, which is not equal to the MTBF of one of them. Once I found it, I knew why that server was never going to seem as reliable as the two-disk Linux Samba machine.
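I can't point to that website either, but the usual textbook version of the math is short. Assuming the disks fail independently at a constant rate, and counting any single disk failure as an incident, the failure rates add up, so the MTBF of the whole set is the per-disk MTBF divided by the number of disks. A sketch (the function name is made up, and the 100,000-hour figure is my own assumption for comparison, not something from the post):

```python
HOURS_PER_DAY = 24
HOURS_PER_MONTH = 730  # roughly 8760 hours per year / 12


def system_mtbf_hours(component_mtbf_hours: float, count: int) -> float:
    """Mean time to the first failure among `count` identical, independent
    components with constant failure rates: the rates add, so MTBF divides."""
    return component_mtbf_hours / count


# 10,000 h is the figure quoted above; 100,000 h is an assumed comparison value.
for per_disk in (10_000, 100_000):
    hours = system_mtbf_hours(per_disk, 27)
    print(
        f"{per_disk:>7,} h per disk x 27 disks -> one failure every "
        f"{hours:,.0f} h ({hours / HOURS_PER_DAY:.0f} days, "
        f"{hours / HOURS_PER_MONTH:.1f} months)"
    )
```

Taken literally, 10,000 hours per disk works out to a failure roughly every two weeks across 27 disks. A per-disk figure nearer 100,000 hours gives about five months, which is closer to the "every few months" described. Either way, the two-disk box wins on failure count simply by having fewer disks.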
The problem with the Samba machine? Low bandwidth compared to the 4 network connections on the massive RAID box. But that wasn't really a problem in this application, since the archives weren't read that often.
The key is to match the solution with the problem.