SCSI vs. SATA In a File Server?
turboflux asks: "I'm currently in the process of replacing an aging file server with something more robust. Company-wide, there will be about 100 people who could be using this server, but I don't imagine there being more than 50 concurrent users. Right now, I'm torn between spending alot on SCSI hardware, much like our other servers, or spending less, but getting more space, with SATA II drives. Whatever I decide, the server will be setup with a RAID 1+0 array for the numerous benefits it offers. Does Slashdot have opinions or suggestions on performance, reliability, and stability?"
SATA is fast and cheap; just make sure you spend a little bit more to get the "nearline" storage drives and not just desktop drives. Put them behind a 3ware 9550 and you'll fly.
Fellowship 9/11
Dude, that thinking is the difference between an SA and an Engineer. Thinking like that would have us all running MFM drives, and these newfangled "SCSI" disks would be too risky random equipment to test out on a server.
-- "In order to have power, I must be taken seriously." -Mojo Jojo
The very definition of RAID is "Redundant Array of INEXPENSIVE Disks". Emphasis mine.
I've already read a bunch of posts about how SCSI is more reliable than SATA. Well, they actually mean SCSI drives are generally more reliable than SATA drivers (and some actually say so). They're quite correct for the most part.
Here's what storage vendors don't want you to know: It doesn't matter.
Use RAID. With SCSI or FC disks, you'll have to use RAID5. At that point, two disk failures in a given array and you're screwed. You REALLY care that two disks don't fail at the same time. And when you're using low-end or even mid-range drives, it happens.
Why do you have to use RAID5? Because with SCSI or FC disks, RAID5 is the only economical option. With a 300GB SCSI drive going for at least $1200USD, and FC drives of that size going for $2500USD, even the biggest corporations end up using RAID5.
Of course, RAID5 isn't the only level of RAID. It's the least redundant of any level of RAID, as a matter of fact.
Go SATA with RAID10, at least 4 drives, ideally six or more. With six drives, the likelyhood of having two drives fail before you can replace the first one is somewhat higher than if you're using SCSI, but the likelyhood of that second drive causing you data loss due to a failed array is infinitesimally smaller. It's guaranteed with RAID5, and the chance for RAID10 is inversely proportional to the number of disks in the array. So first the first drive has to fail, then the second drive which fails has to be of the same RAID1 set. Add onto that that drives do indeed "go old", and the heavier you work them, the faster they get old. With RAID5, disks tend to get worked a lot harder (without any cache, or if the cache misses, each write requires n-2 reads, and 2 writes).
Of course, you've pretty much decided that RAID10 is the way to go. At that point it's cost. If you're looking for 50GB of fast redundant storage, SCSI is going to be slightly cheaper. If you need any amount of storage though, SATA is going to be a whole lot cheaper for the same level of reliability (which requires more spindles), and typically better speed (more spindles means more seeks per second and more megs per second, though one needs to be mindful that big SATA disks are only 7200RPM, while the slowest SCSI disks you're going to get are 10kRPM).
Summary? I'm value-concious. I'd go the SATA route. RAID10, four disks minimum to start, a pair of 4-port 3ware SATA cards with 128MB+ of battery-backed cache. I'd do the RAID entirely with software (Linux MD), with each RAID1 set split across two controllers. We get cheap disk redundancy, cheap disk speed, cheap I/Os, and cheap controller redundancy. I'd consider using less fancy controllers, the 3ware jobbies tend to be expensive, but when you're doing big writes the cache makes a massive difference (75MB/s across four disks of RAID10 versus 20MB/s). I've considered putting together a dedicated storage appliance, exporting via SMB/NFS/NBD/GFS/what-have-you, without the battery-backed cache, but with a pair of 1U UPS units (one for each power supply). Then I'd go around turning off all the application-level fsync()ing, and see what happens with 4GB of disk cache. Bet it'd be fast. And with shutdown initiated via UPS trigger, almost as safe as a battery-backed cache. Remember: "Redundant Array of INEXPENSIVE Disks."
God I ramble.
Barclay family motto:
Aut agere aut mori.
(Either action or death.)
But if you may need the data 5 years or more from now, tape is clearly far superior.
You have much luck getting data back from a tape five years later?
First you have to find the tape. You can't have misplaced it and you can't have reused it due to the damn high cost of magnetic tape.
Then you have to find a drive that can read the tape. The one you wrote it with died two years ago, its no longer manufactured and oh darn none of the three you picked up off ebay use the same compression format.
Next you need the old backup software. You've been using Acme Archiver for the past three years; It doesn't understand the old SuperBackup format and unfortunately SuperBackup only ran in DOS with an 8-bit ISA SCSI card.
Finally you have to pray that the tape is still good. They're like floppy disks; they go bad just sitting on the shelf.
Buddy, I've been there. It ain't pretty. So for the last 7 years I've stored my backups on hard disks. No pain! No pain!
Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.
50 concurrent users is a LOT. You may not really mean concurrent, as in "50 people actually reading from or writing to this drive at the same time". If you DO mean that, you desperately need SCSI, the fastest you can find. You'll need seek time more than anything else; the drives need to respond as fast as possible to multiplexed requests for data. Rotation speed, which improves seek time and transfer rate, is good too, but it's seek time that's most crucial in heavy multitasking environments. If by 'concurrent' you mean '50 people occasionally hitting the disk', then yeah, you could probably do SATA.
However, you already have SCSI. Management is used to paying for SCSI machines. If you have 50-100 people depending on something, and it's slow, that's a productivity drag. If you assume that all those people cost $100k/year each (not at all unreasonable with benefits), 50 people are getting paid about 2,500 bucks an hour, or about 20,000 dollars a day. In other words, if you speed them up by just 5% with better hardware, you're saving the company a thousand dollars a day. Even if it's a tiny 1% speed gain, that's still 200 bucks a day. Saving six grand a month for an upfront investment of ten grand is a total no brainer.
Buy SCSI.
No, choose RAID 5 instead of RAID 1+0. Here is why:
To give you a datapoint, I have set up multiple Linux software RAID 5 arrays on various servers with 10+ SATA disks, and the I/O throughput is over 500+ MB/s (enough to saturate 2 full-duplex GigE links !). At my previous work we had about 200 servers, all using Linux software RAID 5. And we have been MUCH MORE HAPPY than the previous setup where all of them were using hardware RAID 5. Moreover, Linux's software RAID 5 is more flexible (create arrays on ANY disk on ANY SCSI/SATA card in the system), more consistant (one and only one control software to learn: mdadm(8), no need to use crappy vendor tools or reboot into vendor BIOSes), cheaper (no hardware to buy), more reliable (no hardware card = 1 less hw component that can fail), easier to troubleshoot (plug the disks on ANY linux server and it works, no reliance on any particular hw card) and more scalable (spread the load across multiple disk controllers, multiple PCI-X/PCIe busses, or even multiple SAN devices).
It's amazing the amount of misinformation and misconceptions about RAID that is spread around the world. I hate to say it but 95% of IT engineers don't make good choices regarding RAID servers because of all those misconceptions.