SCSI vs. SATA In a File Server?
turboflux asks: "I'm currently in the process of replacing an aging file server with something more robust. Company-wide, there will be about 100 people who could be using this server, but I don't imagine there being more than 50 concurrent users. Right now, I'm torn between spending a lot on SCSI hardware, much like our other servers, or spending less but getting more space with SATA II drives. Whatever I decide, the server will be set up with a RAID 1+0 array for the numerous benefits it offers. Does Slashdot have opinions or suggestions on performance, reliability, and stability?"
We have both SCSI RAID (two 1TB arrays with 10K RPM SCSI drives on a Dell PowerVault) and several arrays with 3ware cards (an 8-way and a 12-way, both with 200 or 250GB drives). We run Red Hat WS. We find that the 3ware cards are excellent for large data storage but have latency issues compared to the SCSI RAID array. We are happy with both systems, but the price break on the 3ware shows, and I wouldn't recommend them for really heavy use.
Laboratree - Scientific collaboration based on OpenSocial.
I use SATA on our smaller, non-mission-critical servers. For our data backend, I wouldn't touch it with a 10-foot pole.
Here are some scenarios where I wouldn't hesitate to use SATA:
- You have redundant servers. Using LVS and/or Heartbeat and your favorite tools, you can get full server redundancy using less expensive hardware. The overall solution can be quite elegant, with hot failover. Why just cover the drives?
- Front-end cluster nodes. You have a powerful, expensive backend server (with a cheaper failover) and you use inexpensive front-end servers for serving client requests. Sounds like overkill for what you want, but with the right server load balancing technology, it can give you a scalable, fault-tolerant and damn fast solution.
- You can live with downtime. Install a server with a couple of SATA disks in a RAID configuration and hope for the best.
There may be some benefit to sticking with SCSI over SATA; it really depends on your preference. Both SCSI and SATA relieve the main processor of the work associated with reads and writes. SATA now also has command queueing to optimize the order of reads, just like SCSI. The only real advantage SCSI has right now is speed (SATA at 150 MB/s, with a newer, faster version coming, vs. SCSI at 320 MB/s). Also, most SCSI drives are designed for 24/7 use, whereas most SATA drives are designed for desktop use, so make sure the SATA drives you buy are made for enterprise-level operation. So it really comes down to compatibility/speed vs. cheap/larger. Hope this helps!
Sabre
The SATA II spec is quite a bit different from the original SATA. SATA II adds port multiplication, hot plugging, native command queuing, external enclosures, and port selection. Also, with a theoretical peak of 3Gbps, it's twice as fast as the old SATA. Here is a decent article with more explanation.
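The "3Gbps" figure is a raw line rate, while the transfer rates elsewhere in this discussion are quoted in MB/sec. SATA uses 8b/10b encoding, which spends 10 bits on the wire for every 8-bit byte of data, so dividing the line rate by 10 gives the payload bytes per second. A minimal sketch of that conversion (the function name is just for illustration; it ignores protocol overhead beyond the encoding itself):

```python
# Convert a SATA line rate to peak payload bandwidth, assuming 8b/10b encoding:
# every 8-bit data byte is sent as 10 bits on the wire.

def sata_payload_mb_per_s(line_rate_gbps: float) -> float:
    """Peak payload rate in MB/s (10^6 bytes/s) for a line rate in Gbit/s."""
    bits_per_second = line_rate_gbps * 1e9
    bytes_per_second = bits_per_second / 10  # 10 line bits per data byte
    return bytes_per_second / 1e6


for name, rate in (("SATA 1.5Gbps", 1.5), ("SATA II 3Gbps", 3.0)):
    print(f"{name}: {sata_payload_mb_per_s(rate):.0f} MB/s")
```

This is where the 150 MB/sec and 300 MB/sec figures quoted in the thread come from, and why 3Gbps works out to exactly twice the original SATA.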
As a matter of fact, you may not be flying at all. It all depends on what you're using it for. The problem with SATA is latency, and there's not much the controller is going to do about it. If you've got a server performing latency-sensitive tasks, SATA can cause performance problems.
In my experience, if you've got a lot of random I/O, SATA is not a viable solution. That said, even if your I/O is mostly random, if there's not a heavy load on the disk, you're probably OK. If you've got 200 people hitting a database or email server, you're probably going to have some performance problems. Swap in SCSI drives, or a quality disk array, and you'll be doing much better. If you've got a web server, or a database server that is exclusively reading, you can probably get away with SATA. Again, it all depends on how much disk I/O your application does and how random it is.
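To put rough numbers on why random I/O hurts 7200 RPM drives: each random request costs about one average seek plus half a revolution of rotational latency, so random IOPS is roughly 1000 / (seek_ms + rotational_ms). The sketch below uses hypothetical seek times (8.5 ms for a desktop-class 7200 RPM SATA drive, 3.5 ms for a 15K RPM SCSI drive) as stand-ins, not figures from this thread or any spec sheet, and ignores transfer time, caching, and command queueing:

```python
# Rough random-I/O model for a single disk: each random request costs about one
# average seek plus half a revolution of rotational latency. Transfer time,
# caching and command queueing are ignored. Seek times below are hypothetical.

def avg_rotational_latency_ms(rpm: int) -> float:
    """Average rotational latency: half of one revolution, in milliseconds."""
    return 0.5 * 60_000 / rpm


def random_iops(rpm: int, avg_seek_ms: float) -> float:
    """Approximate random requests per second for one drive."""
    return 1000 / (avg_seek_ms + avg_rotational_latency_ms(rpm))


# Hypothetical average seek times: desktop-class 7200 RPM SATA vs. 15K RPM SCSI.
sata = random_iops(7200, avg_seek_ms=8.5)
scsi = random_iops(15_000, avg_seek_ms=3.5)
print(f"7200 RPM SATA: ~{sata:.0f} random IOPS")
print(f"15K RPM SCSI:  ~{scsi:.0f} random IOPS")
```

With these assumed numbers the 15K drive handles a bit more than twice as many random requests per second, which is consistent with the experience described above; mostly sequential workloads barely see that gap.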
SATA's peak raw transfer rate (150 MB/sec) is roughly half the peak raw transfer rate of SCSI (320 MB/sec), but you're going to be limited by the individual hard drive's transfer rate anyway. Keep in mind that a proper SATA implementation gives you 150 MB/sec PER DRIVE, since each drive is on its own channel. SCSI is 320 MB/sec per channel, but you're in for a cabling nightmare if you want only one drive per channel. Note that there is a 300 MB/sec SATA standard, although few drives and controllers seem to support it.
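The per-drive vs. per-channel point can be sketched as a small model: each link (a SATA port or a SCSI channel) caps the combined media rate of the drives attached to it. The drive media rates (70 MB/sec for a SATA drive, 90 MB/sec for a SCSI drive) are hypothetical, and the model ignores controller and PCI bus limits, which can also become the bottleneck:

```python
# Aggregate sequential bandwidth when each link (SATA port or SCSI channel)
# caps the drives attached to it. Drive media rates are hypothetical.

def aggregate_mb_per_s(links: list[tuple[float, list[float]]]) -> float:
    """links: (link capacity in MB/s, [sustained media rate of each drive on it])."""
    return sum(min(cap, sum(drives)) for cap, drives in links)


# 8 SATA drives, one per 150 MB/s port, each assumed to sustain 70 MB/s.
sata = [(150.0, [70.0]) for _ in range(8)]
# 8 SCSI drives sharing one 320 MB/s channel, each assumed to sustain 90 MB/s.
scsi_one_channel = [(320.0, [90.0] * 8)]
# The same 8 SCSI drives split across two channels.
scsi_two_channels = [(320.0, [90.0] * 4), (320.0, [90.0] * 4)]

for name, cfg in (("SATA, 1 drive per port", sata),
                  ("SCSI, 1 channel", scsi_one_channel),
                  ("SCSI, 2 channels", scsi_two_channels)):
    print(f"{name}: {aggregate_mb_per_s(cfg):.0f} MB/s")
```

Under these assumptions the shared SCSI channel, not the drives, becomes the limit, and splitting the drives across two channels removes it; the SATA ports are never the bottleneck because one drive cannot saturate its own link.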
If you buy the right model, you can get SATA drives that have gone through the rigorous quality control testing that has historically been reserved for SCSI drives. Many of the higher end server-grade SATA models are warrantied for 24/7 operation. SCSI has lost its advantage there.
SATA has Native Command Queueing, formerly a SCSI-only performance feature. Note that it's optional for SATA drives, though, so make sure you get a controller and drives that support NCQ. Again, one of SCSI's few advantages has disappeared.
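On Linux, one way to check whether queueing is actually in use is to read the queue depth the kernel reports for each SCSI or libata disk in /sys/block/<dev>/device/queue_depth. A minimal sketch, assuming that a depth greater than 1 means tagged or native command queueing is active (a depth of 1 usually means the drive, controller, or driver is not queueing); the function name is hypothetical:

```python
# Report the command-queue depth Linux is using for each SCSI/libata disk,
# read from /sys/block/<dev>/device/queue_depth.
# Assumption: depth > 1 means tagged/native command queueing is active.
from pathlib import Path


def queue_depths(sys_block: Path = Path("/sys/block")) -> dict[str, int]:
    """Map device name -> queue depth for devices that expose the attribute."""
    depths: dict[str, int] = {}
    if not sys_block.is_dir():
        return depths
    for dev in sorted(sys_block.iterdir()):
        qd_file = dev / "device" / "queue_depth"
        try:
            depths[dev.name] = int(qd_file.read_text().strip())
        except (OSError, ValueError):
            continue  # not a SCSI/libata disk, or the attribute is unreadable
    return depths


if __name__ == "__main__":
    for name, depth in queue_depths().items():
        status = "queueing active" if depth > 1 else "no queueing"
        print(f"{name}: queue_depth={depth} ({status})")
```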
Last, but most definitely not least, SATA cabling is far simpler and more robust than SCSI cabling. SCSI cabling is a finicky nightmare where even high-end cables can cause data corruption if you're not careful, whereas even the cheapest SATA cables I've seen worked reliably. I've had hardware-related data loss on hard drives twice in my life. One case was an IBM Deathstar; the other was a SCSI cable that started flaking out and corrupted data on three drives at once. I haven't touched SCSI with a ten-foot pole since that incident.
retrorocket.o not found, launch anyway?
Look at some of the stranger RAID options. If you just use RAID5, you'll be selling yourself short. RAID3 is worth a look. I'd actually suggest you put two controllers in a machine. Run RAID0 on 4 drives on one controller. Run RAID0 on 4 drives on the other controller. Then use Windows or Linux software RAID to run RAID1 between the two RAID0 arrays. Very fast, and fully fault tolerant.
Uhh. Yes. Then you can lose one disk on each side, and you have lost all your data.
This would perhaps be slightly less than fully fault tolerant.
Perhaps you meant to set up 4 mirror pairs, 2 on each controller, and use software to RAID0 them together.
I have successfully done this with a 24-disk 5U chassis, and it is an I/O steamroller (our database server, right now).
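The difference between the two layouts is easy to count. This sketch enumerates every possible pair of simultaneous failures in an 8-disk array and counts how many lose data under RAID 0+1 (two 4-disk stripes mirrored, as in the original suggestion) and RAID 1+0 (four mirror pairs striped together, as in the correction). It assumes, as typical implementations do, that a stripe with any failed member drops out entirely; the disk numbering is arbitrary:

```python
# Count fatal two-disk failures in an 8-disk array for RAID 0+1 vs. RAID 1+0.
from itertools import combinations

DISKS = range(8)


def raid01_survives(failed: set[int]) -> bool:
    """RAID 0+1: stripe A = disks 0-3, stripe B = disks 4-7, mirrored.
    Any failure takes its whole stripe offline; data survives if one stripe is intact."""
    stripe_a, stripe_b = set(range(0, 4)), set(range(4, 8))
    return not (failed & stripe_a) or not (failed & stripe_b)


def raid10_survives(failed: set[int]) -> bool:
    """RAID 1+0: mirror pairs (0,1), (2,3), (4,5), (6,7), striped together.
    Data survives as long as no pair has lost both members."""
    pairs = [{0, 1}, {2, 3}, {4, 5}, {6, 7}]
    return all(not pair <= failed for pair in pairs)


def fatal_pairs(survives) -> int:
    """Number of two-disk failure combinations that lose data."""
    return sum(1 for pair in combinations(DISKS, 2) if not survives(set(pair)))


total = len(list(combinations(DISKS, 2)))
print(f"possible two-disk failures: {total}")
print(f"fatal for RAID 0+1: {fatal_pairs(raid01_survives)}")
print(f"fatal for RAID 1+0: {fatal_pairs(raid10_survives)}")
```

Of the 28 possible two-disk failures, 16 are fatal for RAID 0+1 but only 4 for RAID 1+0, because in 1+0 the second failure has to hit the one surviving partner of the first failed disk.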
"To err is human, to forgive is simply not my policy." --root
I've seen more dead HDDs than dead backup tapes, and I've seen 60 times as many backup tapes as HDDs...
And last time I checked, an Ultrium 3 tape was half the price of a 400GB drive.
I wouldn't use disks for backup, unless they're to be used as live backups, and then I'd still archive to tape (provided it was affordable).
----- Concentrate on promoting more than demoting.
The chances of losing two disks at once are slim. RAID 0+1 will provide great performance and good fault tolerance if you react to problems as they happen.
But I guess it depends on what your users need. If they need raw throughput, RAID 0+1 is better. If they need low latency, then RAID 10 may be the answer. Or maybe both systems would fall within the margin of error of each other.
In any event, once you get into what-if situations, no RAID will be good enough. What if you lose a disk? What about two? Five? Well, what if lightning hits the chassis or the janitor unplugs it to buff the floor?
The best you can do is roll the dice and play the odds. You'll see that I told him to use RAID 0+1. I also told him to use good monitoring setups to mitigate problems. I also suggested a tape backup. Actually, maybe I didn't, but I did tell him to verify his backups work and that he is able to restore from them, so that's kind of the same thing.
When it gets down to it, opinions are like assholes; everyone has one. And most people only care about their own and don't really want to look at their coworkers'. I guess I'm the same in that respect.
I'd rather you do it wrong, than for me to have to do it at all.
SCSI still tears the alternatives to shreds for price/performance at the heavy end of the load curve, no doubt about it.
If you doubt it, try both.
For going on twenty years it's been the same: those who haven't tried SCSI claim that there's little or no difference. Those who have used both SCSI and [MFM,RLL,IDE,ATA,SATA] in high-load environments hate having to make do with anything but SCSI.
For performance and reliability reasons both, you want SCSI if you're dealing with high-random-access-load or high-throughput situations. ATA/SATA is fine if you're just offering up noncritical bulk network storage but for the rest you want the real deal, and you will notice the obvious difference if you try both in a stressed environment.
STOP . AMERICA . NOW
Assuming equal storage sizes, SCSI drives would have way better throughput and latency than SATA drives, because you can get 15K RPM SCSI drives. However, the sizes are NOT equal. Fact is that for the price of a 147GB 15K SCSI drive, you can get about 2TB of 7200RPM SATA space.
What you end up with is the following throughput when disks are empty:
1x147GB 15K SCSI -- 150MB/s
8x250GB 7200 SATA -- 275MB/s to 550MB/s depending on exact RAID configuration
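The 275 to 550 MB/s range follows from simple multiplication if each drive sustains about 68.75 MB/s, a per-drive figure back-derived from the 550 MB/s upper bound rather than stated in the post. Striping (RAID 0) lets all eight drives contribute; RAID 10 writes every block to two drives, so writes only get four drives' worth of bandwidth. A rough sketch, with function names chosen for illustration:

```python
# Rough sequential-throughput bounds for the 8-drive SATA example.
# Assumption (not from the original post): ~68.75 MB/s sustained per drive.
PER_DRIVE_MB_S = 68.75
DRIVES = 8


def raid0_mb_s(n: int, per_drive: float) -> float:
    """Striping: every drive contributes to reads and writes."""
    return n * per_drive


def raid10_write_mb_s(n: int, per_drive: float) -> float:
    """Mirrored stripes: each block is written twice, so only n/2 drives' bandwidth is usable."""
    return (n // 2) * per_drive


print(f"RAID 0:           {raid0_mb_s(DRIVES, PER_DRIVE_MB_S):.0f} MB/s")
print(f"RAID 10 (writes): {raid10_write_mb_s(DRIVES, PER_DRIVE_MB_S):.0f} MB/s")
```

RAID 10 reads can approach the RAID 0 figure in the best case, when the controller spreads reads across both halves of each mirror.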
Now fill up both configurations with 140GB of data, and the throughput of the 15K SCSI drive has dropped by half, to 75MB/s, because the heads are now positioned over the "slower" inner portion of the disk. Meanwhile, the 2TB SATA config is only 7%-15% slower, depending on the RAID config.
Latency also benefits from many disks, for the same reason. Fill up a disk and you may have to traverse the entire disk. So while a 15K drive seeks 2-3 times faster, you end up having to move the heads 10X-15X farther than in a mega array, where the heads pretty much just hover over the 2X-faster outer portion.
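Both effects can be illustrated with an idealized zoned-recording model. What matters is how full each individual drive is: 140GB on one 147GB drive is about 95% full, while 140GB striped over eight 250GB drives puts 17.5GB on each (7%), and with RAID 10 mirroring each drive holds 35GB (14%). The model assumes constant RPM and bit density, an innermost data track at half the outer radius, and data written from the outside in; all of these are assumptions, not measured drive data. The exact percentages depend heavily on the assumed inner-to-outer ratio and on real zone layouts, so treat the output as the shape of the effect rather than a reproduction of the figures above:

```python
import math

# Idealized zoned-recording model; every constant here is an assumption:
#   * constant RPM and linear bit density, so transfer rate scales with track radius;
#   * uniform areal density, so capacity between radius r and the outer edge
#     scales with (1 - r^2) when radii are fractions of the outer radius;
#   * the innermost data track sits at half the outer radius (hypothetical);
#   * data fills the platters from the outer edge inward.
INNER_RATIO = 0.5


def radius_at_fill(fill: float) -> float:
    """Track radius (fraction of outer radius) reached when `fill` of a drive is used."""
    return math.sqrt(1 - fill * (1 - INNER_RATIO**2))


def relative_rate(fill: float) -> float:
    """Transfer rate at the fill point, relative to the outermost track."""
    return radius_at_fill(fill)


def stroke_fraction(fill: float) -> float:
    """Share of the full outer-to-inner stroke the heads need to reach all stored data."""
    return (1 - radius_at_fill(fill)) / (1 - INNER_RATIO)


cases = (
    ("147GB 15K SCSI, 140GB stored", 140 / 147),
    ("8x250GB RAID 0, 17.5GB per drive", 17.5 / 250),
    ("8x250GB RAID 10, 35GB per drive", 35 / 250),
)
for label, fill in cases:
    print(f"{label}: rate {relative_rate(fill):.0%} of outer track, "
          f"stroke used {stroke_fraction(fill):.0%}")
```

Under these assumptions the nearly full single drive runs at about half its outer-track rate and its heads sweep almost the whole stroke, while the lightly filled array drives stay near their fastest tracks and sweep roughly a tenth as far or less.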
The big advantage for SCSI is the better TCQ algorithms for multi-user access. This can be mostly negated if you use a SATA RAID controller with enough onboard RAM to reorder IO at the controller level versus depending on the drive's NCQ.
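The reordering idea can be illustrated with the classic elevator approach: instead of serving queued requests in arrival order, sweep the heads in one direction, then come back. This sketch compares total head travel (in cylinders) for a hypothetical queue; it uses the LOOK variant of the elevator and ignores rotational position, which real NCQ/TCQ firmware and RAID controllers generally also take into account:

```python
# Compare total head travel for FIFO order vs. a LOOK-style elevator sweep.
# The queue and starting head position are hypothetical cylinder numbers.

def travel(start: int, order: list[int]) -> int:
    """Total cylinders crossed when visiting `order` starting from `start`."""
    total, pos = 0, start
    for cyl in order:
        total += abs(cyl - pos)
        pos = cyl
    return total


def elevator(start: int, queue: list[int]) -> list[int]:
    """Sweep upward from the head position, then serve the rest on the way back down."""
    up = sorted(c for c in queue if c >= start)
    down = sorted((c for c in queue if c < start), reverse=True)
    return up + down


queue = [98, 183, 37, 122, 14, 124, 65, 67]  # requests from several users
head = 53
print("arrival order: ", travel(head, queue))
print("elevator order:", travel(head, elevator(head, queue)))
```

For this queue the elevator order cuts head travel from 640 to 299 cylinders. A controller with plenty of onboard RAM can hold a longer queue to sort, which is the effect described above.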
This is the route we've taken -- we went from an LSI MegaRAID 320-1 + 4-drive SCSI RAID config to an Areca 1170 + 1GB RAM + 24-drive SATA RAID. Every aspect of performance is up by big amounts -- throughput, latency, multi-user access. The drive array is actually TOO fast for our 2x244 Opteron server to drive. We ended up breaking the array into three 8-drive volumes and mirroring two of them against each other for more redundancy. One of these days, we'll upgrade to faster CPUs and retest a 16-drive volume.