Slashdot Mirror


SCSI vs. SATA In a File Server?

turboflux asks: "I'm currently in the process of replacing an aging file server with something more robust. Company-wide, there will be about 100 people who could be using this server, but I don't imagine there being more than 50 concurrent users. Right now, I'm torn between spending a lot on SCSI hardware, much like our other servers, or spending less, but getting more space, with SATA II drives. Whatever I decide, the server will be set up with a RAID 1+0 array for the numerous benefits it offers. Does Slashdot have opinions or suggestions on performance, reliability, and stability?"

13 of 303 comments

  1. Have you considered...? by Anonymous Coward · · Score: 5, Funny

    Have you ever thought about the benefits of RLL?

  2. SATA is fine by Bombcar · · Score: 5, Insightful

    SATA is fast and cheap; just make sure you spend a little bit more to get the "nearline" storage drives and not just desktop drives. Put them behind a 3ware 9550 and you'll fly.

  3. SATA? I don't know.... by toofast · · Score: 5, Informative

    I use SATA on our smaller, non-mission-critical servers. For our data backend, I wouldn't touch it with a 10-foot pole.

    Here are some scenarios where I wouldn't hesitate to use SATA:

    - You have redundant servers. Using LVS and/or Heartbeat and your favorite tools, you can get full server redundancy using less expensive hardware. The overall solution can be quite elegant, with hot failover. Why just cover the drives?

    - Front-end cluster nodes. You have a powerful, expensive backend server (with a cheaper failover) and you use inexpensive front-end servers for serving client requests. Sounds like overkill for what you want, but with the right server load balancing technology, it can give you a scalable, fault-tolerant and damn fast solution.

    - You can live with downtime. Install a server with a couple of SATA disks in a RAID configuration and hope for the best.

  4. The real info by sabreofsd · · Score: 5, Informative

    There might be some benefits to sticking with SCSI over SATA; it really depends on your preference. Both SCSI and SATA offload the main processor from the duties associated with reads and writes. SATA also now has optimized reading patterns just like SCSI. The only real advantage SCSI has right now is speed (SATA 150, with a newer, faster revision coming, vs. SCSI 320). Also, most SCSI drives are designed for 24/7 use, whereas most SATA drives are designed for desktop use. Just make sure the SATA drives you buy are made for enterprise-level operation. So it really comes down to compatibility/speed vs. cheap/larger. Hope this helps!

    --
    Sabre
  5. SATA II is not your father's SATA by xusr · · Score: 5, Informative

    The SATA II spec is quite a bit different from the original SATA. SATA II adds port multiplication, hot plugging, native command queuing, external enclosures, and port selection. Also, with a theoretical peak of 3Gbps, it's twice as fast as the old SATA. Here is a decent article with more explanation.

  6. Re:SATA is fine ... for some things by abcess · · Score: 5, Informative

    As a matter of fact, you may not be flying at all. It all depends what you're using it for. The problem with SATA is latency, and there's not much that controller is going to do about it. If you've got a server that is performing latency sensitive tasks, then SATA can cause performance problems.

    In my experience, if you've got a lot of random I/O, SATA is not a viable solution. That said, even if your I/O is mostly random, if there's not a heavy load on the disk, then you're probably OK. If you've got 200 people hitting a database or email server, you're probably going to have some performance problems. Swap it out with SCSI drives, or a quality disk array, and you'll be doing much better. If you've got a web server, or a database server that is exclusively reading, you can probably get away with SATA. Again, it all depends on how much and how random the disk I/O for your application is.

  7. Re:I'd say SCSI by GigsVT · · Score: 5, Interesting

    OK, hear mine then.

    We have several terabytes of SATA storage at work to hold our main business-critical digital asset archive.

    We've been using an ATA/SATA disk-only strategy for over 5 years now. It's worked great, and eliminated our slow and unreliable tape robot, which has greatly improved productivity.

    Back in 1999/2000 SCSI wasn't an option for the main archive because a terabyte of SCSI would have broken the bank. We went ATA back then. It was a mess trying to route 24 ATA cables in a case, I admit. SATA fixes that nicely.

    We keep three copies of our data, two onsite and one offsite. We use rsync-incremental snapshots to do disk-based incremental backups. Because the cost of SATA is less than 1/3rd the cost of SCSI, we get a high reliability solution for less than the price of a single SCSI RAID.
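
    One common way to get rsync-based incremental snapshots is rsync's `--link-dest` option, which hard-links unchanged files to the previous snapshot so each new snapshot only costs the space of what changed. The comment does not say which options were used, so the sketch below is only an assumption about the approach; the paths and the `snapshot_command` helper are hypothetical.

```python
from pathlib import PurePosixPath


def snapshot_command(source: str, snapshot_root: str,
                     previous: str, current: str) -> list[str]:
    """Build an rsync argv for one hard-linked incremental snapshot.

    Hypothetical helper: directory names are illustrative only.
    """
    root = PurePosixPath(snapshot_root)
    return [
        "rsync",
        "-a",                                # archive mode: keep perms, times, links
        "--delete",                          # drop files that vanished from the source
        f"--link-dest={root / previous}",    # hard-link unchanged files to last snapshot
        source.rstrip("/") + "/",            # trailing slash: copy the contents
        str(root / current),                 # new snapshot directory
    ]


if __name__ == "__main__":
    cmd = snapshot_command("/srv/archive", "/backup", "2006-01-01", "2006-01-02")
    print(" ".join(cmd))
```

    The command list could be handed to `subprocess.run` on the backup host; it is only printed here so nothing is copied by accident.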

    One more advantage of SATA is that the disks are so cheap, it's easy to just replace all of them every two or three years. The disks you replace them with generally are twice as large after 2 or 3 years, so every cycle your RAIDs get more reliable as the number of disks is slashed in half.

    Most companies wouldn't replace every SCSI disk every two years, it would cost way too much. And considering the slow pace of SCSI size growth, you wouldn't see as much gain, a double hit against SCSI.

    So basically unless you need the excellent latency performance of SCSI, higher than even the WD Raptor can offer, I see no compelling reason to use SCSI for anything anymore.

    --
    I've had enough abrasive sigs. Kittens are cute and fuzzy.
  8. SATA by Andy+Dodd · · Score: 5, Informative

    SATA's peak raw transfer rate (150 MB/sec) is about half the peak raw transfer rate of SCSI (320 MB/sec), but you're going to be limited by the individual hard drive's transfer rate anyway. Keep in mind that a proper SATA implementation will be 150MB/sec PER DRIVE, since each drive is on its own channel. SCSI is 320 MB/sec per channel, but you're in for a cabling nightmare if you want only one drive per channel. Note that there is a 300 MB/sec SATA standard, although few drives and controllers seem to support it.
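
    The per-drive versus per-channel point can be put in numbers. This is a rough sketch that assumes the shared SCSI channel splits evenly when every drive transfers at once and ignores protocol overhead; the function name is made up for illustration.

```python
SATA_LINK_MB_S = 150.0      # per drive, each drive has its own link
SCSI_CHANNEL_MB_S = 320.0   # shared by every drive on the channel


def per_drive_ceiling_mb_s(link_mb_s: float, drives_on_link: int) -> float:
    """Upper bound on per-drive bandwidth if the link is split evenly."""
    if drives_on_link < 1:
        raise ValueError("need at least one drive on the link")
    return link_mb_s / drives_on_link


if __name__ == "__main__":
    for drives in (1, 2, 4, 8):
        scsi = per_drive_ceiling_mb_s(SCSI_CHANNEL_MB_S, drives)
        sata = per_drive_ceiling_mb_s(SATA_LINK_MB_S, 1)
        print(f"{drives} drives: SCSI {scsi:.0f} MB/s each, SATA {sata:.0f} MB/s each")
```

    Under those assumptions the shared channel drops below the SATA per-drive figure once three or more drives are busy on it, though in practice the drives themselves are usually the limit, as noted above.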

    If you buy the right model, you can get SATA drives that have gone through the rigorous quality control testing that has historically been reserved for SCSI drives. Many of the higher end server-grade SATA models are warrantied for 24/7 operation. SCSI has lost its advantage there.

    SATA has Native Command Queueing, formerly a SCSI-only performance feature. Note that it's optional for SATA drives though, so make sure you get a controller and drives that support NCQ. Again, one of SCSI's few advantages has disappeared.

    Last, but most definitely not least, SATA cabling is far simpler and more robust than SCSI cabling. SCSI cabling is a finicky nightmare where even high-end cables can cause data corruption if you're not careful, whereas even the cheapest SATA cables I've seen worked reliably. I've had hardware-related data loss on hard drives twice in my life. One case was an IBM Deathstar, the other was a SCSI cable that started flaking out and corrupted data on three drives at once. I haven't touched SCSI with a ten foot pole since that incident.

    --
    retrorocket.o not found, launch anyway?
  9. The very definition of RAID... by dbarclay10 · · Score: 5, Insightful

    The very definition of RAID is "Redundant Array of INEXPENSIVE Disks". Emphasis mine.

    I've already read a bunch of posts about how SCSI is more reliable than SATA. Well, they actually mean SCSI drives are generally more reliable than SATA drives (and some actually say so). They're quite correct for the most part.

    Here's what storage vendors don't want you to know: It doesn't matter.

    Use RAID. With SCSI or FC disks, you'll have to use RAID5. At that point, two disk failures in a given array and you're screwed. You REALLY care that two disks don't fail at the same time. And when you're using low-end or even mid-range drives, it happens.

    Why do you have to use RAID5? Because with SCSI or FC disks, RAID5 is the only economical option. With a 300GB SCSI drive going for at least $1200USD, and FC drives of that size going for $2500USD, even the biggest corporations end up using RAID5.

    Of course, RAID5 isn't the only level of RAID. It's the least redundant of any level of RAID, as a matter of fact.

    Go SATA with RAID10, at least 4 drives, ideally six or more. With six drives, the likelihood of having two drives fail before you can replace the first one is somewhat higher than if you're using SCSI, but the likelihood of that second failure causing you data loss due to a failed array is much smaller. With RAID5, a second failure is guaranteed to lose the array; with RAID10, the chance is inversely proportional to the number of disks in the array, because the first drive has to fail and then the second drive that fails has to be in the same RAID1 set. Add to that that drives do indeed "go old", and the heavier you work them, the faster they get old. With RAID5, disks tend to get worked a lot harder (without any cache, or if the cache misses, each write requires n-2 reads, and 2 writes).
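
    The second-failure argument can be sketched as a tiny calculation. It assumes one disk is already dead, that each surviving disk is equally likely to be the next to fail, and that RAID10 means two-way mirrors; the function name is hypothetical.

```python
from fractions import Fraction


def second_failure_loss_probability(level: str, disks: int) -> Fraction:
    """Chance that a second disk failure destroys an already degraded array."""
    if level == "raid5":
        if disks < 3:
            raise ValueError("RAID5 needs at least 3 disks")
        return Fraction(1)              # any second failure loses the array
    if level == "raid10":
        if disks < 4 or disks % 2:
            raise ValueError("RAID10 needs an even number of disks, at least 4")
        return Fraction(1, disks - 1)   # only the dead disk's mirror partner is fatal
    raise ValueError(f"unknown RAID level: {level}")


if __name__ == "__main__":
    for n in (4, 6, 8, 12):
        p = second_failure_loss_probability("raid10", n)
        print(f"RAID10, {n} disks: {p} ({float(p):.0%})")
```

    With six disks that is a one-in-five chance for RAID10 against a certainty for RAID5, which is roughly the "inversely proportional" behaviour described above.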

    Of course, you've pretty much decided that RAID10 is the way to go. At that point it's cost. If you're looking for 50GB of fast redundant storage, SCSI is going to be slightly cheaper. If you need any amount of storage though, SATA is going to be a whole lot cheaper for the same level of reliability (which requires more spindles), and typically better speed (more spindles means more seeks per second and more megs per second, though one needs to be mindful that big SATA disks are only 7200RPM, while the slowest SCSI disks you're going to get are 10kRPM).

    Summary? I'm value-conscious. I'd go the SATA route. RAID10, four disks minimum to start, a pair of 4-port 3ware SATA cards with 128MB+ of battery-backed cache. I'd do the RAID entirely with software (Linux MD), with each RAID1 set split across two controllers. We get cheap disk redundancy, cheap disk speed, cheap I/Os, and cheap controller redundancy. I'd consider using less fancy controllers, the 3ware jobbies tend to be expensive, but when you're doing big writes the cache makes a massive difference (75MB/s across four disks of RAID10 versus 20MB/s). I've considered putting together a dedicated storage appliance, exporting via SMB/NFS/NBD/GFS/what-have-you, without the battery-backed cache, but with a pair of 1U UPS units (one for each power supply). Then I'd go around turning off all the application-level fsync()ing, and see what happens with 4GB of disk cache. Bet it'd be fast. And with shutdown initiated via UPS trigger, almost as safe as a battery-backed cache. Remember: "Redundant Array of INEXPENSIVE Disks."
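
    Splitting each RAID1 set across two controllers just means pairing the Nth disk on one card with the Nth disk on the other, so losing a whole card still leaves one half of every mirror. A minimal sketch, with made-up device names and a hypothetical `mirror_pairs` helper:

```python
def mirror_pairs(controller_a: list[str],
                 controller_b: list[str]) -> list[tuple[str, str]]:
    """Pair disks so every RAID1 set has one member on each controller."""
    if len(controller_a) != len(controller_b):
        raise ValueError("both controllers need the same number of disks")
    return list(zip(controller_a, controller_b))


if __name__ == "__main__":
    # Device names are illustrative; check how your own system enumerates disks.
    card_a = ["sda", "sdb"]
    card_b = ["sdc", "sdd"]
    for left, right in mirror_pairs(card_a, card_b):
        print(f"RAID1 set: /dev/{left} + /dev/{right}")
```

    The resulting RAID1 sets would then be striped together to form the RAID10.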

    God I ramble.

    --

    Barclay family motto:
    Aut agere aut mori.
    (Either action or death.)
  10. Re:BACKUP! by kahanamoku · · Score: 5, Informative

    I've seen more dead HDDs than dead backup tapes, and have seen 60 times as many backup tapes as HDDs...

    and last time I checked, an Ultrium 3 tape was half the price of a 400GB Drive.

    I wouldn't use disks for backup, unless they're to be used as live backups, and then I'd still archive to tape (provided it was affordable).

    --
    ----- Concentrate on promoting more than demoting.
  11. use SCSI... by Malor · · Score: 5, Insightful

    50 concurrent users is a LOT. You may not really mean concurrent, as in "50 people actually reading from or writing to this drive at the same time". If you DO mean that, you desperately need SCSI, the fastest you can find. You'll need seek time more than anything else; the drives need to respond as fast as possible to multiplexed requests for data. Rotation speed, which improves seek time and transfer rate, is good too, but it's seek time that's most crucial in heavy multitasking environments. If by 'concurrent' you mean '50 people occasionally hitting the disk', then yeah, you could probably do SATA.

    However, you already have SCSI. Management is used to paying for SCSI machines. If you have 50-100 people depending on something, and it's slow, that's a productivity drag. If you assume that all those people cost $100k/year each (not at all unreasonable with benefits), 50 people are getting paid about 2,500 bucks an hour, or about 20,000 dollars a day. In other words, if you speed them up by just 5% with better hardware, you're saving the company a thousand dollars a day. Even if it's a tiny 1% speed gain, that's still 200 bucks a day. Saving six grand a month for an upfront investment of ten grand is a total no-brainer.
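
    The payroll arithmetic can be reproduced in a few lines. The 2,000 working hours per year and 8 hours per day are assumptions chosen because they reproduce the figures quoted above; the comment does not state them.

```python
def payroll_value_of_speedup(people: int, annual_cost: float, speedup: float,
                             hours_per_year: float = 2000.0,
                             hours_per_day: float = 8.0) -> tuple[float, float, float]:
    """Return (payroll per hour, payroll per day, daily value of the speedup)."""
    hourly = people * annual_cost / hours_per_year
    daily = hourly * hours_per_day
    return hourly, daily, daily * speedup


if __name__ == "__main__":
    for pct in (0.05, 0.01):
        hourly, daily, saved = payroll_value_of_speedup(50, 100_000, pct)
        print(f"{pct:.0%} faster: ${hourly:,.0f}/hour payroll, "
              f"${daily:,.0f}/day, about ${saved:,.0f}/day saved")
```

    At $200 a day, the monthly figure depends on how many days you count; 30 days gives the six grand quoted.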

    Buy SCSI.

  12. Do RAID 5 ! by this+great+guy · · Score: 5, Insightful

    "Whatever I decide, the server will be set up with a RAID 1+0 array for the numerous benefits it offers."

    No, choose RAID 5 instead of RAID 1+0. Here is why:

    • RAID 5 offers more usable disk space. With N disks of X GB, RAID 5 gives you (N-1)*X GB while RAID 1+0 only gives you (N/2)*X GB.
    • The maximum theoretical I/O throughput is better with RAID 5 than with RAID 1+0. With N=4 it is 1.5 times better, and when N is large (>= 8) it tends toward twice as good.
    • RAID 5 is more customizable than RAID 1+0, giving you more control over the usable space / total space ratio. For example with N=10 you can choose to create 1, 2 or 3 RAID 5 arrays while with RAID 1+0 you only have 1 choice (1 large array, creating multiple smaller arrays is equivalent to a large one).
    • Linux's RAID 5 implementation rocks and consumes MUCH less CPU than people think, especially with today's 2+ GHz processors. Kernel hackers have found their implementation to be WAY FASTER than most expensive RAID 5 hardware cards.
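
    The capacity formulas in the first bullet are easy to check. The sketch below uses a hypothetical `usable_gb` helper and 250 GB disks as an example size; the ratio it prints, 2(N-1)/N, is the same 1.5x-approaching-2x figure quoted for throughput, though whether real throughput tracks the number of data disks is the claim above, not something this calculation shows.

```python
def usable_gb(level: str, disks: int, disk_gb: int) -> int:
    """Usable capacity for N equal disks under the formulas quoted above."""
    if level == "raid5":
        if disks < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        return (disks - 1) * disk_gb       # one disk's worth of parity
    if level == "raid10":
        if disks < 4 or disks % 2:
            raise ValueError("RAID 1+0 needs an even number of disks, at least 4")
        return (disks // 2) * disk_gb      # half the disks hold mirror copies
    raise ValueError(f"unknown RAID level: {level}")


if __name__ == "__main__":
    for n in (4, 8, 16):
        r5 = usable_gb("raid5", n, 250)
        r10 = usable_gb("raid10", n, 250)
        print(f"N={n}: RAID 5 {r5} GB, RAID 1+0 {r10} GB, ratio {r5 / r10:.3f}")
```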

    To give you a datapoint, I have set up multiple Linux software RAID 5 arrays on various servers with 10+ SATA disks, and the I/O throughput is over 500 MB/s (enough to saturate 2 full-duplex GigE links!). At my previous work we had about 200 servers, all using Linux software RAID 5. And we were MUCH HAPPIER than with the previous setup where all of them were using hardware RAID 5. Moreover, Linux's software RAID 5 is more flexible (create arrays on ANY disk on ANY SCSI/SATA card in the system), more consistent (one and only one control software to learn: mdadm(8), no need to use crappy vendor tools or reboot into vendor BIOSes), cheaper (no hardware to buy), more reliable (no hardware card = 1 less hw component that can fail), easier to troubleshoot (plug the disks into ANY Linux server and it works, no reliance on any particular hw card) and more scalable (spread the load across multiple disk controllers, multiple PCI-X/PCIe buses, or even multiple SAN devices).

    It's amazing the amount of misinformation and misconceptions about RAID that is spread around the world. I hate to say it but 95% of IT engineers don't make good choices regarding RAID servers because of all those misconceptions.

  13. Re:SATA is fine ... for some things by Anonymous Coward · · Score: 5, Informative

    Assuming equal storage sizes, SCSI drives would have way better throughput and latency than SATA drives because you can get 15K SCSIs. However, the sizes are NOT equal. Fact is that for the price of a 147GB 15K SCSI drive, you can get about 2TB of 7200RPM SATA space.

    What you end up with is the following throughput when disks are empty:

        1x147GB 15K SCSI -- 150MB/s
        8x250GB 7200 SATA -- 275MB/s to 550MB/s depending on exact RAID configuration

    Now fill up both configurations with 140GB of data and the throughput of the 15K SCSI has dropped in half to 75MB/s because the heads are now positioned at the "slower" inner portion of the disk. Meanwhile, the 2TB SATA config is 7%-15% slower depending on the RAID config.

    Latency also benefits from many disks for the same reason. Fill up a disk and you possibly have to traverse the entire disk. So while a 15K drive has a seek time 2-3 times faster, you end up having to move 10X-15X farther than in a mega array where the heads pretty much just hover over the 2X faster outer portion.

    The big advantage for SCSI is the better TCQ algorithms for multi-user access. This can be mostly negated if you use a SATA RAID controller with enough onboard RAM to reorder IO at the controller level versus depending on the drive's NCQ.

    This is the route we've taken -- we went from an LSI MegaRAID 320-1 + 4-drive SCSI RAID config to an Areca 1170 + 1GB RAM + 24-drive SATA RAID. Every aspect of performance is up by big amounts -- throughput, latency, multi-user access. The drive array is actually TOO fast for our 2x244 Opteron server to drive. We ended up breaking the array into 3 8-drive volumes and mirroring 2 volumes against each other for more redundancy. One of these days, we'll upgrade to faster CPUs and retest a 16-drive volume.