Slashdot Mirror


SCSI vs. SATA In a File Server?

turboflux asks: "I'm currently in the process of replacing an aging file server with something more robust. Company-wide, there will be about 100 people who could be using this server, but I don't imagine there being more than 50 concurrent users. Right now, I'm torn between spending a lot on SCSI hardware, much like our other servers, or spending less, but getting more space, with SATA II drives. Whatever I decide, the server will be set up with a RAID 1+0 array for the numerous benefits it offers. Does Slashdot have opinions or suggestions on performance, reliability, and stability?"

18 of 303 comments

  1. SATA is fine by Bombcar · · Score: 5, Insightful

    SATA is fast and cheap; just make sure you spend a little bit more to get the "nearline" storage drives and not just desktop drives. Put them behind a 3ware 9550 and you'll fly.

    1. Re:SATA is fine by innosent · · Score: 2, Insightful

      Slim might be correct, but it certainly CAN happen. What is important is that you realize that it is possible. Just because you have to lose 2 or more to lose data doesn't make it safe. Having 20 drives from the same lot number could mean that all 20 are affected by a manufacturing flaw that kills them at 20,000 hours. Also, there are power surges, fire, etc. Always back up (off-site), always keep spares both on and off-site, and buy drives from multiple lot numbers. Keep in mind that the MTBF for multi-drive systems is lower than a single drive (in fact, statistically it is the MTBF for a single drive divided by the number of drives). You should expect failures, and should expect more than you would with a single drive. The idea is not to reduce the likelihood of a failure (it does the opposite), but to reduce the damage caused by a failure. Keep in mind that failures can occur in the drive, controller, power supply, or any other component. Keep spares on hand, and use a high-end controller card (the 3ware 9500 series works very well for SATA; Adaptec for SCSI, though IMHO it is not as good as 3ware is for SATA).
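The MTBF rule of thumb above can be sketched in a few lines. This assumes independent drives with a constant failure rate, which the comment does not state, and the 500,000-hour per-drive figure is purely illustrative rather than a number from the thread.

```python
def array_mtbf_hours(drive_mtbf_hours: float, drive_count: int) -> float:
    """Mean time to the FIRST drive failure anywhere in an array.

    Assumes independent drives with a constant failure rate, so the
    failure rates add up and the MTBF divides by the drive count.
    """
    if drive_count < 1:
        raise ValueError("drive_count must be at least 1")
    return drive_mtbf_hours / drive_count


if __name__ == "__main__":
    single = 500_000.0  # hypothetical per-drive MTBF in hours
    for n in (1, 4, 10, 20):
        hours = array_mtbf_hours(single, n)
        print(f"{n:2d} drives: a failure roughly every {hours:,.0f} hours")
```

Note that this models how often some drive fails, not how often data is lost; a shared manufacturing flaw of the kind described above would break the independence assumption entirely.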

      As for the original topic, I would go with SATA II drives, on a 3ware controller, and make sure both the drives and the controller support NCQ. Faster drives are of course better, but more drives is also better (more spindles = more platters = more read heads = faster aggregate reads), and keep in mind that RAID level is extremely important, and the right choice has a lot more to do with your application and supporting hardware than anything else. Generally, though, RAID 5 is sufficient for storage, RAID 1 for smaller application servers that you can't afford to have outages on, and RAID 10 (1+0) on databases (or really anywhere you can afford it). RAID 10 on a high-end controller allows you to read at double (or better) the speed of the RAID 0 subarrays, since the controller can read from the stripe set that is closest to the data/least busy. Also, be sure to get a controller with plenty of memory onboard, and battery backed memory if power loss could be an issue. Personally, I'd go with the 3ware 9550SX line on an Opteron-based system (eliminate the Intel-based northbridge I/O / Mem bottleneck).

      --
      --That's the point of being root, you can do anything you want, even if it's stupid.
    2. Re:SATA is fine by ocbwilg · · Score: 4, Insightful

      The chances of losing two disks at once are slim. RAID 0+1 will provide great performance and good fault tolerance if you react to problems as they happen. But I guess it depends on what your users need. If they need raw throughput, RAID 0+1 is better. If they need low latency, then RAID 10 may be the answer. Or maybe both systems would fall within the margin of error of each other.

      He has up to 100 users and says that there will probably only be 50 or so concurrent users. Reasonable performance for such a system doesn't require lots of crazy tweaking. Implement RAID5 with a hot spare and be done with it. If you have a drive failure it automatically rebuilds and you're safe. If you have another drive failure after that before replacing the dead drive, you're still running. If you are concerned about drive performance, then spread the array across as many spindles as possible. If you have any sense you will already have a decent monitoring system in place and will know the drives have failed.

      I find myself saying this often on Slashdot, but for the average IT department it makes far more sense to buy a business line server that comes with proper support for everything that you need than to try and cobble it together yourself out of parts, and then try to keep enough spare parts around in case of failure, and try to get warranty service from 5 different parts suppliers with different warranty lengths. I mean really, who does that kind of thing?

      Go to HP, buy a Proliant server that fits your needs and price range, and use the included management software to set up email alerts when there is a hardware problem (like a drive failure or imminent drive failure). HP has the replacement part at your doorstep next day (unless you buy a warranty with faster turnaround, and next-day is still faster than you'll get from most part suppliers), and you don't have anything to worry about. I'm sure IBM and Dell can do something similar too.

      Back in the day it actually used to be cheaper to build your own computer. Not only would you save money, but you got to choose exactly the components you wanted. Nowadays the computer market has been so commoditized that it's actually much more expensive to build your own. You don't get any of the advantages of economies of scale, and the profit margins are so slim on retail models that the savings from eliminating them are negligible. And of course, now you can have your system custom built to your specs anyway. The only reason to build your own is if you want to be able to tweak and upgrade it piecemeal, like the "enthusiast" market does. That's what I do with my home PC, but I would never consider doing that with business PCs, especially a server. A server should be deployed, and after that it should pretty much sit there with zero hardware maintenance (except in the case of hardware failure).

  2. I'd say SCSI by SocialEngineer · · Score: 1, Insightful

    It's more reliable, as far as I know, compared to SATA. SATA is good enough for desktop performance, but I have yet to hear any glowing reviews of it in the server market.

    --
    "Better to be vulgar than non-existent" -Bev Henson
  3. SCSI for tier 1, SATA for tier 2... for now by Anonymous Coward · · Score: 2, Insightful

    For any tier 2 storage, SATA is the future. I'd go so far as to say that in about 80% of all instances, good SATA-II drives with NCQ and large caches in a proper RAID setup will be far, far more than adequate for most production servers unless you do tons of non-sequential I/O, need tons of iops, etc.

    For a file server, you'll be fine with SATA.

    For my tier 2 servers, I am moving a ton of stuff off of my EMC gear (because fibre channel drives are damned expensive) onto SATA-II drives in an iSCSI setup. I'm already running servers with trunked gigabit NICs... might as well let Win/*NIX boot from a local drive and mount block-level iSCSI devices over the gigabit fabric. Save a shit ton of money and get tons of space on the cheap (7.5 terabytes of fast SATA-II in an iSCSI chassis for well under $10K... make your RAID groups, carve up some LUNs, and par-tay!).

  4. Re:SCSI by humphrm · · Score: 4, Insightful

    Dude, that thinking is the difference between an SA and an Engineer. Thinking like that would have us all running MFM drives, and these newfangled "SCSI" disks would be too risky random equipment to test out on a server.

    --
    -- "In order to have power, I must be taken seriously." -Mojo Jojo
  5. Re:What's this SCSI you speak of? by fodi · · Score: 2, Insightful

    Why? Why? Why?

    At least give us a 2 line explanation, so I don't think you're speaking crap that you read in an advertisement! Please, if you have some justification, I'd love to know what it is... seriously.

  6. Re:What's this SCSI you speak of? by Deliveranc3 · · Score: 1, Insightful

    Parent is right; for an admin, the scariest thing you'll ever hear is:

    "I can't find the network drive"...

    SCSI will seem a bit more expensive at first, but that cost isn't just for the interface; most of it is for the extra testing and hardware reliability you get with SCSI.

    I am a pir8 and I back up everything important so I can run an SATA raid-0.

    But you want something with modular controllers, hot-swappable RAID arrays, and reliability. Hell, if I was running a business off my home PC, that would be something I would invest in.

    Find a way to make do with less space on the network and greater reliability; you will sleep A LOT easier.

  7. As with all things, it depends on the usage... by thesandbender · · Score: 2, Insightful

    I am a huge fan of efficient, cheap systems. The bulk of our server load is handled by dual Opteron machines with 3ware cards, 10k rpm system spindles, and 7.2k rpm data spindles. However, even the best SATA drives choke under file system and database loads, and our primary data stores are U320. StorageReview.com has a good review of the new 150 GB 10k rpm WD drive that shows it getting spanked by SCSI drives under non-linear server loads. Long story short, if you expect a lot of drive activity you might be able to eke by for a while with a well-tuned SATA system, but you will have to pony up for a SCSI system at some point, and you might as well do it now and save yourself the hassle of migrating later.

  8. REALLY depends on the task at hand by prantik · · Score: 2, Insightful

    For some things you NEED SCSI, for others you don't. That much is obvious.

    Large files/streams that require heavily mixed-mode I/O beat the balls off of SATA. E.g. Correct me if I'm wrong, but my partial understanding of SATA is that if many writes are cached and a read enters the queue, the cached writes are trashed.

    so if you are working with check-in/check-out I/O type such as Samba profiles, SVN stuff, or (Samba|N)FS on a small-medium number of small-medium size files, or web stuff, SATA offers best price/performance ratio, with RAID or whatnot.

    If you are working with large files that get a lot of unpredictable I/O, or databases, you really want SCSI.

  9. Re:Only a rookie would suggest RAID 0+1 by georgewilliamherbert · · Score: 2, Insightful
    The fact that you mentioned RAID 0+1 shows that you really don't know anything about true storage. That is probably the most inefficient way to go, congratulations!
    0+1 is a mistake, but 1+0 isn't. 0+1 loses the data set if any one drive fails on each side of the mirror; 1+0 loses it only if you lose both disks in a single mirror pair.
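The difference can be checked by brute force. The sketch below enumerates every two-disk failure for both layouts; the disk numbering is hypothetical, and it assumes a simple 0+1 implementation in which one failed disk takes its whole stripe set offline.

```python
from itertools import combinations


def raid10_survives(failed: set[int], disks: int) -> bool:
    """RAID 1+0: disks (0,1), (2,3), ... are mirror pairs, striped together.

    The array survives as long as no mirror pair has lost both members.
    """
    return all(not {a, a + 1} <= failed for a in range(0, disks, 2))


def raid01_survives(failed: set[int], disks: int) -> bool:
    """RAID 0+1: the first and second halves are stripe sets, mirrored.

    One failed disk takes its whole stripe set offline, so the array
    survives only while at least one side has no failures at all.
    """
    half = disks // 2
    side_a = set(range(half))
    side_b = set(range(half, disks))
    return not (failed & side_a) or not (failed & side_b)


def fatal_two_disk_failures(survives, disks: int) -> int:
    """Count the two-disk failure combinations that lose the array."""
    return sum(
        not survives(set(pair), disks)
        for pair in combinations(range(disks), 2)
    )


if __name__ == "__main__":
    for n in (4, 8):
        total = len(list(combinations(range(n), 2)))
        print(n, "disks:",
              "1+0 fatal", fatal_two_disk_failures(raid10_survives, n),
              "| 0+1 fatal", fatal_two_disk_failures(raid01_survives, n),
              "| of", total)
```

Under these assumptions, a four-disk array has six possible two-disk failures: two of them are fatal for 1+0 and four are fatal for 0+1, and the gap widens as disks are added.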

    RAID 5 has noticeably slower disk performance for writes, and radically slower performance for reads and writes if you lose a disk. In many cases, RAID 5 performance during a failure can reduce system capacity below the required level, and thus RAID 5 is simply not an acceptable technology for those environments.

    If just sticking more data on disk is the requirement, and you don't care how slow it gets if you have a drive fall over, then RAID 5 is great. But real world enterprise environments exist where losing half your disk throughput will cause the company's service to go down, and then you're out of business. Those guys don't RAID 5 if they know what's good for them.

  10. The very definition of RAID... by dbarclay10 · · Score: 5, Insightful

    The very definition of RAID is "Redundant Array of INEXPENSIVE Disks". Emphasis mine.

    I've already read a bunch of posts about how SCSI is more reliable than SATA. Well, they actually mean SCSI drives are generally more reliable than SATA drives (and some actually say so). They're quite correct for the most part.

    Here's what storage vendors don't want you to know: It doesn't matter.

    Use RAID. With SCSI or FC disks, you'll have to use RAID5. At that point, two disk failures in a given array and you're screwed. You REALLY care that two disks don't fail at the same time. And when you're using low-end or even mid-range drives, it happens.

    Why do you have to use RAID5? Because with SCSI or FC disks, RAID5 is the only economical option. With a 300GB SCSI drive going for at least $1200USD, and FC drives of that size going for $2500USD, even the biggest corporations end up using RAID5.

    Of course, RAID5 isn't the only level of RAID. It's the least redundant of any level of RAID, as a matter of fact.

    Go SATA with RAID10, at least 4 drives, ideally six or more. With six drives, the likelihood of having two drives fail before you can replace the first one is somewhat higher than if you're using SCSI, but the likelihood of that second drive causing you data loss due to a failed array is much smaller. It's guaranteed with RAID5, and the chance for RAID10 is inversely proportional to the number of disks in the array. So first one drive has to fail, then the second drive that fails has to be in the same RAID1 set. Add onto that that drives do indeed "go old", and the heavier you work them, the faster they get old. With RAID5, disks tend to get worked a lot harder (without any cache, or if the cache misses, each write requires n-2 reads, and 2 writes).
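A minimal sketch of that argument, assuming one drive has already failed and the next failure is equally likely to hit any of the surviving drives (an assumption the comment implies but does not state). The function name and level labels are made up for illustration.

```python
from fractions import Fraction


def second_failure_fatal(level: str, disks: int) -> Fraction:
    """Chance that a second drive failure loses the array.

    Assumes one drive is already dead and the second failure is
    equally likely to be any of the remaining (disks - 1) drives.
    """
    if level == "raid5":
        if disks < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        # RAID 5 tolerates exactly one failure, so any second one is fatal.
        return Fraction(1)
    if level == "raid10":
        if disks < 4 or disks % 2:
            raise ValueError("RAID 10 needs an even number of disks, 4 or more")
        # Only the dead drive's mirror partner is fatal: 1 drive out of the rest.
        return Fraction(1, disks - 1)
    raise ValueError(f"unknown RAID level: {level}")


if __name__ == "__main__":
    for n in (4, 6, 8, 12):
        print(f"{n:2d} disks: RAID5 {second_failure_fatal('raid5', n)}, "
              f"RAID10 {second_failure_fatal('raid10', n)}")
```

With six disks, for example, the RAID10 figure is 1 in 5, against certainty for RAID5.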

    Of course, you've pretty much decided that RAID10 is the way to go. At that point it's cost. If you're looking for 50GB of fast redundant storage, SCSI is going to be slightly cheaper. If you need any amount of storage though, SATA is going to be a whole lot cheaper for the same level of reliability (which requires more spindles), and typically better speed (more spindles means more seeks per second and more megs per second, though one needs to be mindful that big SATA disks are only 7200RPM, while the slowest SCSI disks you're going to get are 10kRPM).

    Summary? I'm value-conscious. I'd go the SATA route. RAID10, four disks minimum to start, a pair of 4-port 3ware SATA cards with 128MB+ of battery-backed cache. I'd do the RAID entirely with software (Linux MD), with each RAID1 set split across two controllers. We get cheap disk redundancy, cheap disk speed, cheap I/Os, and cheap controller redundancy. I'd consider using less fancy controllers, the 3ware jobbies tend to be expensive, but when you're doing big writes the cache makes a massive difference (75MB/s across four disks of RAID10 versus 20MB/s). I've considered putting together a dedicated storage appliance, exporting via SMB/NFS/NBD/GFS/what-have-you, without the battery-backed cache, but with a pair of 1U UPS units (one for each power supply). Then I'd go around turning off all the application-level fsync()ing, and see what happens with 4GB of disk cache. Bet it'd be fast. And with shutdown initiated via UPS trigger, almost as safe as a battery-backed cache. Remember: "Redundant Array of INEXPENSIVE Disks."

    God I ramble.

    --

    Barclay family motto:
    Aut agere aut mori.
    (Either action or death.)
  11. Re:BACKUP! by eric76 · · Score: 2, Insightful

    You are absolutely correct. In addition, with a tape, you can much more easily take copies off-site for storage. I frequently suggest that people get a safety deposit box in a bank at least 20 miles away from their facilities and store a copy of their backups there.

  12. Re:BACKUP! by Spazmania · · Score: 4, Insightful

    But if you may need the data 5 years or more from now, tape is clearly far superior.

    You have much luck getting data back from a tape five years later?

    First you have to find the tape. You can't have misplaced it and you can't have reused it due to the damn high cost of magnetic tape.

    Then you have to find a drive that can read the tape. The one you wrote it with died two years ago, it's no longer manufactured and oh darn none of the three you picked up off ebay use the same compression format.

    Next you need the old backup software. You've been using Acme Archiver for the past three years; It doesn't understand the old SuperBackup format and unfortunately SuperBackup only ran in DOS with an 8-bit ISA SCSI card.

    Finally you have to pray that the tape is still good. They're like floppy disks; they go bad just sitting on the shelf.

    Buddy, I've been there. It ain't pretty. So for the last 7 years I've stored my backups on hard disks. No pain! No pain!

    --
    Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.
  13. use SCSI... by Malor · · Score: 5, Insightful

    50 concurrent users is a LOT. You may not really mean concurrent, as in "50 people actually reading from or writing to this drive at the same time". If you DO mean that, you desperately need SCSI, the fastest you can find. You'll need seek time more than anything else; the drives need to respond as fast as possible to multiplexed requests for data. Rotation speed, which improves rotational latency and transfer rate, is good too, but it's seek time that's most crucial in heavy multitasking environments. If by 'concurrent' you mean '50 people occasionally hitting the disk', then yeah, you could probably do SATA.

    However, you already have SCSI. Management is used to paying for SCSI machines. If you have 50-100 people depending on something, and it's slow, that's a productivity drag. If you assume that all those people cost $100k/year each (not at all unreasonable with benefits), 50 people are getting paid about 2,500 bucks an hour, or about 20,000 dollars a day. In other words, if you speed them up by just 5% with better hardware, you're saving the company a thousand dollars a day. Even if it's a tiny 1% speed gain, that's still 200 bucks a day. Saving six grand a month for an upfront investment of ten grand is a total no brainer.
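The arithmetic above can be reproduced as a quick sketch. The 2,000 working hours per year and 8 hours per day are assumptions chosen to match the comment's round numbers; the function name is made up for illustration.

```python
def daily_saving(people: int, annual_cost: float, speedup_percent: float,
                 hours_per_year: float = 2000.0,
                 hours_per_day: float = 8.0) -> float:
    """Payroll value of a given percentage speedup, per working day.

    hours_per_year and hours_per_day are assumed values, not figures
    taken from the comment.
    """
    hourly_payroll = people * annual_cost / hours_per_year
    daily_payroll = hourly_payroll * hours_per_day
    return daily_payroll * speedup_percent / 100.0


if __name__ == "__main__":
    for percent in (5, 1):
        saving = daily_saving(50, 100_000, percent)
        print(f"{percent}% faster: about ${saving:,.0f} per day")
```

The "six grand a month" figure corresponds to counting 30 days at the 1% rate; counting only working days would give a smaller number, though the conclusion is the same.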

    Buy SCSI.

  14. Take whatever's cheapest. Buy two. by defile · · Score: 2, Insightful

    If my limited experience has taught me anything about computer reliability, it's that a single mis-set bit somewhere can bring down a system. Maybe the bit got there by user error, maybe it got there because of RAM or disk failure, maybe it got there from a bug in the application, OS, or firmware. Maybe a component on the motherboard shorted out. Maybe it's the climate. Maybe it's the phase of the moon.

    I've seen it happen with discount ghetto hardware, I've seen it happen with high end hardware. I've seen it happen on Windows. On Linux. On FreeBSD. On Solaris. I've seen servers go down due to catastrophic hardware failure and I've seen them go down because a $2 fan died. I've seen people come inches from major power supply caused injury working on a desktop PC.

    Everything will break.

    There's just too much freaking complexity. Now I just buy whatever's cheapest so I can buy way more than I need. Mix up the configurations a bit so you get some bio-diversity; if one drive manufacturer has a bad year, you don't want all of your eggs invested in them.

    Most important of all, at the first sign of trouble, throw it away.

    Try to resist the urge to fix it. I mean it. You cost more than that piece of junk. Put in a purchase request and move on.

  15. Re:BACKUP! by eric76 · · Score: 2, Insightful
    You have much luck getting data back from a tape five years later?

    Yes.

    First you have to find the tape. You can't have misplaced it and you can't have reused it due to the damn high cost of magnetic tape.

    That is no problem at all. I keep a detailed listing of what backup set is stored on what backup media.

    As far as the "damn high cost of magnetic tape", you must be talking about those cheap tape drives that use expensive tapes. We have a couple of those around here, but we don't use them much at all.

    Interestingly enough, individual tapes don't really vary all that much in price whether they carry 4 GB or 400 GB. But IMHO the tape drives that use 4 GB tapes are not trustworthy enough to use for backups and I would never suggest using such.

    Then you have to find a drive that can read the tape. The one you wrote it with died two years ago, it's no longer manufactured and oh darn none of the three you picked up off ebay use the same compression format.

    If it is no longer manufactured in two years, then you made a very poor choice of backup system.

    Go to LTO Ultrium. It will still be around in 5 years.

    Next you need the old backup software. You've been using Acme Archiver for the past three years; It doesn't understand the old SuperBackup format and unfortunately SuperBackup only ran in DOS with an 8-bit ISA SCSI card.

    Why would anyone use something like that for a backup? The first criterion should be that whatever you write to the tape should be readable using any device that can read and write to that tape.

    Finally you have to pray that the tape is still good. They're like floppy disks; they go bad just sitting on the shelf.

    I've had very little problem with bad tapes. But just in case, having just one copy of the file on a backup tape is an amateur error.

    Buddy, I've been there. It ain't pretty. So for the last 7 years I've stored my backups on hard disks. No pain! No pain!

    I've used tapes, disks, CDs, DVDs, and even in a few cases printouts of really important data. And tapes are still my favorite.

  16. Do RAID 5 ! by this+great+guy · · Score: 5, Insightful
    Whatever I decide, the server will be setup with a RAID 1+0 array for the numerous benefits it offers.

    No, choose RAID 5 instead of RAID 1+0. Here is why:

    • RAID 5 offers more usable disk space. With N disks of X GB, RAID 5 gives you (N-1)*X GB while RAID 1+0 only gives you (N/2)*X GB.
    • The maximum theoretical I/O throughput is better with RAID 5 than with RAID 1+0. With N=4 it is 1.5 times better, and as N grows large (>= 8) it approaches 2 times better.
    • RAID 5 is more customizable than RAID 1+0, giving you more control over the usable space / total space ratio. For example with N=10 you can choose to create 1, 2 or 3 RAID 5 arrays while with RAID 1+0 you only have 1 choice (1 large array, creating multiple smaller arrays is equivalent to a large one).
    • Linux's RAID 5 implementation rocks and consumes MUCH less CPU than what people think especially with today's 2+ GHz processors. Kernel hackers have found their implementation to be WAY MUCH FASTER than most expensive RAID 5 hardware cards.
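The capacity formulas in the first bullet, and the 1.5x and 2x figures in the second, can be checked with a short sketch. It assumes that the throughput claim simply tracks the number of data-bearing disks, which is one reading of the comment rather than something it states; the function names are made up for illustration.

```python
from fractions import Fraction


def usable_gb(level: str, disks: int, disk_gb: int) -> int:
    """Usable capacity for N identical disks of X GB each."""
    if level == "raid5":
        return (disks - 1) * disk_gb   # one disk's worth goes to parity
    if level == "raid10":
        return (disks // 2) * disk_gb  # every disk is mirrored once
    raise ValueError(f"unknown RAID level: {level}")


def raid5_over_raid10(disks: int) -> Fraction:
    """Ratio of data-bearing disks, RAID 5 versus RAID 1+0 (even N)."""
    return Fraction(disks - 1, disks // 2)


if __name__ == "__main__":
    for n in (4, 8, 16, 32):
        print(f"N={n:2d}: RAID5 {usable_gb('raid5', n, 300)} GB, "
              f"RAID10 {usable_gb('raid10', n, 300)} GB, "
              f"ratio {float(raid5_over_raid10(n)):.2f}")
```

The ratio is 2(N-1)/N, which is 1.5 at N=4 and approaches 2 as N grows. The 300 GB disk size is only an example value.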

    To give you a datapoint, I have set up multiple Linux software RAID 5 arrays on various servers with 10+ SATA disks, and the I/O throughput is over 500 MB/s (enough to saturate 2 full-duplex GigE links !). At my previous work we had about 200 servers, all using Linux software RAID 5. And we have been MUCH HAPPIER than with the previous setup where all of them were using hardware RAID 5. Moreover, Linux's software RAID 5 is more flexible (create arrays on ANY disk on ANY SCSI/SATA card in the system), more consistent (one and only one control software to learn: mdadm(8), no need to use crappy vendor tools or reboot into vendor BIOSes), cheaper (no hardware to buy), more reliable (no hardware card = 1 less hw component that can fail), easier to troubleshoot (plug the disks on ANY linux server and it works, no reliance on any particular hw card) and more scalable (spread the load across multiple disk controllers, multiple PCI-X/PCIe busses, or even multiple SAN devices).

    It's amazing the amount of misinformation and misconceptions about RAID that is spread around the world. I hate to say it but 95% of IT engineers don't make good choices regarding RAID servers because of all those misconceptions.