Pros & Cons of Different RAID Solutions
sp1n continues: "We are currently considering 3 options:
(1) SCSI - EIDE controller with six 9G/7200 ATA drives (hadn't heard of this one until recently). This supposedly accesses the drives directly through DMA and bypasses all IDE, just using them as physical media. All are accessed in parallel. I'm a bit wary of the reliability of IDE drives under constant use.
(2) SCSI - SCSI controller with six 9G/7200 u2w drives. The controller currently at the top of my list is the Mylex DAC960SXi w/ 32MB cache. However, something that fits in a half-height bay, instead of hogging a full-height would be nice.
(3) SCSI - SCSI controller as above, running with 2 disk channels and 2 separate RAID 5 arrays for each mountpoint (spool/mail storage).
I'm looking for any experience with IDE/DMA raid setups (1), as well as the pros/cons of making 2 partitions, both which are very active, on one array of 6 drives (2), as well as 2 separate level 5 arrays of 3 for each mountpoint (3). In addition, any suggestions for external controllers and rackmount enclosures would be greatly appreciated. I would like the controller to have an i960 or better processor.
--
"The glass is not half full, nor half empty. The glass is just too big."
AFAIK, the only difference between a SCSI hard drive and an IDE hard drive is one little controller chip on the drive. So the reliability of the IDE drives, mechanically, should be identical to that of the SCSI drives.
Jeremy
Looking for a Python IRC bot?
I had a similar problem. I went with the Sun StorEdge A1000. It's just a great piece of hardware. I got 12 18GB drives, 10,000 RPM Seagate Cheetahs. It's in 2 RAID 5 clusters, with one hot spare. I needed to get a differential SCSI adapter, as they don't come standard on Ultra2s. Wow, is it fast. I can move GB in what seems like seconds. It's a night and day improvement over a JBOD box. A bit pricey: about 17K after our 50% edu discount. It's all SCSI-SCSI, with hot-swap disks and hot-swap power supplies. When you're running Solaris nothing beats Sun hardware; it just works.
You might also consider just adding multiple scsi controllers and have as many drives as possible.
With each additional drive, you can access another unique piece of data simultaneously. While RAID is nice and helps solve reliability and performance problems, it isn't the only solution.
It is a technique that newsgroup server admins used to use, and probably still do.
Before you go out and purchase an expensive RAID solution (of any kind), make sure this is really the problem. The vmstat command will make it quickly apparent what kind of I/O is happening, and further analysis might tell you more about what kind of HD accesses are happening.
In many cases, adding more memory or CPU can make a bigger difference than more/faster hard drives, if the problem is that the cache is too small, or paging activity too much. Also check your CPU load and make sure it is nowhere near 100% - if so, time to get a 2nd CPU.
Also, avoid software RAID implementations like the plague. They will slow down your system and provide questionable reliability. You should also try to find cards that have redundant SCSI controllers onboard, and support redundant cabling. This way if the cable, plug, or SCSI bus fails for some reason you will not be SOL.
Finally, be sure that the majority of your disk accesses are reads. RAID will slow down writes, sometimes drastically so. If the majority of your disk accesses are writes, then tuning your kernel to flush dirty buffers less often may make a good difference.
You may want to look at a Dell PowerVault as a possible solution. Check out Dell's website for details. They are VERY reliable and VERY fast, not to mention Dell has the best support in the industry.
The Network Appliance Filers are really sexy.
The beautiful thing is they use the WAFL filesystem so you can expand your array when you need to without adding big sets of drives.
Granted, I don't have one but I've submitted the proposals and am waiting on financing. The F720 scales to 464GB, is network attached, has journaling (rad), and can benefit your WHOLE network.
Of course, you have to use NFS or SMB though. I've heard they start as low as $17k but usually $30-40k with a bunch of drives but it's difficult to find general prices without hearing the sales pitch.
This paper discusses the testing that the Stanford Linear Accelerator Center performed while evaluating the NetApp filers. It's geared toward Usenet news, but if it can handle that, it can surely handle your mail situation.
Does anyone here have first hand experience good or bad with NetApp Filers? And some word on the pricing?
It doesn't sound like you need a lot of space if you're currently doing well with 9GB and 7GB. Get a pair of 18GB drives for the spools and a pair of 18GB drives for storage, and you should be set.
RAID 0+1 is a lot faster than RAID 5. Its disadvantage is that it's more expensive, because you have to buy 100% more disk than storage, as opposed to 20-33% more for RAID 5.
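To see where figures like 100% and 20-33% come from, here is a rough sketch of the arithmetic. It assumes equal-sized disks, a single parity disk's worth of overhead for RAID 5, and no hot spares; the function names are made up for illustration.

```python
def usable_fraction(level: str, disks: int) -> float:
    """Fraction of raw capacity left for data (equal-sized disks, no spares)."""
    if level == "0+1":
        if disks < 4 or disks % 2:
            raise ValueError("RAID 0+1 needs an even number of disks, at least 4")
        return 0.5  # every block is stored twice
    if level == "5":
        if disks < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        return (disks - 1) / disks  # one disk's worth of capacity goes to parity
    raise ValueError(f"unsupported level: {level}")


def extra_disk_percent(level: str, disks: int) -> float:
    """Extra raw disk you have to buy, as a percentage of usable storage."""
    fraction = usable_fraction(level, disks)
    return (1 - fraction) / fraction * 100


if __name__ == "__main__":
    for level, disks in [("0+1", 4), ("5", 4), ("5", 6)]:
        print(f"RAID {level}, {disks} disks: "
              f"{extra_disk_percent(level, disks):.0f}% extra disk")
```

With these assumptions, a 4-disk RAID 5 costs about a third more disk than storage and a 6-disk RAID 5 about a fifth more, which is the 20-33% range quoted above. A 3-disk RAID 5 would come out at 50%.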
As far as which controller to use... Sun now rebrands DPT controllers, but they're pci and you're stuck on sbus, so I don't know.
Good luck
First off, it's not clear from your post how heavily loaded the drives really are.
In particular: load is a measure of how many processes are using or waiting for a resource (such as disk I/O, CPU or network I/O). On a busy mail server that's completely adequate for the job, I'd expect to often see a high load average due to the number of processes that are waiting on the network. That is, due to the number of processes waiting for slow network connections to places halfway around the world.
All you mention is the load averages and a fairly non-specific measure of drives that are "cranking away constantly". If the drives were being used at a current constant 10% of available I/O, they'd tend to "crank constantly" even if they could be hit much harder. (still, given that losing email is considered bad by customers, a RAID 5 solution seems like a good idea anyways and leaves you room to grow and handle sudden increases in email from the holidays or spammers or gradual expansion of business)
As to IDE vs. SCSI -- never go with straight IDE on a server. SCSI has the ability to lie to the OS and silently move data from sectors that have gone bad into sectors reserved for that purpose. Sure, it slows down access to that particular block of data, but it's a lot easier than the OS having to deal with failures directly. However, I'm completely unfamiliar with the strange SCSI - EIDE setup that you're describing -- if it treats them as just physical media and provides the SCSI interface itself, it may be able to do that particular SCSI trick, as well. Physically, SCSI drives and EIDE drives are identical -- as in, you can find the *exact* same drive from certain manufacturers, only one has SCSI and the other EIDE. Reliability of the physical media is the same, IOW. In a normal configuration, *apparent* physical reliability is higher for SCSI due to wonderfully useful trickery.
I don't recall the exact model numbers, but I've seen pretty good results with Mylex RAID controllers before. (more along the lines of database stuff than what you're talking about -- somewhat different needs, but not all *that* different, I suppose.)
I can't see putting two partitions on one RAID device as making a lot of sense -- since things are striped you'd end up running into contention issues.
IOW: I'd guess that option #3 would be the fastest -- it's also probably the most expensive.
If I were you, I'd check more carefully to determine how much of the currently available disk I/O is actually being used... If the budget allows it, the dual-channel RAID solution sounds pretty good. You might want to go with two single-channel RAID cards instead -- makes it easier to stock a backup card in case a card decides to die. Try and get something with hot-swappable drives, too. It makes the RAID stuff so much more useful.
Also, I don't know the details of your setup (of course), but seriously consider breaking the mail serving task into separate pieces and run it on separate machines.
You have:
1) incoming email
2) outgoing email
3) email from customers
4) email customers pick up (POP)
It sounds like you have one machine handling all of these. Consider breaking these tasks onto separate boxes. (If you've made the mistake of telling customers the same thing for #3 and #4, i.e. mail.isp.net instead of mail.isp.net and pop.isp.net, it might be impossible to split those two tasks away from each other.)
You can have a setup such as:
outgoing1 through outgoingN all behind the single name of "outgoing" that internal machines are told to send email to that they don't know how to deal with
mail1 through mailN all behind "mail" that customers are told to have as their outgoing mail server. In particular, it should blindly send off email it doesn't know how to deal with to outgoing.
pop (harder to break into separate machines, but possible)
incoming1 through incomingN with MX records pointing at them for your domain.
Now, breaking into that many machines is probably silly. Moving outgoing to one machine and everything else to a second machine (and possibly mailing lists off to a third machine) may make a *lot* of sense though. Don't get tied into the idea of a monolithic machine to accomplish everything related to a particular task -- eventually it's much more expensive than many cheaper boxes to handle the same task.
We've just spent 2 weeks at my office researching the different solutions available to us for implementing the most reliable and scalable solution available today. Our needs differ a bit from yours as we're looking to put many machines on a network for load-distribution yet they all need to speak to the same data on a single repository. This holy grail is known as a SAN, or Storage Area Network.
Our solution is going to be a single cabinet RAID (level 5 for accessing smaller files) with a "hot spare" that will rebuild a crashed disk on the fly. This being a standard cabinet we'll have 8 disks, of which the capacity of 6 will be data (one parity (term used loosely as parity is striped on RAID-5), and one spare).
The disks are Seagate's 10,000 RPM Cheetahs, the most commonly recommended units among all the vendors we've talked to, and the controller is a multi-channel u2w with fibre interface to a Q-Logic PCI adapter.
The total system is going to run just over $15,000. This sounds like a lot, but pricing lower end systems isn't too much cheaper and you'll never get 24-hour turnaround on failed parts (if they're even available). This seems like overkill for a single system, but by adding a fibre hub later we can use the single system for many many machines once a file controller (dedicated machine) is put into place.
The beauty of SAN is that it operates much like FTP, with a control and a data connection. The control connection occurs over your existing LAN, and the data is transmitted directly over the fibre channel (max rate of 100 MB/s).
Other NAS (Network Attached Storage) models are somewhat cheaper to implement, but performance can never match the fibre as the "control" and "data" connections (NFS or SMB) both transmit across your network.
I apologize for digressing from the straight RAID topic, but I felt obligated to give the /. community something to chew on in return for all that I've learned here.
-Steve
- A.P.
--
"One World, one Web, one Program" - Microsoft promotional ad
"Remember when the U.S. had a drug problem, and then we declared a War On Drugs, and now you can't buy drugs anymore?"
on the IDE v SCSI be careful. with some drives the difference really is just a chip, but often drive manufacturers will use different actuators and such for SCSI drives (due to the fact that they're more likely to be dropped into a high-stress environment). The MTBF for a drive that's expecting to run grandma's recipe book is not relevant when used as a high-stress server.
I'd suggest a SCSI or Fibre Channel raid array, with some 10,000RPM drives, and lots of cache on the drives and the controller. If you are currently IO-bound, you want to make sure that you remove that bottleneck for at least a couple years. Some sort of external enclosure might be nice if only due to the fact that 10,000RPM hard drives make a LOT of heat, so it keeps things a little less critical. Oh, and of course I'd recommend using RAID-5 for obvious reasons. RAID-0 is faster, but clinically insane.
Another solution is to look at Communigate mailserver from http://www.stalker.com
It allows you to cluster your mail server to multiple servers with very little fuss.
don't listen to that crap, the scsi drives are built for industrial use. You get what you pay for. Go for the scsi setup you will be glad in the end. as for the raid five config, if you dont have the fault tolerance on (parity stripe) and are just doing it to crank every bit of speed you can out of that box well go for it; but what the other guy said about checking the amount of writes your raid is doing, sounds like a well thought out solution. i know people who have professed their love for ide but when they get a taste of scsi, they rarely go back.
I spent a fair amount of time looking at RAID 5 solutions this past summer for a client. Both external and internal, for Linux. Tried several different controller card brands and drive configurations, did a lot of reading, and bugged a lot of vendors.
You really should try to test your options and all of the configuration combinations using something like Bonnie, on a machine with a similar configuration to your target server. Make sure that your Bonnie test file size is at least twice physical RAM, to eliminate the effects of RAM and controller caching on the results.
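For what it's worth, the "twice physical RAM" rule is easy to script. This is only a sketch: the helper names are made up, and the RAM detection relies on the `SC_PAGE_SIZE` and `SC_PHYS_PAGES` sysconf names, which I'm assuming your platform provides (Linux and Solaris do).

```python
import os


def bonnie_file_size_mb(ram_mb: int, factor: int = 2) -> int:
    """Smallest test file size (in MB) that is `factor` times physical RAM."""
    if ram_mb <= 0:
        raise ValueError("ram_mb must be positive")
    return ram_mb * factor


def physical_ram_mb() -> int:
    """Physical RAM in MB, as reported by sysconf (assumed to be available)."""
    page_size = os.sysconf("SC_PAGE_SIZE")
    pages = os.sysconf("SC_PHYS_PAGES")
    return page_size * pages // (1024 * 1024)


if __name__ == "__main__":
    ram = physical_ram_mb()
    print(f"{ram} MB RAM -> use a test file of at least "
          f"{bonnie_file_size_mb(ram)} MB")
```

On a box with 128 MB of RAM this suggests a test file of at least 256 MB. If the controller has a big cache of its own, going larger still doesn't hurt.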
I found that using 6 drives in a RAID 5 config was a LOT faster than 5 drives, most of the time. In fact, 3 drives in an array was faster than 5 in some cases. I think it has to do with the way the controller cards were calculating the distributed parity, and perhaps also due to things the driver was doing. 4 drives usually wasn't much better than 3, either.
Stripe sizes for the array can also make a big difference. 32k vs 128k, etc. Larger stripe sizes are usually better for I/O speed, but you may find for email that having a higher number of random seek transactions per second is better than raw speed.
I did not get a chance to do any hard testing of multiple channel configurations with these cards. I suspect that splitting the I/O onto multiple channels would be a win.
IMHO, you definitely want an i960-based board or system, with the fastest CPU you can find on them. I noticed a significant difference between boards with the 33MHz part vs. the 66MHz part.
FYI for others: for controllers, the AMI MegaRAID (alias Dell's PERC2/SC) just blows chunks. Older non-LVD, non-raid SCSI systems can run rings around it, at least on write speed.
It has been my experience that the write speed on a RAID 5 system is generally only a fraction of the reading speed, like 1/4th to 1/2. For a quick and stupid test, do something like 'time cat /proc/kcore > /tmp/kcore' and do the math for MB/second.
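If you'd rather not do the MB/second math by hand, something like the following does roughly the same quick and stupid test. It is a sketch, not a benchmark: `timed_write` and `mb_per_second` are names I made up, it only measures sequential writes, and a small file will mostly land in cache, so size it well past RAM for a real number.

```python
import os
import tempfile
import time

MIB = 1024 * 1024


def mb_per_second(num_bytes: int, seconds: float) -> float:
    """Convert a byte count and elapsed time into MB/second."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return num_bytes / MIB / seconds


def timed_write(path: str, total_mb: int) -> float:
    """Write total_mb MB of zeros to path, fsync, and return MB/second."""
    chunk = b"\0" * MIB
    start = time.perf_counter()
    with open(path, "wb") as handle:
        for _ in range(total_mb):
            handle.write(chunk)
        handle.flush()
        os.fsync(handle.fileno())  # make sure the data actually left the page cache
    elapsed = time.perf_counter() - start
    return mb_per_second(total_mb * MIB, elapsed)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        rate = timed_write(os.path.join(scratch, "testfile"), 16)
        print(f"sequential write: {rate:.1f} MB/s")
```

Point the path at a directory on the array you want to test rather than the temporary directory used here. Comparing the result against a read of the same file gives a rough feel for the write/read ratio mentioned above.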
Oh, and my current favorite card is the DPT Millennium V controller; I've been using it in several systems in various places for the last 3 or 4 months. Here are some Bonnie results for a system with a DPT with 6x 7200 RPM drives, all on the same channel (internal), Linux kernel 2.2.10, dual P3 500MHz:
our setup has right about 31,000 users constantly checking and sending email and is running RH 6.1 on a dual PII/333 with 128mb ram and 9g UW SCSI. I haven't seen a load higher than 0.75 since that machine has been the mail server... maybe something about how your mail server is setup is creating a tremendous bog on it.
Perl - $Just @when->$you ${thought} s/yn/tax/ &couldn\'t %get $worse;
I would recommend a Sun MultiPack with Solstice DiskSuite for management.
Load average is defined as the number of processes sitting on the run queue. This need not indicate a disk IO bottleneck.
;)
I would be surprised if any exim system was having more of a bottleneck to disk than it was to network. Your disks are faster than your network and exim is pretty light on un-required disk access.
The larger the network bottleneck (by network I mean end-to-end with your customer, not just your links), the longer processes are going to hang around.
More processes, more paging, less cacheing. Less cacheing, more IO. More paging, more IO.
Probably teaching granny to suck eggs, but you do have your swap space on a separate device, don't you?
The more exim processes that hang around longer, the more processes for the CPU to switch around. The more switching, the more likely you are to see paging.
If the processes hang around longer, they take up more memory which reduces the cache-size available.
Exim has several files which it accesses frequently, mainly the retry databases and its configuration. These should permanently be in memory.
Bottom Line:
I do however suggest that you don't consider moving a single server to RAID. If you have a server that you want to move to RAID for efficiency purposes... your design is wrong and you should be building a scalable system.
Red
Personally speaking for a load of this magnitude SCSI is the only solution.
...
Don't even think of software RAID.
For some background on SCSI itself try http://www.scsifaq.org
There are many types of RAID. Levels 0-5 are the "standard", but there are several new ones, e.g. level 10, which attempts to address throughput issues. Your actual space requirements don't seem outrageous, so level 5 would be reasonably cost effective.
Another thing you will probably want is hot swapping. Once you've had a box tell you a drive is dead, you've removed it and popped a new one in without taking the box down, you will not want anything else.
On the IDE vs SCSI debate, whilst IDE is fast it seems to me that under continuous load SCSI gives better throughput.
As others have pointed out - a 'designed' server, rather than a "roll your own" box would make sense. Compaq Proliants make excellent Linux machines. The SMART arrays are very good and support RAID to level 5. You can fit a lot of disks in the drive cages as well. They are a little pricey but of a good quality and reliability. We have rather a lot of them running NetWare. I get to use the older kit to run my funny Open Source stuff
A suggestion might be:
Proliant 1600, 2 x 600MHz processors, SMART 3200 with 64MB cache, 5 drive slots - 81GB available after RAID 5 on 18GB 1" drives (that's Ultra-2 SCSI), supports up to 1GB RAM (has 128 by default). There is also an on-board SCSI interface for CDROM etc. This comes in at about GBP 9,000
I'm not familiar with Exim, but aren't there more efficient solutions?
Although my experiences have been with much smaller configurations, qmail reportedly handles loads of this magnitude on lesser hardware.
I used to run a large mail server at a fairly big ISP who will remain nameless, and I'd like to suggest you consider a RAID-10 solution. We were experiencing disk bottleneck problems, and this really helped. Basically, RAID-10 splits the disk I/O half and half over multiple drives with the standard mirroring/striping. This is a simplified explanation, but that's the basic idea.
First try iostat -D -l (number of disks+2) 5 to get percentage utilisation in 5 second intervals.
This is my favourite tool for disk analysis. Secondly go to http://www.sun.com/sun-on-net/performance read what you feel is important but download the se toolkit.
Run zoom.se to get a professional analysis of your system. Run virtual_adrian.se to get a virtual professional to tune your box.
I recommend you do this BEFORE spending any money. I have an E3000 with 2Gb RAM and 2% processor utilisation because nobody checked the system properly.
If it is your disks I recommend sun kit even though it is expensive and RAID 5. Don't worry about people telling you about it being slower, compared to a thrashing single spindle it is extremely fast and as importantly reliable. Tinker and learn!
if this is a server, don't go with IDE - you are a business looking for *safety* of the data as well as performance, and should be willing to fork over the extra 20 to 100 percent it takes for scsi...
as for controllers, i say mylex, high-end adapter of your choice, i would beef it up to 128 megs of ram in any case...
as for the drives, go 10,000 RPM, the difference in access times will help you out, and i think that is much more important in your case than transfer rate... for an ISP, i would only ever buy IBM or Seagate drives, reputable workhorses that they are...
for great cases and setups, i honestly recommend macgurus.com - they specialize in mac stuff, but a scsi tower is a scsi tower, and they will build it with good components at a reasonable price to whatever specs you need... (no, i dont work for them)...
Basically, I Just want something to Play With (tm).
Just looking for a way to play with raid on a home system. As you put it, if it were to go down, who cares =) I'd rather make mistakes now while I can afford them.
I see you guys like the case on my page =)
-S
Scott Ruttencutter
We Apprentice Developers and Designers
Hi, a couple of years ago we had the same problem, till I discovered that all our mailboxes were in one mail spool directory. This was a huge bottleneck, and after adapting qpopper and configuring sendmail to a split mailspool dir, load came down to 1. (Split mailspool is /mail/a /mail/b /mail/c, and all users which begin with an a will be placed in /mail/a ... etc ...)
check above first before you buy hardware
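The split mailspool idea boils down to a one-line mapping from user name to directory. Here is a minimal sketch of it; the function name, the `/mail` root default and the `other` bucket for names that don't start with a letter or digit are my own assumptions, not what qpopper or sendmail actually do internally.

```python
import posixpath


def spool_path(username: str, root: str = "/mail") -> str:
    """Map a user to a per-letter spool directory, e.g. alice -> /mail/a/alice."""
    if not username:
        raise ValueError("empty username")
    first = username[0].lower()
    # Hypothetical catch-all bucket for names starting with punctuation etc.
    bucket = first if first.isalnum() else "other"
    return posixpath.join(root, bucket, username)


if __name__ == "__main__":
    for user in ["alice", "Bob", "carol", "_daemon"]:
        print(user, "->", spool_path(user))
```

The point is just that no single directory ends up holding every mailbox, so directory lookups stay cheap. If your user names cluster on a few letters, hashing on the first two characters (or a real hash of the name) spreads them more evenly.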
Well, after dealing with many different brands of RAID controllers, I have found that DPT's Millennium series tend to be the best. The card takes care of everything, and they're available in 64-bit flavors with 3 onboard U2 channels, or 2 Fibre channels.
Mylex are good if you're looking for a cheaper solution, or Adaptec for dirt cheap. But, if you're looking for the absolute fastest possible solution, it would be Fibre Channel Quantum Atlas 10k's on a 64-bit DPT Millennium Fibre controller in a RAID 0+1 configuration. With a 10 drive setup (equal to the total capacity of 5 of the drives) you could easily reach 100MB/s. Of course, that's gonna cost you a pretty penny.
Another non-functioning site was "uncertainty.microsoft.com."
The purpose of that site was not known.
Our mail server is currently handling about 1M messages a day. IO became a serious issue. We're still using sendmail, and I'm not going to give it up (we know it, we have custom builds for strange applications, it works). As others have noted, load average doesn't mean much here - I have some machines with a load average at 4 that are actually idle and fine, and others at .2 that need tuning. Ignore it and concentrate on what matters.
Assuming IO matters, I am putting my full faith (and job) on Mylex controllers. I love them. I only have one in production, but am about to deploy 5 more, and we'll come in at about 600G managed by them. They just work. The DAC960SXi I have in production (for 7 months now) has been flawless, delivering wire speed doing RAID 5 without any effort after initial config (which is a bit annoying, to be sure).
My production system using it is doing far too many things - mail, staging server, enterprise backup. This is changing - lack of time and historical accident made it that way. The point is that the Mylex handles it with no grief.
If you're building these, be aware that Mylex external controllers need to be mounted in a box with "internal" style connectors. For good RAID cases, check out http://www.storagepath.com/ - they are what I'm using. They look low rent, but the boxes are nice (if a bit expensive).
Down to specifics. For a mail only machine doing the sort of volume you're talking about, I'd deploy a dual processor box with three SCSI busses (one for spool, two for mbox/system access - system access is pretty cheap in comparison) attached to two hardware RAID setups. Provided volume allows, I'd go RAID 5 for spool (with 18G disks, that's ~65G spool) and hot spares. For mboxes, I'd do 0+1, for as much space as needed. Stripe disks on independent controllers, mirrored to each other. Striped mirrors can grow, as you need them to (RAID 5 can't, easily). You don't want to lose anyone's mail. Hot spares for each.
Assuming 100G of mboxes, that's a total of 17 18G disks. Add three Mylex DAC960SXis and (initially) 3 rack mount cases, and that's something around ~24K.
Availability beyond disk is a different question, that gets platform specific. I do mainly Solaris now, so I can't talk much about Linux for this. Mylex controllers can do dual active/dual host configurations, but things get more complex, and a summary here doesn't make sense.
Other options like A1000s (Sun specific) and Netapps require different approaches - they're very different beasts. We have all of the above, and treat them very differently. We'll buy them all again - they're all decent - but are good at different things.
If you can, buy raw Mylex controllers through a reseller like TechData or similar - you'll save a lot.
Hope this helps some.
-j
I forget what 8 was for.
The first thing about a hardware raid controller is that it hides failures from the operating system. With software RAID you have to manually carry out all sorts of tasks, and I'm sure we've all heard of the engineer who mirrored the new blank disk on top of the one remaining data disk of a mirror.
Units such as the Sun A1000 and Baydel connect via SCSI and you just watch for an orange light; even the part-time cleaner could pull out the correct disk and replace it and have the system back and running without the OS noticing. Storageworks and Clariion (EMC) do the same but over Fiber Channel. SCSI units tend to top out at 40MB/s; Fiber Channel theoretically tops out at 200MB/s (they have two 100MB/s loops), but since I only had a max of 30x18GB disks to play with, the disks were the bottleneck. Monster multi-SCSI machines like EMC/IBM's can achieve whatever bandwidth you want by multiplexing SCSI connections.
We've evaluated software RAID, Hardware RAID over SCSI, Hardware RAID over Fiber channel from EMC, IBM, SUN, Compaq(storageworks) and in our opinion a good smart raid controller with two data channels and load balancing software is impossible to beat.
For speed, stripe (0) mirrors (1) together in RAID 0+1. This allows reads at double speed, because each mirrored disk can handle a request separately, and slightly sped-up writes, because you can write to the RAID controller's NV cache and carry on doing your work whilst that takes care of putting the data to media.
This of course has only a 50% data efficiency.
Using RAID 3 or 5 you lose one disk in a rank for parity; RAID 6 (used by Network Appliance) uses two disks for parity but has wider ranks of disks. This often means that sequential reads are fast, because a request for data wakes up all the disks in the rank, but therefore the whole rank can only handle one request at a time. Writes are slower because you have to read a stripe of data, calculate parity and write the whole stripe back again.
RAID5 is really good for data which doesn't have to be the absolute fastest.
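For anyone wondering what "calculate parity" actually means: on RAID 3/5 the parity block is just the XOR of the data blocks in the stripe, which is why any single lost block can be rebuilt from the rest. The sketch below shows that, plus the common read-modify-write shortcut for small writes. The shortcut is my addition (the description above reads the whole stripe), and real controllers do all of this in firmware, so treat it as an illustration only.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    if len({len(block) for block in blocks}) != 1:
        raise ValueError("all blocks must be the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def update_parity(old_parity, old_data, new_data):
    """Small-write shortcut: new parity from old parity, old data and new data."""
    return xor_blocks([old_parity, old_data, new_data])


# A stripe of three data blocks plus one parity block.
stripe = [b"mail", b"spoo", b"l123"]
parity = xor_blocks(stripe)

# Lose the second disk: rebuild its block from the survivors and the parity.
rebuilt = xor_blocks([stripe[0], stripe[2], parity])
assert rebuilt == stripe[1]

# Overwrite the first block: either route gives the same new parity.
new_block = b"MAIL"
assert update_parity(parity, stripe[0], new_block) == xor_blocks(
    [new_block, stripe[1], stripe[2]]
)
```

Either way a small write costs extra reads and an extra write compared with a plain disk, which is where the RAID 5 write penalty comes from.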
Whilst we were doing performance tests, we measured a linear increase in speed up to 20 disks (in transactions/second), and there is a definite art in making sure that you spread the load over all the disks available so that a single disk doesn't get thrashed to death.
In conclusion? well, that depends on your OS.
For me, for a PC-based system I would choose a hardware RAID system with SCSI connection which let me choose the LUN sizes. 5 disks in a RAID5 configuration will only waste 1 disk in capacity. If you're finding your mail spool is being thrashed then I would build a 10 disk 0+1 raid and stripe the mail area across them, using the rest of the area for home areas or web areas or something else which has large storage requirements but doesn't get hit hard.
Oops, this assumes that this REALLY is your problem, a lot of disk problems go away by adding more memory to the machine... I assume you have measured this by tracking the outstanding I/O queue.
-- Don't believe everything you read, hear or think
The original posting doesn't say if the server is running pop/imap, and thus if it is used as the final delivery point for those 10,000 users.
If it is, then the hashing of the mailbox path that lucky luck mentioned is worth investigating. Also worth investigating is alternative mailbox formats. If you're using mbox format, then I'm not surprised there's a problem if you have a large number of users (and/or reasonably large mailboxes).
There has been some discussion about these issues on the exim-users mailing list. I read it via egroups.
As for me, I'm considering their lower end SCSI boxes connected to high-end Intel server running Linux, beings I have $52,000 to spend this year! (yippee). The idea is to put all the money where the valuables are (the data) and use commodity hardware and open source software to drive it. The OS would boot from internal HD and all data and local customizations (ie, /usr/local) would be on external RAID box. If a CPU box fails, unplug it from the array, plug in a spare CPU box, reboot. Minimal downtime due to hardware problems. I can then repair or replace the busted CPU box at ease.
For Linux jockeys, there are efforts to bring fibre-channel drivers to Linux. Be sure to look at the work at Worcester Polytech for info.
When I worked at Demon, the netapps were one of the most reliable pieces of machinery that I administered. Whilst you might think that network attached storage can be a performance problem, in practice it worked very well indeed.
:-)
You do, however, need to be aware of how to make your application play well over NFS. Exim is actually reasonable at this. Qmail is good at storing mailboxes on NFS thanks to its Maildir technology, but the mail queue *needs* to be on a local disk... I'm not sure about postfix or sendmail (bletch).
Unfortunately, I can't remember the command to make the individual LEDs on the disks blink, which is one of the best remote diagnostic features ever.
-Dom
I accept that you will need to test to make sure that the disks are not the problem but you will need to do it the right way.
Firstly, vmstat tells you very little about disk I/O. What it is good for is the processes. Look at the output from vmstat 5 for example. The first three columns are r b w: running, blocked and waiting. If there are blocked processes, look at WHY processes are blocked. Use top to get the I/O wait information. If there is a lot of I/O wait then look at the disks. Use iostat -D to get percentage utilisation of the disks. If there is a lot of disk wait then you may need to either add more disks or spread the load.
It is interesting to note the relative speeds of devices:
If cpu takes 3 seconds to do a job then,
Level 1 cache takes 10 seconds
Level 2 cache takes 1 minute
Memory takes 10 minutes
Disk takes 7.7 months
Network takes 6.5 years
Get stuff off your disks better! Monitor your cache hit rate to get information on efficiency. Use vmstat or sar or stuff from the se toolkit. Get the se toolkit from http://www.sun.com/sun-on-net/performance. Run zoom.se to monitor your system. Run virtual_adrian.se to tune your system. Use the right tools and don't just add more memory: identify the bottleneck, fix the bottleneck, re-test and repeat until the performance is satisfactory.
A few things that may help:
1) Our POP mail server (~1000 users) running on an old SUN Solaris machine (LX) was having problems because of the number of NIS lookups that were going on. System CPU was up near 75% constantly, I/O waits near 0, and load was also very high. Solution: make the mail server a NIS slave as opposed to a NIS client. Reduced load by 20% immediately. Same goes for DNS lookups.
2) Make sure you're not writing/reading to/from NFS mounted fs.
3) Install the recommended Solaris patches - these can make a big difference. Try installing Virtual Adrian, and see what it recommends.
5) Don't buy EIDE for all the reasons mentioned previously. For lots of simultaneous hits, SCSI outperforms EIDE every time.
6) Consider fibre channel disk arrays from SUN - expensive but they are nice, especially the new A5200. It gives 22 spindles as opposed to the 14 in the A5100.
7) Ignore the guys talking about s/w RAID solutions being a BIG slowdown. Sure h/w RAID 5 is much faster than the s/w equivalent but when it comes to RAID 0+1 then there ain't a lot of difference. Not only that BUT s/w RAID systems tend to be much easier to configure and maintain w/o a doubt - check out Veritas Volume Manager (love it!) or even the free DiskSuite (with Sun Solaris server version) is better than any h/w RAID configuration I've seen.
8) I would bet my next salary that adding a RAID system to your mail server will increase performance by less than 15%.
Oh, and I've been managing enterprise level Sun systems now for 8 years, so I'm not just a Linux geek who has read too much ;)
Hope this helps.
This is my HD:
Filesystem Size Used Avail Use% Mounted on
/dev/hda1 486M 358M 102M 78% /
/dev/hda2 3.8G 2.7G 909M 75% /usr
/dev/hda3 964M 501M 413M 55% /home
/dev/hda5 99M 20k 94M 0% /tmp
and that's AFTER cleaning out... before I had / at 100%, /usr at 100%, and /home at 100%. I have a 4.3 gig HD laying around which I had FreeBSD on for awhile (been thinking about putting BeOS on it) but I may use this idea and go for it.
If you think you know what the hell is really going on you're probably full of shit.
jdube is who I am.
I would not use RAID for the problem you're describing. You're most probably better off splitting the box into several others.
For example, try using a fallback mailhost for outgoing mail (fallback_mx in Sendmail). That way messages that cannot be delivered within a couple of seconds are relayed to the fallback server, keeping your outqueue clean and tidy.
For incoming mail, use a different server, or if you can, use several. You could just put them all in the MX list of your domain, with the same priority. This does wonders.
It might be smart to look at the mailbox format. Some mailbox formats (MBX) have much better performance than others. And you could put POP3 and IMAP on a third server.
All this is much preferable to simply installing a RAID array, IMO, based on the information you presented.
Used to work for Data General, the parent company. Fantastic hardware. They've just been bought by EMC though.
I would definitely try to tune the system before throwing hardware at it though. Find out exactly where the bottleneck is.
I ran into the same problem not long ago. Our local ISP needed a backup solution. The old tape drives were not doing their job anymore, so we built our own RAID cabinet. We bought an 8-disk RAID enclosure with dual redundant power supplies from Siliconrax. The controller is a Mylex external RAID controller. The card is nice; it allows expandability down the line. The card comes in a full-height enclosure (keep it in mind, it's big). We used 18.6 gig Seagate drives in the system. Each drive was mounted in a CRU Data Port removable enclosure for hot swap. The RAID controller has an LCD front panel, making setup a snap. The array was configured with RAID 5. RAID 5 is redundant and provides fast read access, but write access is slower. All in all, the array is about 100 gig online. The cabinet is connected to an SGI O2. The only thing to watch is the cable length!! We've been doing nightly backups over NFS since the array was turned up. The system is nice. Go SCSI, and do the research on the proper controller. If the money is there, go fiber.
Who would believe in penguins,unless he had seen them? Conor O Brien - Across Three Oceans
having read through most of the thread, my $0.02 is:
definitely install virtual adrian to get a better idea of system tuning you can do and where your real problems might lie. have you tuned all the system parameters possible ? ncsize ? turned off all non essential daemons/apps on the machine ?
mylex controllers seem reliable but were definitely a pain to configure - we're using them on a dec fileserver solution. one downside that appeared was they took 6-8 hours to initialize the array - compared to 1.5 hours for a non mylex controller :-/
we're now switching from DEC+Mylex to Sun+Infortrend who make a very nice scsi-scsi controller. www.infortrend.com - we're using the 3201U2G - 4 Ultra2Wide scsi buses.
don't go to raid unless you know what you're getting yourself into - it's far more complex and expensive in the long term apart from your initial investment in the hardware. you'll have larger spares provisioning, your documentation (you do have some right ;-) will be more complex and your backup system might need some work too.
my rule of thumb at present is JBOD to 50G, RAID as a NAS for 50G-500G and SAN (RAID/fibre) for above 500G. you really don't need raid below 50G except for specific performance reasons
it's been an interesting thread to read, since i'm right in the middle of working on a raid5 server implementation.
-jason
First of all you might want to check out other MTAs, as well as other methods for storing the user's mails. If all mailboxes reside in the same directory, you're spending all your time in the kernel doing _linear_ searches thru the mailbox directory. You could spend millions on EMC hardware without seeing _any_ performance increase.
I'd recommend using the Postfix MTA, as it has almost all features of Sendmail, and it's secure, and (hold on) it's even faster than QMail. You could possibly use it with the Cyrus IMAP/POP services. You definitely want to make sure that you don't have all mailboxes in the same directory. Build a hierarchical structure where you never have more than, say, 30-50 subdirectories/files in one directory.
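One way such a hierarchy could be laid out is sketched below in Python. This is only an illustration, not how Postfix or Cyrus actually do it: the function name mailbox_path, the /var/mail root, the two levels and the fanout of 32 are all assumptions picked for the example.

```python
import hashlib
from pathlib import PurePosixPath


def mailbox_path(user, root="/var/mail", levels=2, fanout=32):
    """Map a username to root/<bucket>/<bucket>/<user>.

    Each bucket level has at most `fanout` subdirectories, so the upper
    directories never grow past that size however many users exist.
    The names and defaults here are hypothetical.
    """
    digest = hashlib.sha256(user.encode("utf-8")).digest()
    # One digest byte per level, reduced to a bucket number 00..fanout-1.
    buckets = [f"{digest[i] % fanout:02d}" for i in range(levels)]
    return PurePosixPath(root, *buckets, user)


print(mailbox_path("alice"))
```

With two levels of 32 buckets there are 1024 leaf directories, so around 13,000 mailboxes would average roughly a dozen entries per directory instead of one directory holding all of them.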
Ok, if disks are still your problem, consider:
1) Software RAID is usually a lot faster than hardware RAID. And for the money you save on the HW controller you could buy faster/more disks.
2) An IDE disk is identical to an SCSI one, except of course for the interface and the warranty. The price difference is mainly due to the warranty.
3) UDMA/ATA-{33,66} IDE interfaces are as fast as any SCSI solution if you keep _one_ disk per channel. The main problems with IDE solutions are the short cable length allowed (a problem for 10+ disks) and the number of controllers you must have (one controller for each two disks).
You can spend $50K on a SCSI/HW-RAID solution easily. And you won't know if you'll even get the speed of one single UDMA drive from it (yes, people actually get 15MB/s both from their single UDMA drives and from their expensive DPT RAID solutions). At least consider a software-RAID and possibly IDE solution before rushing out to spend the next 10 years' budget on the shiny HW-RAID solution.
Your setup is fairly small, e.g. you would probably do just fine with a four-disk RAID-5/10 for spool and mailboxes. This is where SW RAID is worth considering. Granted, for 20+ disk systems, HW RAID may well be a better way to go, possibly combined with SW RAID.
My 0.02 Euro.
Many have posted followups here mentioning that RAID 5 may not be your best avenue. To recap, this is because of the performance overhead associated with the calculation of parity data. Unless you have a reliability issue, RAID 5 is probably something to stay away from. An exception might be hardware RAID, but such solutions are expensive and will still involve a slight performance hit.
The multi-controller solution is probably best; someone mentioned the Sun StorEDGE product with the Cheetah drives. This is a great piece of gear, and coupled with some really good storage management software (might I suggest Veritas Software's File System/Volume Manager) you'll get a very flexible solution providing the most bang for the least buck. With the Veritas product you can manage the data on the fly over several drives, and monitor & tweak the configuration on the fly while in a production capacity; additionally, the Veritas product provides a journalled filesystem which will allow rapid restarts in the event of a crash and if you have the drives, can be configured to fail over to available spares.
Yes I am a Veritas Consultant =^) but that does not change the fact that this is an excellent product that would probably go a long way towards addressing your issues (which seem more performance oriented than reliability related) on your existing drives. Check out this link for more info: http://www.veritas.com/library/su/fsconceptwp.pdf
Good Luck!
-Videoranger
Heaven offers little comfort like winamp and a big disk full of Dave Matthews MP3s
I do sys admin for a software company with a mixed Unix-NT environment. We had some terrible experience with Samba on Unix, and NFS on NT. About a year ago, we purchased an F720 with 100GB, for around $50,000. Now we have another F720 with a 300GB fibre-channel RAID. We talked with other NetApp customers, and they were ecstatic about the reliability of these machines. Although I can't say that the filer was 100% reliable, like we heard from smaller sites, we're VERY satisfied with its performance. In the last year, we've only had 2 occasions with significant (> 10 minutes) downtime. As far as speed is concerned: it's usually faster than our local disks... :-)
One of the best things about it is its simplicity. GUI people use the nice Java applet to control it (it gets better with every release of the OS), and us Unix people have a great command line interface.
If you plan to use the NetApp with lots of clients (about 500 in our case) in a mixed environment, the Network Appliance is probably the most reliable and simple to maintain solution. If you want the fastest RAID array to connect to your mail server, it will simply amaze you.
If your budget allows, go for it!
Now from all of my research it seemed like NetApp was the way to go. So I pushed and pushed and pushed, and finally we got an F760. (Nothing like going from nothing to the top of the ladder.) And now it is 2.5 months into being a NetApp user. Both the 1 and 2 month anniversaries were marked with a MB dying. I must say it is fast, real fast, but right now the analogy is fast like a race car going towards a wall. Now ease of use, maintenance, etc. on the UNIX side has been pretty carefree for me. The NetApp has been very easy to use, easy to monitor, and easy to set up. But the NT department, which paid for half of it, is hating life. The NetApp's quota system is straight out of Unix, which is not good for NT, i.e. you are putting quotas on users, groups, or qtrees (think root-level directories which are made in a special way). According to the NT gurus, file ownership by individuals in NT is a bad idea, therefore all files are owned by an administrator equivalent. This means you lose user quotas. NT has a different group philosophy than Unix (multiple groups can have access to a single file) so I am guessing the group quotas are out as well. That leaves qtrees, which are sort of ugly. Right now our NT people are looking at taking the loss on the NetApp and giving it to UNIX (fine by me ;) and replacing it with a conventional NT file server. Another downside for the NT side of things is that the NetApp is configured much like a UNIX box. It uses init and rc files etc. etc. Well, from NT land there is a carriage return/line feed issue. All of those files have Unix-style line endings. I am not sure if they break if you start using DOS style, but I am leery to find out. Which means the Unix side is responsible for all configuration of the NetApp. This is both good and bad. They aren't going to break my stuff, but I have to take on additional labour. Note: The hardware failures were quickly resolved by NetApp, but it still sucked hard.
The NT quota issues are supposed to be resolved in the next major version of the NetAppOS codenamed Guiness or some such. The NT people IMO haven't fully explored the quota possibilities, instead taking the party line that it's too much work. And it is entirely possible that I have not uncovered all of the problems, and solutions for those problems, in the time we have had it.
Why is everyone so obsessed with RAID5? It is not the holy grail of disk storage, as one or two others have tried to point out but been flamed for. RAID5 offers great resilience, BUT is not good if performance is also required. Just because your data is striped across multiple volumes to aid recovery, it still only reads from the one volume, and the need to perform the striping on writing makes the system slower. If performance is an issue, and money is not, then RAID1 (mirroring) is the solution (unless your system will allow RAID0+1; IBM RS6000's, my domain, do not).
Writes shouldn't take significantly longer than reads. I work with Fibre Channel, and the throughput numbers I get for raw reads and writes (no file systems) aren't significantly different. If you have a good raid controller, it should be able to keep the drives busy on both reads and writes as long as the file system is writing data in large enough blocks.
>1) Software RAID is usually a lot faster than hardware RAID. And for the money you save on the HW controller you could buy faster/more disks.
Since when? I've been working on servers with and without RAID for ten years now, and this is the first time I've EVER seen this claim. Was that a typo? Hardware RAID is much faster usually, as well as more reliable. Yes, it can be harder to set up, but in the end it is well worth it. Remember, you get what you pay for. Any time you use software to do a job that hardware can handle, you are devoting CPU cycles to it. Properly designed RAID controllers offset a ton of processing that would otherwise be done by the host CPU. They don't put RISC processors on RAID controllers just for show :-)
As for SCSI controllers, I'll echo what others here have said. Mylex is one of the best. Not the easiest to config, but by far one of the fastest and most reliable controllers out there.
> It's unbelievable how many people are confused over this.
Yes, it is. There are still people who recommend SCSI without further investigation.
> For example, let's say your system is trying to read data and do a write at the same time.
No decent OS would do that. It would concentrate on reads and save the writes for later, unless the write cache is full.
> With IDE your OS has to issue one command to the controller which passes it to the device and then waits...
With IDE maybe. With ATA not. ATA does have everything that SCSI has, and more. Read the specs at www.t13.org.
> With SCSI, the OS tells the controller all the operations it wants to do and the controller looks at it and decides if there is an optimal way of doing the commands.
Of course, only if you have a host adapter / driver which support command queueing, and an application that _does_ do multiple accesses at the same time. Most don't. And a decent OS reorders the commands anyway before they are sent to disk, partly eliminating the need for reordering by the drive.
SAN is an ill-defined acronym that every vendor defines differently. The idea behind selling SAN is that you have a large centralized storage center that offers its disks/volumes to all connected clients w/o the hassle of administrating a disk subsystem on each server.
The problem is that each vendor implements this differently, and has a different definition of what a SAN should be. None have really addressed the complex issues, instead implementing the kind of hack you describe - NFS with a data channel over FCAL. You still have the problems of NFS to contend with (no reliable locking or consistent transactional guarantees in client and server implementations, etc.). Heck, most vendors are selling FCAL HUBS instead of SWITCHES to accomplish this storage sharing because the switches aren't prepared to do TCP/IP over fiber!
Ideally a SAN would be a well fleshed-out spec that allows massive amounts of storage to be conveniently accessed across a network with all of the guarantees of a local disk. That's how it's being sold. However, right now it's looking like little more than a way to get NFS to run faster.
-Peter
== Just my opinion(s)
Solaris' filesystem prior to the logging filesystem in 2.7 is a dog. I'd highly recommend that you benchmark your performance w/ Veritas' vxfs, or w/ solaris 7 before you buy a raid system.
Also, if you do get a RAID, I'd highly recommend a box that does not get controlled in software, i.e. Solstice DiskSuite or Veritas Volume Manager (I love Veritas' VM, but as a RAID controller it lacks intelligence).
A good external box with hot-swappable drives and a sizeable write-back cache (w/ a battery!) is my favorite way to do this stuff.
== Just my opinion(s)
I work for a Systems Integrator - nice word for RESELLER! We are a Sun reseller first and foremost, but we are very strong in the NetApp arena. Since I am a geek trapped in the hell of being a sales(wo)man, please forgive me if I sound salesy at all....
Anyway, NetApps are a great solution for multiprotocol storage. One of the drawbacks is that it is network attached and therefore only as fast as your network... which has been a problem for many of our customers. Another HUGE problem is backup. There is only one product that can do it well - a product called BudTool. BudTool is a little guy that some geeks in my company thought up and brought to market, then along came NetApp who asked us to figure out a way to b/u their filers. Out of that venture NDMP was born. BudTool is the only product that makes use of NDMP correctly. That division of my company was recently sold to Legato Systems, who plan to EOL that product. NetApp is now scrambling to find another solution, since they've been recommending BudTool from Jump Street....
Pricing is also an issue. And you were right in saying that they start at around $17K, but that is WITHOUT storage. A good sized storage solution, let's say 1 TB, is going to run you upwards of $100K. Yikes.
There is also a good resource for people who are thinking of deploying a NetApp solution, which is the toasters users group. You can send an email to toasters@mathworks.com and ask to subscribe to the group. You'll get a lot of good feedback on what works, and what doesn't. You'll also get to see the downside to using it (and BudTool). I think there is info about the group at http://teaparty.mathworks.com but I haven't been able to get there in a few..... Check it out. It's definitely worth the trip.
And if you need any quotes I'd love to help you out!!! Just Joking
"Most of my heros won't appear on no stamps..." Chuck D from Fight the Power
That is simply not true. Reads in RAID 5 occur from all volumes where a stripe resides. A file never exists on a single volume in RAID 5 unless it is smaller than the stripe size.
Free Mac Mini. Yes, I'm
We've been running a SCSI/IDE RAID system here for some months now. They're actually a pretty decent idea - the array presents itself as a Wide-SCSI device, but drives 6 IDE HDDs over 3 IDE busses. There's 128MB of cache in the box, too, so it feels pretty snappy (although I've got no hard figures on its performance).
The real bonus, of course, is that it's dead cheap, compared to equivalent all-SCSI solutions.
I should probably say that we've only got it running on an NT file server at the moment, so I can't vouch for its performance on a big scary mail server, but it's working well for us. Certainly, it seems to deal OK with everything we want from it (RAID-5, plus a hot-spare). It deals just fine with you disconnecting a drive while it's running, and simply gets on with re-building onto the hot spare (hardly a scientific measure of its usefulness I know, but certainly handy for demonstrating to PHBs why they should like it *grin*).
Sorry if I offend but damn it I hate when people get fed misinformation... and I am grumpy_geek
Hardware is MUCH, MUCH faster than software; we've got boxes here with 2 gig of cache in the RAID controller because we can't spin the disks fast enough (of course we've got terabytes of data). Hardware will always be faster than software for the sole fact of caching. You may never need it, you may not do enough I/O to have to wait on disks, but just because you don't use it doesn't mean it's slower.
SCSI disks vs. IDE: you really don't know what you are talking about, do you? How many simultaneous I/O operations can you do on IDE???? With IDE you do each operation in a serial fashion, which means one I/O op holds up the rest. Not a big deal for a workstation, but for any multi-user situation (or, better defined, multiple simultaneous I/O ops) SCSI is required. Warranty? I don't even want to think how you came up with that one, or why you would think it even applies. I'll add my own one here, and this is a biggee... I don't know of any IDE HA solutions, I guess that would be because you can't share drives with IDE. Of course there is hardly any difference between SCSI & IDE.
People can get 15MB/s off of one drive doing large writes (writing a single 5 gig file), but you will NEVER get that performance using one drive on any type of random access information. Did you really think about what you were saying about putting in a controller PER DRIVE for the UDMA... so to get the same performance as 7 SCSI drives on one controller, I have to add 7 additional controllers into the picture... how many open expansion slots do you have in your box today?
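The gap between sequential and random throughput on a single drive can be estimated with some back-of-the-envelope arithmetic. The Python sketch below assumes a 7200 rpm drive with an 8 ms average seek and 4 KB requests; those figures are my own illustrative assumptions, not measurements from anyone's hardware in this thread.

```python
def random_io_estimate(rpm, avg_seek_ms, block_kb):
    """Estimate random IOPS and MB/s for a single drive.

    Each random request is assumed to pay one average seek plus half a
    rotation; the transfer time of a small block is ignored.
    """
    half_rotation_ms = 0.5 * 60_000 / rpm
    service_ms = avg_seek_ms + half_rotation_ms
    iops = 1000 / service_ms
    mb_per_s = iops * block_kb / 1024
    return iops, mb_per_s


# Assumed drive: 7200 rpm, 8 ms average seek, 4 KB random requests.
iops, mb_per_s = random_io_estimate(rpm=7200, avg_seek_ms=8.0, block_kb=4)
print(f"{iops:.0f} IOPS, {mb_per_s:.2f} MB/s")
```

Under those assumptions a lone drive manages well under 1 MB/s of small random I/O, however fast its streaming rate is. That is why more spindles (and the ability to queue requests to them) matter for this workload.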
All MTAs that are halfway reliable are disk bound (*not* network bound) - I believe that Wietse has some information on this in the postfix data.
This is because each message is fully committed to disk as it comes in (for exim this means opening (creating), writing, closing and flushing 2 files; other MTAs differ slightly), and then a reliable local delivery costs about the same.
Hence what you need to optimise is the latency of synchronous operations. So I would strongly recommend some form of RAID with NVRAM cache, which means the commit time is memory speed rather than disk seek related.
There are some other possibilities to consider; for example, do your queues contain many deferred items (due to node unreachability or DNS problems)? If so, you are trawling through a large mail queue to process a small number of items (since some of the items are not due to be retried yet [I'm no exim expert]).
If you're using a lot of disk in re-reading the queue, consider using an MTA which has a separate queue for deferred items, and/or a hashed directory structure. The Postfix mailer (by Wietse Venema) fits the bill here. Postfix is particularly good for large queues.
Postfix also is deliberately written to make filesystem accesses an absolute minimum of times for each item of mail (I think you can have as few as 3 disk accesses per item). This really reduces disk loading, especially on systems with synchronous filesystems.
On the RAID side, consider alternatives to RAID5. RAID0+1, for example, is as safe as RAID5, but faster (though it uses slightly more disk drives).
What is the balance between writes and reads on your mail server?
Are you logging syslog locally on the mail server? If so, consider moving syslog logging to a dedicated log box. If you can't do that, consider using the leading-dash feature in /etc/syslog.conf to make syslogd avoid calling fsync() on the mail log for every single logged message.
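For reference, the leading-dash feature in /etc/syslog.conf looks roughly like the fragment below. This is the syntax used by the Linux sysklogd; not every syslogd supports it, so check the man page on your own system first. The mail.* selector and the log path are assumptions for the example.

```
# /etc/syslog.conf fragment (sysklogd syntax; selector and path are examples)
# The "-" in front of the file name tells syslogd not to sync the file
# after every message.
mail.*          -/var/log/maillog
```

The trade-off is that a crash can lose the last few log lines, which is usually acceptable for a mail log.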
Reliability is the issue when it comes to email and RAID systems. Of course Sun has the edge, so why not stick with Sun software & hardware. The Sun StorEdge A1000 has a caching controller and usually 30-40 gigs per rack; it plugs into your SCSI bus, and you can simply add another dual-channel SCSI card to split the load or add redundancy.
Network Appliances makes an Excellent Solution. NFS Toasters are the way to go in a distributed environment. Say you have customer on a shell account, well you can export the mail directory and mount it VIA NFS and access it from the shell servers without throwing more email load on them locally. NFS Toasters come in a great looking appliance rackmount case, and depending on how much storage you need, is how much rackspace you need.
And of course there is StorageTek, which will run you a pretty penny, but offers Fibre Channel or multiple SCSI channel connections, full redundancy, caching, hotswap and maintenance features.
I'd never stick an IDE solution on a production box. You need something that you can get support and service on, so I'd suggest that you stick with the Sun StorEdge A1000 drive systems for complete compatibility and put it under the same support contract as your UltraSparc.
AND
As far as email is concerned, you should set up an MX server to cache and forward incoming email. These work real nice since you can run RBL or pre-process out spam without killing the actual server that holds and processes email for incoming clients. You have to look at a distributed environment, as email is precious to a lot of people, and a single server machine is not gonna cut it when you're upwards of 20,000 customers doing that much email.
PS. Try out Qmail too :) smaller footprint!
My guess is that in this role, performance is not the paramount issue. You're not bopping the heads around like you would in a database application; and even 20MB/sec is going to be plenty of throughput unless you have banks of ADSL lines. The important issues are reliability and maintainability.
I'm as much of a tinkerer as anybody; for my own use I don't mind spending two bucks of labor to save one buck of investment, because I'm really investing in myself. That said, if I had 13K users depending on me for e-mail, I wouldn't mess around; two days of down time could be fatal for your business.
I'd invest $1.50-$2.00/user in a professional grade solution:
Hardware SCSI raid controller.
Drives on hot swap trays.
Same/next day on-site service contract.
External cabinet that can be swapped over to another computer.
It's been over two years since I spec'd a solution like this one (I'm doing software exclusively these days), so I can't make a specific recommendation for today's hardware. I know that some devices used to come in a separate cabinet and looked like a humungous SCSI drive; they even had their own RJ-11 to hook up to a phone line for remote diagnostics from the vendor's tech support.
If the money to swing this is impossible, then I'd recommend mirroring rather than RAID 5. All these kinds of things are compromises between reliability, cost, convenience and performance. RAID 5 is an excellent overall solution from a performance standpoint; but if you cannot afford this RAID 1 is a good choice. It offers fast reads at the cost of slow writes and survival from failure on either disk. In this application, users won't be affected by slightly slower write times. Since drives are so incredibly cheap these days, I'd say this is a pretty good choice if you are strapped for cash. You could even use IDE drives. If you could afford a second IDE controller, then you could use software mirroring across two different controllers for improved throughput.
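To make the cost side of that compromise concrete, here is a small Python sketch comparing usable capacity for the layouts discussed in this thread. The function name usable_gb and the level labels are mine, and the six 9 GB drives are taken from the original question; it ignores hot spares and controller overhead.

```python
def usable_gb(level, drives, drive_gb):
    """Usable capacity of one array for a few common RAID levels."""
    if level == "raid0":
        return drives * drive_gb          # striping only, no redundancy
    if level == "raid1":
        return drive_gb                   # every drive holds a full copy
    if level == "raid5":
        if drives < 3:
            raise ValueError("RAID 5 needs at least 3 drives")
        return (drives - 1) * drive_gb    # one drive's worth of parity
    if level == "raid0+1":
        if drives < 4 or drives % 2:
            raise ValueError("RAID 0+1 needs an even number of drives, 4 or more")
        return drives // 2 * drive_gb     # half the drives are mirrors
    raise ValueError(f"unknown level: {level}")


# Six 9 GB drives, as in the original question.
print("one 6-drive RAID 5:   ", usable_gb("raid5", 6, 9), "GB")
print("two 3-drive RAID 5s:  ", 2 * usable_gb("raid5", 3, 9), "GB")
print("one 6-drive RAID 0+1: ", usable_gb("raid0+1", 6, 9), "GB")
```

Capacity is only one axis, of course; the mirrored layouts give up space in exchange for cheaper writes and less pain when a drive dies.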
One thing I haven't looked into is RAID-2; RAID-2 is like RAID-1 with additional error correction codes. It is seldom used in SCSI because SCSI does this for you, but it might be worth looking into for IDE raids.
Good luck.
Really what would be great is failover clustering.
Post may contain irony: discontinue use if experiencing mood swings, nausea or elevated blood pressure.
Much of this is probably repeated elsewhere, and much is common sense, but...
1. When was the last time you defragged the drives? Chances are this will reduce thrashing immediately.
2. Add more memory. More cache == less I/O. Double the RAM for a week and see how much better things are...
3. Hardware RAID is the only RAID. In most cases, the overhead of s/w RAID exceeds the I/O performance increase. Plus, the OS (whatever OS) need never know the boot drive is spread across 5 drives in three racks...
4. Hot Swap is a must for a production environment. Nothing beats the warm feeling of yanking a dead drive, slapping in a new one, and watching it get rebuilt on the fly - and the users never know...
5. Any amount of RAID will still fail badly if the PSU dies - always get redundant, hot swap power supplies.
6. The same goes for cabling.
This sig left unintentionally blank.
At least some POP servers are reputed to do stupid stuff like copying a user's whole mailbox to a new file every time a user connects up, looks at headers, or deletes a message. While I don't have specific recommendations, I'd advise auditioning a few different packages to see what kind of I/O load they place on a disk farm. Also, you may be better off spreading your load across multiple (cheap) servers rather than putting all your eggs in one expensive basket.
Also, improperly tuned RAID-5-based systems can be slower than the disks they're built out of because of the need to do read-modify-write cycles to update the parity blocks.
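For anyone who hasn't seen what that read-modify-write cycle actually involves, here is a toy Python sketch of RAID 5 style XOR parity on a single stripe. The block contents and the helper name xor_blocks are made up for illustration; a real controller works on whole sectors and rotates the parity across drives.

```python
from functools import reduce


def xor_blocks(*blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe: three data blocks plus one parity block.
data = [b"mail", b"pool", b"logs"]
parity = xor_blocks(*data)

# Small write to block 1: read the old data and old parity,
# then write the new data and the new parity (two reads, two writes).
new_block = b"spam"
parity = xor_blocks(parity, data[1], new_block)
data[1] = new_block

# If the drive holding block 2 fails, rebuild it from the survivors.
rebuilt = xor_blocks(data[0], data[1], parity)
print(rebuilt)
```

A small write therefore costs four disk operations instead of one, unless the write covers a full stripe or a write-back cache hides the extra I/O.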
Does anyone know of a FreeBSD HW RAID controller? I can't find clear statements about which cards are supported. I guess I may use Vinum. Can someone help?
I've seen in the Computergate catalog (www.computergate.com) that they have a card that mirrors your hard drive to another (RAID 1) for $85, or a 4 drive IDE RAID card for $120. I think Promise makes some IDE RAID cards under $200 also...
My choice for a large mail server (30,000+ users) is CMD 5640 Dual, Hotswap RAID controllers with 256MB cache each, in an external cabinet, with many drives. The controller has 2 host channels and 2 drive channels. I also use Kingston DE300 hotswap trays, which let you put 3 1-inch drives in the space of 2 devices. I would go 6 drives and put 3 on each drive channel. Depending on how much space you need, you can use 9 or 18GB drives. One nice thing about the controller is that it is separate from the system, so if the system crashes, you still have access to the RAID controller to troubleshoot problems. The controllers have serial console ports so you can access them the same way you would a headless sparc server. The controllers aren't cheap (~$6600), but well worth it. The Ultra2 I have with dual 400's, 2GB ram, and this setup with 50GB drives should easily handle 100,000 users. If the price is too high, you can get non-redundant controllers (CMD 54xx series) for a lot less. (~$2200 with 32MB cache)
Another thing to think about is what software you are running. I was running qpopper on a server that continually had a load of 15+. After switching to cucipop, the load went to 0.15.
feel free to mail me if you have questions or want more info... I can hook you up with the vendor I order from.
-Randy
Are you kidding me? Not only does Dell's phone support rock, their online support is even better!!
Here, I found the information for you, and it took me all of about 1 minute.
http://support.dell.com/docs/dta/4XXLV/00000001.htm
(1) SCSI - EIDE - _BAD_IDEA_. I'm not quite sure if you're familiar with SCSI and IDE's physical performance attributes, but if you are experiencing any bottleneck issues whatsoever with SCSI, then IDE, even EIDE, is possibly the worst thing to do in this situation.
(2) You ought to make the point that you're looking for Sun-box stuff, which is *way* more confined than PC RAID. We are running Exchange (no flames) and we use mirrored RAID 5 on two separate controllers - we can have any two drives fail simultaneously with no repercussions. My point is that the PC RAID market seems to have far more choices for you.
3. RAID on IDE works fine, even great for a desktop user who pushes the performance envelope - but (E)IDE cannot and will likely never compare to UW-SCSI or U2W-SCSI.
NP
Hear the voice of reason! RAID 5 is good for reads but you suffer a big performance hit for writes. _If_ you can guarantee that all writes are the same size (as with some database servers) then you can tune the stripe size, but I don't think you can guarantee this with a mail server.
And when a disk fails you take a performance hit on reads too. It is better for your stress level to pick a mirrored RAID level instead, so you don't have to panic quite so much when this happens.
I would recommend RAID 0+1. I have seen decent performance with Online DiskSuite (software RAID); even if you don't want to run this in production, it would give you a chance to try things out without spending too much cash.
The news server application mentioned may suit RAID 5 because a) you have to store a massive amount of data, and b) most accesses are reads (people browsing news rather than posting).
For email, you are going to get a good mix of reads and writes rather than just reads. In fact I think you'll find the application is more attribute intensive than anything.
There's one thing that you can do for free: increase the DNLC (directory name lookup cache). You do this via /etc/system, and it helps a lot on systems that are accessing a lot of files (NFS is the classic application for this, but I think a mail server will benefit too). Check docs.sun.com for how to do this.
From some research I did long ago on that SGI bit: SGI does a bit of a tweak on the Clariion drives. I got this info from Alexis Cousin (sp?) from SGI's Europe office, when I was trying to find an HA solution for my SGIs a year or two ago. Supposedly the box won't detect a failure in a drive path and fail over to the second controller unless you are using their OEM'd Clariion.
SGI loves to mess with stuff and OEM it. They dinked with the Netscape servers, dinked with Clariion, and they try to do it whenever possible. I talked with a Veritas guy at a LISA convention and he said SGI talked with them when Veritas was first starting, but would only accept an OEM version of it. Supposedly it pissed off the Veritas head guy so much that the sales guy said we would probably never have a solution from them for SGI... of course I have now heard rumblings of SGI & Veritas doing some collaborating these days on some Linux devel, so I guess all bets are off.
Not raid at all but just an online backup:
dd if=/dev/sda of=/dev/sdb bs=1048576 >> "$LOGFILE" 2>&1
This assumes identical geometries. So buy 2 drives instead of 1. Use it once a week or every night. This has saved my ass countless times. Every box I build gets a dupdrive script containing the dd command above and a spare drive.
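For anyone who would rather script the copy than call dd directly, the same idea can be sketched in Python. This is only an illustration under assumptions: `clone_device` is a made-up helper name, the device paths in the usage note are the ones from the dd line above, and a real run would need root and a quiet (ideally unmounted) source disk to give a consistent copy.

```python
import hashlib

CHUNK = 1024 * 1024  # 1 MiB, the same block size as bs=1048576


def clone_device(src_path, dst_path, chunk=CHUNK):
    """Copy src to dst block by block; return (bytes copied, SHA-256 of the data read)."""
    digest = hashlib.sha256()
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            block = src.read(chunk)
            if not block:  # end of the device or file
                break
            dst.write(block)
            digest.update(block)
            copied += len(block)
    return copied, digest.hexdigest()
```

Something like `clone_device("/dev/sda", "/dev/sdb")` would mirror the dd call; logging the returned byte count and checksum gives you something to compare against the next run.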
Adults are obsolete children. - Dr. Seuss
As lots of people have said, the disks and RAID setup can be a problem. Spend some time with vmstat and iostat and determine where the bottleneck is. If you have a throughput problem, you want more controllers in the mix. If you're spindle bound, you want more disks. However, I didn't see mention of what type of filesystem you're using. I imagine with a mail server that you have thousands of tiny little files spread across only a few directories. For that situation, it's rather critical to use a filesystem that does binary lookups of your metadata (such as Veritas). Use vsar to look up your inode hits and misses, and if the ratio is out of whack, try to break things down to fewer files.
We run RAID on over 80 development servers and 20 production servers. We run NT and MS SQL 7.0, but also do things like bill generation, which involves a lot of raw file access. Currently our best setup runs like this (we use HP Intel hardware):
We have 2 RAID controllers (each has 3 channels, but you won't need that much for your setup) running a RAID 10 array on each. RAID 10 is about the best performance you can get out of RAID. Basically the idea here is that most RAID controllers are A) slow, and B) only able to handle so many I/Os per second, and that's always fewer than a modern system can handle.
If you don't know what RAID 10 is, basically you have 2 or more mirrored drive sets. Then you stripe across those drive sets. This means you A) need at least 4 drives, and B) lose 1/2 of your usable drive space to the mirroring. But this also means you can do 2 separate reads across 2 different sets of striped disks, which is very speedy (in theory, anyway).
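To make the layout concrete, here is a small Python sketch of how a logical block could map onto a RAID 10 set. The function name, the one-block stripe unit, and the disk numbering are all assumptions for illustration; real controllers use larger stripe units and their own layouts.

```python
def raid10_locate(logical_block, mirror_pairs):
    """Map a logical block to (pair index, disks holding it, block offset on each disk).

    Assumes a stripe unit of one block and pairs numbered 0..mirror_pairs-1,
    where pair p is made of disks 2p and 2p+1.
    """
    if mirror_pairs < 2:
        raise ValueError("RAID 10 needs at least 2 mirrored pairs (4 drives)")
    pair = logical_block % mirror_pairs     # striping: consecutive blocks rotate across pairs
    offset = logical_block // mirror_pairs  # position of the block within each disk
    disks = (2 * pair, 2 * pair + 1)        # mirroring: both disks hold the same data
    return pair, disks, offset
```

With 2 pairs, blocks 0 and 1 land on different pairs, so two reads can proceed at once, and either disk of a pair can serve a given read.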
So if you spread your spool and mail across 2 RAID controllers running RAID 10, that's probably the best performance you're going to get. 10,000 RPM drives will help a lot too. The only problem: this is also the most expensive way to do it.... ohh well....
-Tripp
you mentioned ide raid but not by name. i've been looking at raidzone's solution. haven't bought yet, but it does all the hot swapping stuff you want *and* is riding the ide cost curve, which is now at 20G/$200.
the interface is neither ide nor scsi, but rather a board in your pci bus.
oh, right, you have an ultra sparc. *LOSE THE ULTRA SPARC*! they are not fast. you're better off running linux or freebsd on an x86 farm or beowulf cluster.
You'll probably want to investigate whether or not it's your disk I/O that's actually causing your problem. If it is (and I know I'm going to look like the antichrist of /. because I recommend this), you may want to look into the Sun Storage Solutions, since you made the right decision to get a Sun in the first place. http://www.sun.com/storage/disk.html The MultiPack (http://www.sun.com/storage/multipack/) works very well. The disk I/O speed is plenty for a fairly heavily used Oracle server we have.
> > It's unbelievable how many people are confused over this.
>
> Yes, it is. There are still people who recommend SCSI without further
> investigation.
>
> > For example, let's say your system is trying to read data and do a write at
> > the same time.
>
> No decent OS would do that. It would concentrate on reads and save the writes
> for later, unless the write cache is full.
Unless you are doing a syslog operation, as most MTAs do, which syncs the disk.
You can disable this in syslog.conf, of course, the biggest performance win
I have seen for most mail systems.
>
> > With IDE your OS has to issue one command to the controller which passes it
> > to the device and then waits...
>
> With IDE maybe. With ATA not. ATA does have everything that SCSI has, and
> more. Read the specs at www.t13.org.
Specs are specs. Real-world implementations? Widely available controllers?
Systems with this in standard? Drivers that match? For multiple OSes?
If you have the money to spend, I recommend talking with Network Appliance (www.netapp.com). They have some VERY nice storage hardware, and it is everything you wanted: fast and scalable (up to 1.4 terabytes currently). We have a small solution with seven 9GB drives currently, and it's a dream come true. Fast, reliable, you name it.
There is really not so much that differentiates ATA from SCSI anymore.
;-)
I wouldn't go that far.
Yes, IDE has finally caught on to such things as DMA and busmastering, and throughput on IDE devices is in the same arena as SCSI now. But.
IDE is limited to two devices per bus, and generally requires one IRQ per bus. IDE also has very strict and short cable length limits, and lacks an "external" connector -- you generally can't have an external IDE device (I know it is possible, but the cable restrictions make it very difficult).
There are more kinds of devices (scanners, printers, etc.) available for SCSI than IDE. SCSI is generally more capable in terms of what you can do with it.
IDE controllers tend to be very primitive compared to their SCSI counterparts. Things like bus disconnect, command queuing, scatter-gather, even busmastering are often not available or iffy on IDE controllers. This applies especially to the onboard controllers in many motherboards; the number of shortcuts taken there is incredible.
Likewise, the drive electronics and HDA components in IDE drives are often cheaper than those in SCSI drives. These are all design and engineering issues, not issues with the specification itself, but they exist. The problems stem from the fact that IDE is marketed to be cheap, cheap, cheap, and thus gets a higher incidence of cheap components. It isn't limited to IDE, either -- you can also find cheap SCSI hardware, it is just that there is less of it.
IDE often appears faster in benchmarks, because benchmarks typically try to do operations in bulk on a single device. IDE has a lower command overhead than SCSI, so for such things, IDE will be faster. But when you get into the real world, and have multiple processes trying to access multiple devices at once, that is when IDE stalls, while SCSI keeps on going.
I realize this started off as a discussion about RAID, and that IDE RAID devices are not your typical RAID devices. They usually have one drive per bus, connected to a custom controller that multiplexes them all and presents them to the host as a SCSI interface. But the topic has drifted to more general applications.
Just my 1/4 of a byte.
dragonhawk@iname.microsoft.com
I do not like Microsoft. Remove them from my email address.
This is a quote from an Eric Allman interview on sendmail.net:
Are there features in sendmail that people should be aware of but aren't?
Oh, there are probably dozens of them. One that comes to mind, a very simple one, is the fallback MX option, which lets you redirect mail that has failed the first time to another location. It essentially acts as a lowest possible priority MX record for all hosts. For example, if you've got a mail system that's got a lot of traffic going through it, you have another machine that you dedicate to the slow mail, the stuff that didn't go through the first time, where presumably you're less concerned about how quickly it goes because the other end's being slow. So you set your initial connection timeout to something low - five seconds, ten seconds, whatever's right for your site - and you set the fallback MX on your main site to this fallback host. That way the mail that's going to go through quickly just goes fsssssssst right through your main server, while the stuff that's going to be slow (because the other end is either slow to connect or down) goes off to this other machine and doesn't clog up the main machine. It turns out to be just an amazing win. And these days the price of a PC box running FreeBSD or Linux is close enough to zero that it might as well be zero, so it's not really a problem to do it.
I asked their tech support about two weeks ago if it would work with Linux, and they replied that they would come out with a driver in about one month... Maybe there's a *BSD driver coming, too?
We're planning on deploying 2 database servers accessing data off of one external disk array. The second would be a failover server, so they shouldn't be concurrently accessing the same data/partition, but could. I know multiple boxes can access a single disk array through one scsi bus, but everybody always talks about them using different partitions. Can you have 2 boxes access one partition on SCSI? Fiber?
What form of RAID would be best? 5? 0+1? I almost wish I could do a 5+1: stripe with parity, mirrored. I know that's a little paranoid, but so am I... :-)
We're looking into the Gateway and Dell disk arrays. Has anyone heard good/bad about these? They have a max of 8 disks; what would be the best configuration?
Thanks,
Jeremy
"Opinions are like assholes; everyone's got one..."
at CSC.. I'm not an employee, just someone who bought a 4-tape DAT autoloader for $269 from them and is quite happy with it..
Your Working Boy,
I can give you some input on #1. The main (and only) advantage to using IDE over SCSI is price. I have a 70GB (4x17.2GB Maxtor) UDMA Raid0 running on a server at home. It cost me only about $700 to build it. It is running on a Promise Ultra66 controller. I have run raid on it under both Linux and WinNT and it works great. Disk performance is actually very impressive, much faster than a normal IDE drive, but that is to be expected when you stripe 4 UDMA drives. It is still nowhere even close to the speed of a good SCSI Raid setup.
I'm always amused at the large number of people who immediately think that because you are placing IDE/UDMA drives in a Raid configuration it will cause the drives to die quickly. That's bullshit. Granted, the SCSI drives will last you a hell of a lot longer, but IDE won't keel over and die just because it is Raided and under a high performance load. Most IDE drives will last at least for the length of their warranty period. Make sure you get the 'SMART' enabled drives and some monitoring software to give you a heads up if the drives begin to exhibit signs of failure.
If you want reliability and speed and are willing to pay for it, use SCSI. If you want large amounts of space and average speed at a decent price use UDMA. My needs run to cheap space and lots of it, and so far the UDMA solution has worked well for me.
I wouldn't recommend anything but a SCSI solution to you for any situation where you are looking for high performance fault tolerant systems. In your case I would go with option number two in your post above, and option three only if you are really, really worried about losing your data.
PS - This is running as a software Raid0, there is no hardware present. I have seen a number of benchmarks (some from Ars-Technica, don't have the link) that claim the performance of the Promise raid controllers is exactly the same as a software raid. I'm not sure if their competitors have this problem, or even if they have any competitors in this area.
Hell is being intelligent in a world full of idiots.
For precious data, I agree that Raid-0 is insane.
but for temporary data that doesn't need to be 100% sure to be kept, that can be a good solution.
For a news server spool, if it crashes, replace the disk and the standard NNTP feed will refill the news spool shortly (depending on the connection speed)...
Of course OS disk and what users send and other data that is more important should not be on Raid-0...
The reason for this is that RAID-1 uses 1:1 mirroring of a 2-drive set while RAID-5 uses rotating parity in which parity information is distributed across all drives.
With regard to space, using RAID-1, your usable yield (what shows up in df) is half of the total disk space put into it. With RAID-5, parity info is spread throughout all the drives, and it costs you one drive's worth of space. E.g., I have a RAID-5 using four 4GB drives, which gives me 12GB of usable space. With 0+1 on this configuration, it would be 8GB usable.
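The space arithmetic above can be written down as a quick Python sketch. The function name and the level labels are made up for illustration, and it assumes equal-sized drives and ignores filesystem and controller overhead.

```python
def usable_capacity(level, drives, size_gb):
    """Usable space in GB for equal-sized drives, ignoring formatting overhead."""
    total = drives * size_gb
    if level == "5":
        if drives < 3:
            raise ValueError("RAID 5 needs at least 3 drives")
        return total - size_gb  # one drive's worth of space holds parity
    if level in ("1", "0+1"):
        if drives < 2 or drives % 2:
            raise ValueError("mirroring needs an even number of drives")
        return total / 2        # every block is stored twice
    raise ValueError(f"unsupported RAID level: {level}")
```

For the four 4GB drives in the example, `usable_capacity("5", 4, 4)` comes to 12 and `usable_capacity("0+1", 4, 4)` comes to 8, matching the figures quoted.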
As for speed, both RAID-1 and RAID-5 allow you to read from multiple disks at once (which, of course, is a win). For writes, a drive pair in a RAID-1 will take as long as a write to a single drive. On RAID-5, however, it takes longer because (afaik) the RAID controller has to determine which drives to write the parity info to, which takes CPU time.
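As a rough illustration of where the RAID-5 write cost comes from: parity is the XOR of the data blocks in a stripe, so a small write is commonly described as a read-modify-write (read the old data and old parity, then write the new data and new parity). The helper names below are made up, and the sketch ignores parity rotation and controller caching.

```python
def xor_blocks(*blocks):
    """XOR equal-sized byte blocks together."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)


def small_write_parity(old_data, new_data, old_parity):
    """New parity after overwriting one data block, without reading the rest of the stripe."""
    return xor_blocks(old_parity, old_data, new_data)
```

The same XOR is what lets the array survive a dead drive: XORing the surviving data blocks with the parity block gives back the missing one, which is also why reads slow down while a disk is out.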
A decent little overview is at DPT's site (sadly, only in PDF) at http://www.dpt.com/pdf/understand_raid.pdf
That's exactly the reason that made me play with the Linux software raid. Basically I've revived an old 486dx2-50, and stuffed it with several HDs, both IDE and SCSI (from dead Mac IIcx's), that I've combined in a raid0 array. Running fine for several months now. :)
If you're interested in details about the setup, mail me after removing the obvious
raid configurations tend to imply huge virtual drives. huge drives need a loooong time for a filesystem check (once i had a >3h one with a 72GB drive/raid 5). therefore i would highly recommend a log structured filesystem!!!!!!
...
the gdt controller (http://www.icp-vortex.com/) works fine with linux (and of course any other operating system, linux tools for i386/alpha available).
about raid modes: security: mirror, one drive security: raid 5, speed: striping - these are the common uses, but the choice depends on your needs
CU
SCSI drives have a 5 year warranty
:)
IDE drives are 3 year (at least, I haven't seen 5yrs on an IDE drive)
When a drive dies, usually the disk can still spin, so is it the electronics that is the real problem?
Maybe someone willing to risk their drives (and any warranties), who has an IDE and a SCSI drive of the same model, could swap the circuit boards between them. I did this on two dead Maxtor drives once (slightly different models, same drive casing) and ended up with one working HDD.
Do your best, hope for the best, suspect the worst.
Just curious if anyone has worked with the disk arrays made by Dynamic Network Factory (or any similar products by other manufacturers?)
They say they use Ultra DMA drives, and connect to your machine via SCSI. Seems like a good way to put the I in RAID - assuming the product is as good as it looks.
Most important - DON'T USE RAID 5. It's not right for that application. RAID-5 assumes read-mostly, and is aimed at things like user home directories and app software; it is slower at writing large amounts of traffic than a single disk.
:-)
:-)
Take the time to understand how different RAID types ("levels") work and what is needed. RAID-1 is obvious but is space-inefficient (50% usable capacity) and doesn't solve the performance issue without adding striping (aka RAID-0) too.
RAID-3 may work well if you can get the stripe size down to a single write for the filesystem, e.g. 4+1 discs, 512 byte disc block, 2K array stripe and 2K filesystem block. Beware that many packaged arrays are software optimised for RAID-5 and / or RAID-0+1 and suck at RAID-3.
Sounds as though your price point rules out many of the midrange and high end toys that have been bandied about. Forget about EMC.
There are a number of cheap SCSI to SCSI and SCSI to IDE standalone RAID boxes going round, and also PCI to SCSI or PCI to IDE cards for internal mount in server PC's. They're closer to your capacity needs (start at sub-50Gb, sub-30Mb/sec).
IDE vs SCSI for the drives is not that important up to 7,200 rpm, but will tell with 10,000 rpm units. The bandwidth from the RAID controller to the host is more important, so make it Ultra-wide or PCI.
From past experience, Sun StorageArray (or whatever they are called now) were a bit behind the technology curve; in 1996 they were still using the host OS for software RAID support, and upgrading Solaris meant hacking the array. They are all OEM anyway. Go to a storage expert instead, but one cheaper than EMC.
Clariion are good for plug and forget, but may not have something down in that price range. However, performance on low-end models, even FC to FC, is not stunning. The 5700 series is (was?) overall good value, but requires FibreChannel attach.
The original poster mentioned considering a RAID level 5 array to try to speed up access. However, level 5 can actually slow down access times. It increases throughput, but throughput is usually not the culprit. Access time is. To get faster access times, use a mirrored array (level 1), where multiple disks all carry identical information. Read access times are dramatically improved, because each disk can service just a fraction of the overall read requests!! In such an array, reads don't involve all disks, only writes do. Therefore, doubling the number of disks in a mirrored array theoretically doubles the number of read transactions that can be done per second. Real-world results vary, but are dramatically better than with a single disk. If the disk is getting a lot of small read transactions per second, rather than a few very large ones, then a mirrored array is the way to go, not striped!
After much shopping, questions, advice and temporary insanity, we decided to go for a new Linux box to handle the mail. Apparently, the load wasn't only coming from disk i/o wait; the kernel was using 70% cpu. We chose a Dual PIII/500 setup on an Asus P3B-DS, 512M ECC SDRAM (less than before, but prices are so high right now, and we figure processes should end sooner on this box), Intel Pro/100, Seagate Barracuda for system, six Seagate Cheetahs for spool and mail storage, and a Mylex eXtremeRAID 1100 (w/ the 233MHz i960).
It was configured with 5 spindles in RAID 5, with 1 as a hot spare, and then partitioned in half. I'm confident this badarse controller can keep up on the writes, with minimal performance hit. Preliminary results with bonnie are inconclusive, since it's working with one huge file, rather than thousands of small files. If write performance lags once it goes online (this Sunday am), we'll split it into 0+1.
Exim, QPOP, and IMAPD were hax0red to use a double-hashed directory structure, i.e. "spin" would reside in /var/mail/s/.p/spin (the dot was required for those who have a single-character username). This should eliminate any overhead that ext2fs may have with large directories.
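For the curious, the path scheme can be sketched in a few lines of Python. This is a guess at the rule based on the one example given, not the actual patch: `mailbox_path` is a made-up name, and the handling of single-character usernames (the second level collapses to a bare ".") is my reading of why the dot was needed.

```python
import posixpath


def mailbox_path(user, root="/var/mail"):
    """Build a two-level hashed mailbox path from the first two characters of the username."""
    if not user:
        raise ValueError("username must not be empty")
    first = user[0]
    second = "." + user[1:2]  # just "." when the username has a single character
    return posixpath.join(root, first, second, user)
```

Under these assumptions `mailbox_path("spin")` gives /var/mail/s/.p/spin, and a one-letter user such as "a" gets /var/mail/a/./a, which the filesystem resolves to /var/mail/a/a.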
Thanks for all your advice, keep it coming. If you're a gamer, check out http://www.xmission.com/quake
-Kevin Blackham Xmission Internet Salt Lake City, UT
I see even classic Slashdot is now pretty much unusable on dial up anymore.
My understanding was that some fs's will perform some actions to avoid some fragmentation.
:)
A colleague of mine recommends doing a complete backup/reformat/restore cycle every 2 months or so on partitions that see a great deal of edit/extension to files - on a partition in use since '93 I expect this would give a radical reduction in thrashing . . .
It also gives you a chance to test your backup procedures.
This sig left unintentionally blank.