Pros & Cons of Different RAID Solutions
sp1n continues: "We are currently considering 3 options:
(1) SCSI - EIDE controller with six 9G/7200 ATA drives (hadn't heard of this one until recently). This supposedly accesses the drives directly through DMA and bypasses all IDE, just using them as physical media. All are accessed in parallel. I'm a bit wary about the reliability of IDE drives under constant use.
(2) SCSI - SCSI controller with six 9G/7200 u2w drives. The controller currently at the top of my list is the Mylex DAC960SXi w/ 32MB cache. However, something that fits in a half-height bay, instead of hogging a full-height one, would be nice.
(3) SCSI - SCSI controller as above, running with 2 disk channels and 2 separate RAID 5 arrays for each mountpoint (spool/mail storage).
I'm looking for any experience with IDE/DMA raid setups (1), as well as the pros/cons of making 2 partitions, both of which are very active, on one array of 6 drives (2), as well as 2 separate level 5 arrays of 3 for each mountpoint (3). In addition, any suggestions for external controllers and rackmount enclosures would be greatly appreciated. I would like the controller to have an i960 or better processor."
--
"The glass is not half full, nor half empty. The glass is just too big."
AFAIK, the only difference between a SCSI hard drive and an IDE hard drive is one little controller chip on the drive. So the reliability of the IDE drives, mechanically, should be identical to that of the SCSI drives.
Jeremy
I had a similar problem. I went with the Sun StorEdge A1000. It's just a great piece of hardware. I got 12 18GB drives, 10,000 RPM Seagate Cheetahs. It's in 2 RAID 5 clusters, with one hot spare. I needed to get a differential SCSI adapter, as they don't come standard on Ultra 2's. Wow, is it fast. I can move GB in what seems like seconds. It's a night and day improvement over a JBOD box. A bit pricey: about 17K after our 50% edu discount. It's all SCSI-SCSI, hot-swap disks, hot-swap power supplies. When you're running Solaris nothing beats Sun hardware; it just works.
You might also consider just adding multiple SCSI controllers and having as many drives as possible.
With each additional drive, you can access another unique piece of data simultaneously. While RAID is nice and helps solve reliability and performance problems, it isn't the only solution.
It is a technique that newsgroup server admins used to use, and probably still do.
Before you go out and purchase an expensive RAID solution (of any kind), make sure this is really the problem. The vmstat command will make it quickly apparent what kind of I/O is happening, and further analysis might tell you more about what kind of disk accesses are happening.
In many cases, adding more memory or CPU can make a bigger difference than more/faster hard drives, if the problem is that the cache is too small, or paging activity too much. Also check your CPU load and make sure it is nowhere near 100% - if so, time to get a 2nd CPU.
Also, avoid software RAID implementations like the plague. They will slow down your system and provide questionable reliability. You should also try to find cards that have redundant SCSI controllers onboard, and support redundant cabling. This way if the cable, plug, or SCSI bus fails for some reason you will not be SOL.
Finally, be sure that the majority of your disk accesses are reads. RAID will slow down writes, sometimes drastically so. If the majority of your disk accesses are writes, then tuning your kernel to flush dirty buffers less often may make a good difference.
You may want to look at a Dell PowerVault as a possible solution. Check out Dell's website for details. They are VERY reliable and VERY fast, not to mention Dell has the best support in the industry.
The Network Appliance Filers are really sexy.
The beautiful thing is they use the WAFL filesystem so you can expand your array when you need to without adding big sets of drives.
Granted, I don't have one but I've submitted the proposals and am waiting on financing. The F720 scales to 464GB, is network attached, has journaling (rad), and can benefit your WHOLE network.
Of course, you have to use NFS or SMB though. I've heard they start as low as $17k but usually $30-40k with a bunch of drives but it's difficult to find general prices without hearing the sales pitch.
This paper discusses testing the Stanford Linear Accelerator Center performed while evaluating the NetApp filers. It's geared toward Usenet news but if it can handle that, it can surely handle your mail situation.
Does anyone here have first hand experience good or bad with NetApp Filers? And some word on the pricing?
Kind of off topic, but for the past couple of years I've wanted to set up a small raid setup in my "server" here at home. What are the most reasonable setups that you've seen around? What raid hardware, and what drives would you consider best to use just for low cost educational purposes.
Completely off topic, if you click on my url above, then click on the computers section you can check out the new case I made... it's pretty cool, and everyone has been pretty interested in new cool case designs =)
-S
Scott Ruttencutter
We Apprentice Developers and Designers
It doesn't sound like you need a lot of space if you're currently doing well with 9GB and 7GB. Get a pair of 18GB drives for the spools and a pair of 18GB drives for storage, and you should be set.
RAID 0+1 is a lot faster than RAID 5. Its disadvantage is that it's more expensive because you have to buy 100% more disk than storage, as opposed to 20-33% more for RAID 5.
As far as which controller to use... Sun now rebrands DPT controllers, but they're PCI and you're stuck on SBus, so I don't know.
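To put numbers on that overhead comparison, here is a minimal sketch. It assumes equal-sized disks and counts only the parity or mirror copy, with no hot spares; the function names are made up for illustration. The 20-33% figure corresponds to RAID 5 arrays of 6 down to 4 disks.

```python
def raid5_overhead(disks: int) -> float:
    """Extra raw disk bought per unit of usable storage in RAID 5.

    One disk's worth of capacity goes to parity, so the overhead
    is 1 / (disks - 1).
    """
    if disks < 3:
        raise ValueError("RAID 5 needs at least 3 disks")
    return 1 / (disks - 1)


def raid01_overhead() -> float:
    """RAID 0+1 mirrors everything, so you buy 100% more disk than storage."""
    return 1.0


for n in (4, 5, 6):
    print(f"RAID 5, {n} disks: {raid5_overhead(n):.0%} extra disk")
print(f"RAID 0+1: {raid01_overhead():.0%} extra disk")
```

A 3-disk RAID 5 comes out at 50% extra, which is worth keeping in mind for the "2 arrays of 3" option.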
Good luck
First off, it's not clear from your post how heavily loaded the drives really are.
In particular: load is a measure of how many processes are using or waiting for a resource (such as disk I/O, CPU or network I/O). On a busy mail server that's completely adequate for the job, I'd expect to often see a high load average due to the number of processes that are waiting on the network. That is, due to the number of processes waiting for slow network connections to places halfway around the world.
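A quick way to look at this is to put the load average next to the CPU count. This is only a sketch: `os.getloadavg()` is Unix-only, the helper name is made up, and the number by itself still doesn't say which resource the processes are waiting on.

```python
import os


def load_per_cpu(load_1min: float, cpus: int) -> float:
    """Normalize a load average by the number of CPUs (at least 1)."""
    return load_1min / max(cpus, 1)


if __name__ == "__main__":
    one, five, fifteen = os.getloadavg()  # 1, 5 and 15 minute averages
    cpus = os.cpu_count() or 1
    print(f"load averages: {one:.2f} {five:.2f} {fifteen:.2f}")
    print(f"1-minute load per CPU: {load_per_cpu(one, cpus):.2f}")
```

A high value here is a reason to go look at disk and network statistics, not a diagnosis.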
All you mention is the load averages and a fairly non-specific measure of drives that are "cranking away constantly". If the drives were being used at a current constant 10% of available I/O, they'd tend to "crank constantly" even if they could be hit much harder. (still, given that losing email is considered bad by customers, a RAID 5 solution seems like a good idea anyways and leaves you room to grow and handle sudden increases in email from the holidays or spammers or gradual expansion of business)
As to IDE vs. SCSI -- never go with straight IDE on a server. SCSI has the ability to lie to the OS and silently move data from sectors that have gone bad into sectors reserved for that purpose. Sure, it slows down access to that particular block of data, but it's a lot easier than the OS having to deal with failures directly. However, I'm completely unfamiliar with the strange SCSI - EIDE setup that you're describing -- if it treats them as just physical media and provides the SCSI interface itself, it may be able to do that particular SCSI trick, as well. Physically, SCSI drives and EIDE drives are identical -- as in, you can find the *exact* same drive from certain manufacturers, only one has SCSI and the other EIDE. Reliability of the physical media is the same, IOW. In a normal configuration, *apparent* physical reliability is higher for SCSI due to wonderfully useful trickery.
I don't recall the exact model numbers, but I've seen pretty good results with Mylex RAID controllers before. (more along the lines of database stuff than what you're talking about -- somewhat different needs, but not all *that* different, I suppose.)
I can't see putting two partitions on one RAID device as making a lot of sense -- since things are striped you'd end up running into contention issues.
IOW: I'd guess that option #3 would be the fastest -- it's also probably the most expensive.
If I were you, I'd check more carefully to determine how much of the currently available disk I/O is actually being used... If the budget allows it, the dual-channel RAID solution sounds pretty good. You might want to go with two single-channel RAID cards instead -- makes it easier to stock a backup card in case a card decides to die. Try and get something with hot-swappable drives, too. It makes the RAID stuff so much more useful.
Also, I don't know the details of your setup (of course), but seriously consider breaking the mail serving task into separate pieces and run it on separate machines.
You have:
1) incoming email
2) outgoing email
3) email from customers
4) email customers pick up (POP)
It sounds like you have one machine handling all of these. Breaking these tasks onto separate boxes may help. (If you've made the mistake of telling customers the same thing for #3 and #4 (ie, mail.isp.net instead of mail.isp.net and pop.isp.net) it might be impossible to split those two tasks away from each other.)
You can have a setup such as:
outgoing1 through outgoingN all behind the single name of "outgoing" that internal machines are told to send email to that they don't know how to deal with
mail1 through mailN all behind "mail" that customers are told to have as their outgoing mail server. In particular, it should blindly send off email it doesn't know how to deal with to outgoing.
pop (harder to break into separate machines, but possible)
incoming1 through incomingN with MX records pointing at them for your domain.
Now, breaking into that many machines is probably silly. Moving outgoing to one machine and everything else to a second machine (and possibly mailing lists off to a third machine) may make a *lot* of sense though. Don't get tied into the idea of a monolithic machine to accomplish everything related to a particular task -- eventually it's much more expensive than many cheaper boxes to handle the same task.
We've just spent 2 weeks at my office researching the different solutions available to us for implementing the most reliable and scalable solution available today. Our needs differ a bit from yours as we're looking to put many machines on a network for load-distribution yet they all need to speak to the same data on a single repository. This holy grail is known as a SAN, or Storage Area Network.
Our solution is going to be a single cabinet RAID (level 5 for accessing smaller files) with a "hot spare" that will rebuild a crashed disk on the fly. This being a standard cabinet we'll have 8 disks, of which the capacity of 6 will be data (one parity (term used loosely as parity is striped on RAID-5), and one spare).
The disks are Seagate's 10,000 RPM Cheetahs, the most commonly recommended units among all the vendors we've talked to, and the controller is a multi-channel u2w with fibre interface to a QLogic PCI adapter.
The total system is going to run just over $15,000. This sounds like a lot, but pricing lower end systems isn't too much cheaper and you'll never get 24-hour turnaround on failed parts (if they're even available). This seems like overkill for a single system, but by adding a fibre hub later we can use the single system for many many machines once a file controller (dedicated machine) is put into place.
The beauty of SAN is that it operates much like FTP, with a control and a data connection. The control connection occurs over your existing LAN, and the data is transmitted directly over the fibre channel (max rate of 100 MB/s).
Other NAS (Network Attached Storage) models are somewhat cheaper to implement, but performance can never match the fibre as the "control" and "data" connections (NFS or SMB) both transmit across your network.
I apologize for digressing from the straight RAID topic, but I felt obligated to give the /. community something to chew on in return for all that I've learned here.
-Steve
- A.P.
On the IDE v SCSI question, be careful. With some drives the difference really is just a chip, but often drive manufacturers will use different actuators and such for SCSI drives (due to the fact that they're more likely to be dropped into a high-stress environment). The MTBF for a drive that's expecting to run grandma's recipe book is not relevant when it is used in a high-stress server.
I'd suggest a SCSI or Fibre Channel raid array, with some 10,000RPM drives, and lots of cache on the drives and the controller. If you are currently IO-bound, you want to make sure that you remove that bottleneck for at least a couple years. Some sort of external enclosure might be nice if only due to the fact that 10,000RPM hard drives make a LOT of heat, so it keeps things a little less critical. Oh, and of course I'd recommend using RAID-5 for obvious reasons. RAID-0 is faster, but clinically insane.
Another solution is to look at Communigate mailserver from http://www.stalker.com
It allows you to cluster your mail server to multiple servers with very little fuss.
Don't listen to that crap, the SCSI drives are built for industrial use. You get what you pay for. Go for the SCSI setup; you will be glad in the end. As for the RAID 5 config, if you don't have the fault tolerance on (parity stripe) and are just doing it to crank every bit of speed you can out of that box, well, go for it; but what the other guy said about checking the amount of writes your RAID is doing sounds like a well thought out solution. I know people who have professed their love for IDE, but when they get a taste of SCSI, they rarely go back.
I spent a fair amount of time looking at RAID 5 solutions this past summer for a client. Both external and internal, for Linux. Tried several different controller card brands and drive configurations, did a lot of reading, and bugged a lot of vendors.
You really should try to test your options and all of the configuration combinations using something like Bonnie, on a machine with a similar configuration to your target server. Make sure that your Bonnie test file size is at least twice physical RAM, to eliminate the effects of RAM and controller caching on the results.
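The "twice physical RAM" rule is easy to turn into a number. A rough sketch, assuming a Linux-like system where `os.sysconf` knows `SC_PAGE_SIZE` and `SC_PHYS_PAGES`; the helper names are made up, and how you hand the size to Bonnie depends on the version you have.

```python
import os

MB = 1024 * 1024


def min_bonnie_size_mb(ram_bytes: int, factor: int = 2) -> int:
    """Smallest test file size in MB that is at least factor x RAM."""
    total = ram_bytes * factor
    return (total + MB - 1) // MB  # round up to a whole MB


def physical_ram_bytes() -> int:
    """Physical RAM as reported by sysconf (assumption: Linux or similar)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


if __name__ == "__main__":
    ram = physical_ram_bytes()
    print(f"physical RAM: {ram // MB} MB")
    print(f"use a test file of at least {min_bonnie_size_mb(ram)} MB")
```

On a box with 128MB of RAM that works out to a test file of at least 256MB.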
I found that using 6 drives in a RAID 5 config was a LOT faster than 5 drives, most of the time. In fact, 3 drives in an array was faster than 5 in some cases. I think it has to do with the way the controller cards were calculating the distributed parity, and perhaps also due to things the driver was doing. 4 drives usually wasn't much better than 3, either.
Stripe sizes for the array can also make a big difference. 32k vs 128k, etc. Larger stripe sizes are usually better for I/O speed, but you may find for email that having a higher number of random seek transactions per second is better than raw speed.
I did not get a chance to do any hard testing of multiple channel configurations with these cards. I suspect that splitting the I/O onto multiple channels would be a win.
IMHO, you definitely want an i960 based board or system, with the fastest CPU you can find on them. I noticed a significant difference between boards with the 33MHz part vs. the 66MHz part.
FYI for others: for controllers, the AMI MegaRAID (alias Dell's PERC2/SC) just blows chunks. Older non-LVD, non-raid SCSI systems can run rings around it, at least on write speed.
It has been my experience that the write speed on a RAID 5 system is generally only a fraction of the reading speed, like 1/4th to 1/2. For a quick and stupid test, do something like 'time cat /proc/kcore > /tmp/kcore' and do the math for MB/second.
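The same quick and stupid test, with the math done for you. This is a sketch, not a benchmark: the 64MB default is an arbitrary choice, the function names are made up, and the file should really be much larger than RAM and controller cache to mean anything.

```python
import os
import tempfile
import time

MB = 1024 * 1024


def mb_per_second(num_bytes: int, seconds: float) -> float:
    """The 'do the math' step: bytes moved over elapsed time, in MB/s."""
    return num_bytes / MB / seconds


def timed_write(path: str, total_mb: int = 64) -> float:
    """Write total_mb of zeros to path, force them to disk, return MB/s."""
    chunk = bytes(MB)  # 1 MB of zero bytes
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(total_mb):
            f.write(chunk)
        f.flush()
        os.fsync(f.fileno())  # without this we mostly time the page cache
    elapsed = time.perf_counter() - start
    return mb_per_second(total_mb * MB, elapsed)


if __name__ == "__main__":
    # Put the scratch file on the array under test; the temp dir is a stand-in.
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        print(f"write speed: {timed_write(path):.1f} MB/s")
    finally:
        os.remove(path)
```

Run it against a read of the same file and you can check the 1/4th to 1/2 ratio on your own hardware.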
Oh, and my current favorite card is the DPT Millennium V controller; I've been using it in several systems in various places for the last 3 or 4 months. Here are some Bonnie results for a system with a DPT with 6x 7200 RPM drives, all on the same channel (internal), Linux kernel 2.2.10, dual P3 500MHz:
Our setup has right about 31,000 users constantly checking and sending email and is running RH 6.1 on a dual PII/333 with 128MB RAM and 9G UW SCSI. I haven't seen a load higher than 0.75 since that machine has been the mail server... maybe something about how your mail server is set up is creating a tremendous bog on it.
I would recommend a Sun MultiPack with Solstice DiskSuite for management.
Load average is defined as the number of processes sitting on the run queue. This need not indicate a disk IO bottleneck.
;)
I would be surprised if any exim system was having more of a bottleneck to disk than it was to network. Your disks are faster than your network and exim is pretty light on unnecessary disk access.
The larger the bottleneck to network (by network I mean end-to-end with your customer, not just your links), the longer processes are going to hang around.
More processes, more paging, less caching. Less caching, more IO. More paging, more IO.
Probably teaching granny to suck eggs - but you do have your swap space on a separate device, don't you?
The more exim processes that hang around longer, the more processes for the CPU to switch around. The more switching, the more likely you are to see paging.
If the processes hang around longer, they take up more memory which reduces the cache size available.
Exim has several files which it accesses frequently, mainly the retry databases and its configuration. These should permanently be in memory.
Bottom Line:
I do however suggest that you don't consider moving a single server to RAID. If you have a server that you want to move to RAID for efficiency purposes... your design is wrong and you should be building a scalable system.
Red
Personally speaking for a load of this magnitude SCSI is the only solution.
...
Don't even think of software RAID.
For some background on SCSI itself try http://www.scsifaq.org
There are many types of RAID. 0-5 are the "standard", but there are several new ones, e.g. level 10, which attempts to address throughput issues. Your actual space requirements don't seem outrageous, so level 5 would be reasonably cost effective.
Another thing you will probably want is hot swapping. Once you've had a box tell you a drive is dead, you've removed it and popped a new one in without taking the box down, you will not want anything else.
On the IDE vs SCSI debate, whilst IDE is fast it seems to me that under continuous load SCSI gives better throughput.
As others have pointed out - a 'designed' server, rather than a "roll your own" box, would make sense. Compaq Proliants make excellent Linux machines. The SMART arrays are very good and support RAID to level 5. You can fit a lot of disks in the drive cages as well. They are a little pricey but of a good quality and reliability. We have rather a lot of them running NetWare. I get to use the older kit to run my funny Open Source stuff.
A suggestion might be:
Proliant 1600, 2 x 600MHz processors, SMART 3200 with 64MB cache, 5 drive slots - 81 GB available after RAID 5 on 18GB 1" drives (that's Ultra-2 SCSI), supports up to 1GB RAM (has 128 by default). There is also an on-board SCSI interface for CDROM etc. This comes in at about GBP 9,000.
I'm not familiar with Exim, but aren't there more efficient solutions?
Although my experiences have been with much smaller configurations, qmail reportedly handles loads of this magnitude on lesser hardware.
Wow... a couple of good suggestions in here, actually. I'm a Sun/HP/AIX UNIX guy by trade, so don't expect a Linux/Intel type answer. (Although, Linux @#$%'ing rocks....) A couple of ways you can go... you don't really have major performance needs, so I'd suggest going with a Sun MultiPack (6 drive housing) on a dedicated u2w card, and then choose 9G/10KRPM drives if you want performance as a RAID0, or 18G/7200 or 10KRPM for density, in a RAID0+1, or RAID5.... using SDS, or preferably VxFS, if you want to ante up the cash. Now, this might not work in this case, but just to throw it out there, tonight I noticed a MAJOR shift in Sun storage arrays. Pricing on A5200's (check it out at www.sun.com), which is a Fiber link, totally redundant 22 drive solution, has dropped on its 9G/10KRPM models, from $95k down to $71k. This is temp pricing till they kill inventory, because they are now offering 38G/10KRPM drives in the same model, for 68K!!! That's 1/2 a TERA!!! And it scales, by chaining A5x00 boxes via Fiber, or via a fiber hub. If you can afford it, this is the way to go. Stay away from NetApp and EMC; they might be "sexy", but TCO is OOC (out of control). And, I'll have to agree on the "stay away from IDE solutions" bandwagon... the biggest mistake Sun ever made (other than using PCI) was putting IDE drives in UltraSparcs; the drives KILL the performance on the Ultra workstations. Good luck.
Post your question to this list. It focuses on the use of Sun disk arrays. Whilst this might not be your 'final solution', you are likely to get an intelligent answer from people operating array systems on Sun hardware.
send 'subscribe ssa-managers' to
majordomo@Eng.Auburn.EDU in message body
it is not likely that you need to kick out the Ultra 2, or add more buses. you probably just need more spindles. i don't know EXIM, so i would not want to comment further.
Have a look on www.sunworld.com in the article archive. there have been a recent series of decent articles on RAID systems. and make sure you are not bottlenecking elsewhere in the system.
Another authoritative source is Adrian Cockcroft's book on Sun Performance Tuning, and Wong's book on Sun Capacity Planning (both Sun texts).
I used to run a large mail server at a fairly big ISP who will remain nameless, and I'd like to suggest you consider a RAID-10 solution. We were experiencing disk bottleneck problems, and this really helped. Basically, RAID-10 splits the disk I/O half and half over multiple drives with the standard mirroring/striping. This is a simplified explanation, but that's the basic idea.
First try iostat -D -l (number of disks+2) 5 to get percentage utilisation in 5 second intervals.
This is my favourite tool for disk analysis. Secondly go to http://www.sun.com/sun-on-net/performance read what you feel is important but download the se toolkit.
Run zoom.se to get a professional analysis of your system. Run virtual_adrian.se to get a virtual professional to tune your box.
I recommend you do this BEFORE spending any money. I have an E3000 with 2GB RAM and 2% processor utilisation because nobody checked the system properly.
If it is your disks, I recommend Sun kit, even though it is expensive, and RAID 5. Don't worry about people telling you about it being slower; compared to a thrashing single spindle it is extremely fast and, as importantly, reliable. Tinker and learn!
I work at a large European ISP. We maintain mail systems for a somewhat larger user base using Network Appliance Filers. We use the filer for stored mail and the local mailers' RAID arrays for spool. This would be an expensive solution for you to take, but it's very very scaleable, and lots of other data needs can be taken care of by the same filer. Have a volume for your mail, oh and one for logs, one for news, one for webspace... If you run out of 100BaseT, put in a quad 10/100 card. One ultra-cute feature is you can have an NT box and a unix box talking (rw) to the same data. We don't (NT - eww!!), but you could. So you can spread the (yes, rather high) cost across lots of different services. They give good performance, and are mostly very stable - we have several with uptimes over 100 days, having wrapped their NFS-ops counter some time ago... :> I would recommend them if you can spare the cash - think $50k for one that will take care of most of your data storage needs. However, if at a later time you want more space, buy a couple more shelves. Management is pretty easy too, and it is pretty damned hard to get them to actually lose anything.
if this is a server, don't go with IDE - you are a business looking for *safety* of the data as well as performance, and should be willing to fork over the extra 20 to 100 percent it takes for scsi...
as for controllers, i say mylex, high-end adapter of your choice, i would beef it up to 128 megs of ram in any case...
as for the drives, go 10,000 RPM, the difference in access times will help you out, and i think that is much more important in your case than transfer rate... for an ISP, i would only ever buy IBM or Seagate drives, reputable workhorses that they are...
for great cases and setups, i honestly recommend macgurus.com - they specialize in mac stuff, but a scsi tower is a scsi tower, and they will build it with good components at a reasonable price to whatever specs you need... (no, i dont work for them)...
Measure first, change later
I think the problem is the contention on the single spindle for mail. Ouch!
Next, RAID5 sucks for small writes. Don't even bother getting RAID unless you need the redundancy. But do get more spindles. A bunch (4-5) of 5400rpm 2GB drives is going to perform much better than one high-speed high-capacity drive.
PS. Actually, if you're going to use a lot of drives then you probably will want to RAID, because the chance of a failure is more likely with more drives. But do it mirroring (RAID 1 or RAID 0/1)
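The "more drives, more chance of a failure" point is simple arithmetic. A minimal sketch, assuming drives fail independently and using a made-up 3% per-drive failure chance over whatever period you care about; the real number for your drives will differ.

```python
def p_any_failure(per_drive_p: float, drives: int) -> float:
    """Chance that at least one of `drives` independent drives fails."""
    return 1 - (1 - per_drive_p) ** drives


P = 0.03  # illustrative per-drive failure probability, not a measured figure
for n in (1, 5, 10):
    print(f"{n:2d} drives: {p_any_failure(P, n):.1%} chance of at least one failure")
```

With no redundancy, any one of those failures takes data with it, which is why the PS above ends up recommending mirroring once the spindle count goes up.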
If the question is about IDE RAID5 sol'ns over SCSI (ardie-ar-ar).. then this is a felonious debate from the outset... try DEC RA3000's on for size/speed... with upcoming linux support they should be something to consider...
Hi, a couple of years ago we had the same problem till I discovered that all our mailboxes were in one mail spool directory. This was a huge bottleneck, and after adapting qpopper and configuring sendmail to use a split mailspool dir, load came down to 1. (Split mailspool is /mail/a /mail/b /mail/c, and all users whose names begin with an a will be placed in /mail/a ... etc ...)
check above first before you buy hardware
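For illustration, the split spool layout from this post (/mail/a, /mail/b, ...) boils down to a mapping like the one below. It is a sketch only: the function name and the "other" bucket for names that don't start with a letter or digit are made up, and the real work is getting the delivery agent and POP daemon to agree on the same mapping.

```python
def spool_path(user: str, root: str = "/mail") -> str:
    """Map a username to a per-letter spool directory, e.g. /mail/a/alice."""
    if not user:
        raise ValueError("empty username")
    first = user[0].lower()
    # Hypothetical fallback bucket for names starting with punctuation etc.
    bucket = first if first.isalnum() else "other"
    return f"{root}/{bucket}/{user}"


for name in ("alice", "Bob", "_daemon"):
    print(name, "->", spool_path(name))
```

The point is just that no single directory ends up holding every mailbox.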
Check out http://www.penguincomputing.com/RAID.html for some useful info on RAID.
Well, after dealing with many different brands of RAID controllers, I have found that DPT's Millennium series tend to be the best. The card takes care of everything, and they're available in 64-bit flavors with 3 onboard U2 channels, or 2 Fibre channels.
Mylex are good if you're looking for a cheaper solution, or Adaptec for dirt cheap. But, if you're looking for the absolute fastest possible solution, it would be Fibre Channel Quantum Atlas 10k's on a 64-bit DPT Millennium Fibre controller in a RAID 0+1 configuration. With a 10 drive setup (equal to the total capacity of 5 of the drives) you could easily reach 100MB/s. Of course, that's gonna cost you a pretty penny.
Our mail server is currently handling about 1M messages a day. IO became a serious issue. We're still using sendmail, and I'm not going to give it up (we know it, we have custom builds for strange applications, it works). As others have noted, load average doesn't mean much here - I have some machines with a load average at 4 that are actually idle and fine, and others at .2 that need tuning. Ignore it and concentrate on what matters.
Assuming IO matters, I am putting my full faith (and job) on Mylex controllers. I love them. I only have one in production, but am about to deploy 5 more, and we'll come in at about 600G managed by them. They just work. The DAC960SXi I have in production (for 7 months now) has been flawless, delivering wire speed doing RAID 5 without any effort after initial config (which is a bit annoying, to be sure).
My production system using it is doing far too many things - mail, staging server, enterprise backup. This is changing - lack of time and historical accident made it that way. The point is that the Mylex handles it with no grief.
If you're building these, be aware that Mylex external controllers need to be mounted in a box with "internal" style connectors. For good RAID cases, check out http://www.storagepath.com/ - they are what I'm using. They look low rent, but the boxes are nice (if a bit expensive).
Down to specifics. For a mail only machine doing the sort of volume you're talking about, I'd deploy a dual processor box with three SCSI busses (one for spool, two for mbox/system access - system access is pretty cheap in comparison) attached to two hardware RAID setups. Granted volume allows, I'd go RAID 5 for spool (with 18G disks, that's ~65G spool) and hot spares. For mboxes, I'd do 0+1, for as much space as needed. Stripe disks on independent controllers, mirrored to each other. Striped mirrors can grow, as you need them to (RAID 5 can't, easily). You don't want to lose anyone's mail. Hot spares for each.
Assuming 100G of mboxes, that's a total of 17 18G disks. Add three Mylex DAC960SXis and (initially) 3 rack mount cases, and that's something around 24K.
Availability beyond disk is a different question, that gets platform specific. I do mainly Solaris now, so I can't talk much about Linux for this. Mylex controllers can do dual active/dual host configurations, but things get more complex, and a summary here doesn't make sense.
Other options like A1000s (Sun specific) and Netapps require different approaches - they're very different beasts. We have all of the above, and treat them very differently. We'll buy them all again - they're all decent - but are good at different things.
If you can, buy raw Mylex controllers through a reseller like TechData or similar - you'll save a lot.
Hope this helps some.
-j
The first thing about a hardware raid controller is that it hides failures from the operating system. With software RAID you have to manually carry out all sorts of tasks, and I'm sure we've all heard of the engineer who mirrored the new blank disk on top of the one remaining data disk of a mirror.
Units such as SUN A1000 and Baydel connect via SCSI and you just watch for an orange light; even the part-time cleaner could pull out the correct disk and replace it and have the system back and running without the OS noticing. Storageworks and Clariion(EMC) do the same but over Fiber Channel. SCSI units tend to top out at 40MB/s, Fiber Channel theoretically tops out at 200MB/s (they have two 100MB/s loops), but since I only had a max of 30x18GB disks to play with, the disks were the bottleneck. Monster multi-SCSI machines like EMC/IBM's can achieve whatever bandwidth you want by multiplexing SCSI connections.
We've evaluated software RAID, Hardware RAID over SCSI, Hardware RAID over Fiber channel from EMC, IBM, SUN, Compaq(storageworks) and in our opinion a good smart raid controller with two data channels and load balancing software is impossible to beat.
For speed, stripe (0) mirrors (1) together, in RAID 0+1. This allows reads at double speed because each mirrored disk can handle a request separately, and slightly sped-up writes because you can write to the RAID controller's NV cache and carry on doing your work whilst that takes care of putting the data to media.
This of course has only a 50% data efficiency.
Using RAID 3 or 5 you lose one disk in a rank for parity; RAID 6 (used by Network Appliances) uses two disks for parity but has wider ranks of disks. This often means that sequential reads are fast, because a request for data wakes up all the disks in the rank, but therefore the whole rank can only handle one request at a time. Writes are slower because you have to read a stripe of data, calculate parity and write the whole stripe back again.
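For anyone who hasn't seen how single parity works, here is a toy sketch of the XOR trick behind RAID 3/5 style parity. The block contents and names are made up, and real controllers work on stripes of sectors, not four-byte strings, but the read, recalculate, write cycle on a small update is the same idea.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    length = len(blocks[0])
    if any(len(b) != length for b in blocks):
        raise ValueError("blocks must be the same length")
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


# Three data blocks in one stripe, plus their parity block.
data = [b"mail", b"spoo", b"ldat"]
parity = xor_blocks(data)

# Lose the middle disk: XOR of the survivors and the parity brings it back.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]

# A small write: old data and old parity must be read before the new
# parity can be written, which is where the write penalty comes from.
new_block = b"MAIL"
new_parity = xor_blocks([parity, data[0], new_block])
assert new_parity == xor_blocks([new_block, data[1], data[2]])
print("rebuild and parity update check out")
```

With a decent NV cache on the controller much of that read-modify-write cost can be hidden, but it doesn't go away.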
RAID5 is really good for data which doesn't have to be the absolute fastest.
Whilst we were doing performance tests, we measured a linear increase in speed up to 20 disks (in transactions/second), and there is a definite art in making sure that you spread the load over all the disks available so that a single disk doesn't get thrashed to death.
In conclusion? Well, that depends on your OS.
For me, for a PC-based system I would choose a hardware RAID system with SCSI connection which let me choose the LUN sizes. 5 disks in a RAID5 configuration will only waste 1 disk in capacity. If you're finding your mail spool is being thrashed then I would build a 10 disk 0+1 raid and stripe the mail area across them, using the rest of the area for home areas or web areas or something else which has large storage requirements but doesn't get hit hard.
Oops, this assumes that this REALLY is your problem; a lot of disk problems go away by adding more memory to the machine... I assume you have measured this by tracking the outstanding I/O queue.
-- Don't believe everything you read, hear or think
The original posting doesn't say if the server is running pop/imap, and thus if it is used as the final delivery point for those 10,000 users.
If it is, then the hashing of the mailbox path that lucky luck mentioned is worth investigating. Also worth investigating is alternative mailbox formats. If you're using mbox format, then I'm not surprised there's a problem if you have a large number of users (and/or reasonably large mailboxes).
There has been some discussion about these issues on the exim-users mailing list. I read it via egroups.
Rather than pontificate here, I'll direct you to some rather comprehensive documentation in the form of the Multi Disk HOWTO. It is part of the Linux Documentation Project, but don't let that fool you: the HOWTO has examples of SunOS servers, practical implementations, clustering and more. It does look like what you are looking for.
There are guides, principles, a guided method and examples of several implementations. And if you need more you could try mailing the author.
Did someone say RAID 0? RAID 0 isn't even a real RAID solution. You just make one big-ass partition across multiple hard disks and if any one fails, you lose everything. Personally, if money was no object, I'd go for disk duplexing... twice as many controllers, twice as many drives, easy *and* fun! The best solution, however, is probably RAID 5... unless 6, 7, 10, or 53 has something that I don't know about. Where the hell did "53" come from? Freaks.
As for me, I'm considering their lower-end SCSI boxes connected to a high-end Intel server running Linux, seeing as I have $52,000 to spend this year! (yippee). The idea is to put all the money where the valuables are (the data) and use commodity hardware and open source software to drive it. The OS would boot from an internal HD and all data and local customizations (i.e. /usr/local) would be on the external RAID box. If a CPU box fails, unplug it from the array, plug in a spare CPU box, reboot. Minimal downtime due to hardware problems. I can then repair or replace the busted CPU box at ease.
For Linux jockeys, there are efforts to bring Fibre Channel drivers to Linux. Be sure to look at the work at Worcester Polytech for info.
When I worked at Demon, the netapps were one of the most reliable pieces of machinery that I administered. Whilst you might think that network attached storage can be a performance problem, in practice it worked very well indeed.
:-)
You do, however, need to be aware of how to make your application play well over NFS. Exim is actually reasonable at this. Qmail is good at storing mailboxes on NFS thanks to its Maildir technology, but the mail queue *needs* to be on a local disk... I'm not sure about postfix or sendmail (bletch).
Unfortunately, I can't remember the command to make the individual LEDs on the disks blink, which is one of the best remote diagnostic features ever.
-Dom
I accept that you will need to test to make sure that the disks are not the problem but you will need to do it the right way.
Firstly, vmstat tells you very little about disk I/O. What it is good for is the processes. Look at the output from vmstat 5, for example. The first three columns are r b w: running, blocked and waiting. If there are blocked processes, look at WHY processes are blocked. Use top to get the I/O wait information. If there is a lot of I/O wait then look at the disks. Use iostat -D to get percentage utilisation of the disks. If there is a lot of disk wait then you may need to either add more disks or spread the load.
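The order of those checks can be written down as a small decision function. This is only a sketch of the reasoning above: the function name and both thresholds are made up for illustration, and you would feed it numbers you read off vmstat, top and iostat yourself.

```python
# Hypothetical thresholds, chosen only to make the example concrete.
IOWAIT_HIGH_PCT = 20.0
DISK_BUSY_HIGH_PCT = 70.0


def triage(blocked_procs: int, iowait_pct: float, busiest_disk_pct: float) -> str:
    """Walk the checks in the order described: blocked, I/O wait, disk busy."""
    if blocked_procs == 0:
        return "no blocked processes: the disks are probably not the problem"
    if iowait_pct < IOWAIT_HIGH_PCT:
        return "blocked but little I/O wait: find out what they are blocked on"
    if busiest_disk_pct >= DISK_BUSY_HIGH_PCT:
        return "disks saturated: add disks or spread the load"
    return "I/O wait without a saturated disk: check the per-disk figures again"


if __name__ == "__main__":
    print(triage(blocked_procs=6, iowait_pct=45.0, busiest_disk_pct=92.0))
```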
It is interesting to note the relative speeds of devices:
If cpu takes 3 seconds to do a job then,
Level 1 cache takes 10 seconds
Level 2 cache takes 1 minute
Memory takes 10 minutes
Disk takes 7.7 months
Network takes 6.5 years
Get stuff off your disks better! Monitor your cache hit rate to get information on efficiency. Use vmstat or sar or stuff from the se toolkit. Get the se toolkit from http://www.sun.com/sun-on/net/performance. Run zoom.se to monitor your system. Run virtual_adrian.se to tune your system. Use the right tools and don't just add more memory, identify the bottleneck, fix the bottleneck, re-test and repeat until the performance is satisfactory.
A few things that may help:
1) Our POP mail server (~1000 users) running on an old SUN Solaris machine (LX) was having problems because of the number of NIS lookups that were going on. System CPU was up near 75% constantly, I/O waits near 0, and load was also very high. Solution: make the mail server a NIS slave as opposed to a NIS client. Reduced load by 20% immediately. Same goes for DNS lookups.
2) Make sure you're not writing/reading to/from NFS mounted fs.
3) Install rec. Solaris patches - these can make a big difference. Try installing Virtual Adrian, and see what it recommends.
5) Don't buy EIDE for all the reasons mentioned previously. For lots of simultaneous hits, SCSI outperforms EIDE every time.
6) Consider fibre channel disk arrays from SUN - expensive but they are nice, especially the new A5200. Gives 22 spindles as opposed to the 14 in the A5100.
7) Ignore the guys talking about s/w RAID solutions being a BIG slowdown. Sure h/w RAID 5 is much faster than the s/w equivalent but when it comes to RAID 0+1 then there ain't a lot of difference. Not only that BUT s/w RAID systems tend to be much easier to configure and maintain w/o a doubt - check out Veritas Volume Manager (love it!) or even the free DiskSuite (with Sun Solaris server version) is better than any h/w RAID configuration I've seen.
8) I would bet my next salary that adding a RAID system to your mail server will increase performance by less than 15%.
Oh, and I've been managing enterprise level Sun systems now for 8 years, so I'm not just a Linux geek who has read too much ;)
Hope this helps.
This is my HD:
Filesystem Size Used Avail Use% Mounted on
/dev/hda1 486M 358M 102M 78% /
/dev/hda2 3.8G 2.7G 909M 75% /usr
/dev/hda3 964M 501M 413M 55% /home
/dev/hda5 99M 20k 94M 0% /tmp
and that's AFTER cleaning out... before I had / at 100%, /usr at 100%, and /home at 100%. I have a 4.3 gig HD laying around which I had FreeBSD on for awhile (been thinking about putting BeOS on it) but I may use this idea and go for it.
If you think you know what the hell is really going on you're probably full of shit.
jdube is who I am.
I would not use RAID for the problem you're describing. You're most probably better off splitting the box into several others.
For example, try using a fallback mailhost for outgoing mail (fallback_mx in Sendmail). That way messages that cannot be delivered within a couple of seconds are relayed to the fallback server, keeping your outqueue clean and tidy.
For incoming mail, use a different server, or if you can, use several. You could just put them all in the MX list of your domain, with the same priority. This does wonders.
It might be smart to look at the mailbox format. Some mailbox formats (MBX) have much better performance than others. And you could put POP3 and IMAP on a third server.
All this is much preferable to simply installing a RAID array, IMO, based on the information you presented.
Used to work for Data General, the parent company. Fantastic hardware. They've just been bought by EMC though.
I would definitely try to tune the system before throwing hardware at it though. Find out exactly where the bottleneck is.
I have heard that much of Dell's support is outsourced to one of the world's worst phone support companies, Stream. Also, while Dell's own phone support teams might be slightly better, I have heard that they have a huge turnover, and most of the phone reps that stick around are morons.
I ran into the same problem not long ago. Our local ISP needed a backup solution. The old tape drives were not doing their job anymore. So we built our own RAID cabinet. We bought an 8 disk RAID enclosure with dual redundant power supplies from Siliconrax. The controller is a Mylex External RAID controller. The card is nice; it allows expandability down the line. The card comes in a full height enclosure (keep it in mind, it's big). We used 18.6 gig Seagate drives in the system. Each drive was mounted in a CRU Data Port removable enclosure for hot swap. The RAID controller has an LCD front panel, making setup a snap. The array was configured with RAID 5. RAID 5 is redundant, and provides fast read access, but write access is slower. All in all, the array is about 100 gig online. The cabinet is connected to an SGI O2. The only thing to watch is the cable length!! We've been doing nightly backups over NFS since the array was turned up. The system is nice. Go SCSI, and do the research on the proper controller. If the money is there, go fiber.
Who would believe in penguins, unless he had seen them? Conor O Brien - Across Three Oceans
having read through most of the thread, my $0.02 is:
definitely install virtual adrian to get a better idea of system tuning you can do and where your real problems might lie. have you tuned all the system parameters possible ? ncsize ? turned off all non essential daemons/apps on the machine ?
mylex controllers seem reliable but were definitely a pain to configure - we're using them on a dec fileserver solution. one downside that appeared was they took 6-8 hours to initialize the array - compared to 1.5 hours for a non mylex controller
we're now switching from DEC+Mylex to Sun+Infortrend who make a very nice scsi-scsi controller. www.infortrend.com - we're using the 3201U2G - 4 Ultra2Wide scsi buses.
don't go to raid unless you know what you're getting yourself into - it's far more complex and expensive in the long term apart from your initial investment in the hardware. you'll have larger spares provisioning, your documentation (you do have some right ;-) will be more complex and your backup system might need some work too.
my rule of thumb at present is JBOD to 50G, RAID as a NAS for 50G-500G and SAN (RAID/fibre) for above 500G. you really don't need raid below 50G except for specific performance reasons
it's been an interesting thread to read, since i'm right in the middle of working on a raid5 server implementation. :-/
-jason
First of all you might want to check out other MTAs, as well as other methods for storing the user's mails. If all mailboxes reside in the same directory, you're spending all your time in the kernel doing _linear_ searches thru the mailbox directory. You could spend millions on EMC hardware without seeing _any_ performance increase.
I'd recommend using the Postfix MTA, as it has almost all features of Sendmail, and it's secure, and (hold on) it's even faster than QMail. You could possibly use it with the Cyrus IMAP/POP services. You definitely want to make sure that you don't have all mailboxes in the same directory. Build a hierarchical structure where you never have more than say 30-50 subdirectories/files in one directory.
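One way to get such a hierarchy is to hash the user name into bucket directories. The sketch below is only an illustration: the function name, the /var/mail root and the choice of SHA-256 are my assumptions, not how any particular MTA or IMAP server lays out its store. With one hex digit per level and two levels you get 256 leaf directories, so 10,000 users come out at roughly 40 mailboxes per directory.

```python
import hashlib
from pathlib import PurePosixPath


def mailbox_path(user: str, root: str = "/var/mail", levels: int = 2) -> PurePosixPath:
    """Map a user name to root/<h0>/<h1>/.../user, where h0, h1 are hex digits
    taken from a hash of the name, so users spread evenly over the buckets."""
    digest = hashlib.sha256(user.encode("utf-8")).hexdigest()
    buckets = list(digest[:levels])  # one hex digit (16 subdirectories) per level
    return PurePosixPath(root, *buckets, user)


if __name__ == "__main__":
    for name in ("alice", "bob", "sp1n"):
        print(mailbox_path(name))
```

Hashing rather than using the first letters of the name avoids lopsided buckets when lots of user names start the same way.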
Ok, if disks are still your problem, consider:
1) Software RAID is usually a lot faster than hardware RAID. And for the money you save on the HW controller you could buy faster/more disks.
2) An IDE disk is identical to an SCSI one, except of course for the interface and the warranty. The price difference is mainly due to the warranty.
3) UDMA/ATA-{33,66} IDE interfaces are as fast as any SCSI solution if you keep _one_ disk per channel. The main problems with IDE solutions are the short cable length allowed (a problem for 10+ disks) and the number of controllers you must have (one controller for each two disks).
You can spend $50K on a SCSI/HW-RAID solution easily. And you won't know if you'll even get the speed of one single UDMA drive from it (yes, people actually get 15MB/s both from their single UDMA drives, and from their expensive DPT RAID solutions). At least consider a software-RAID and possibly IDE solution before rushing out to spend the next 10 years' budget on the shiny HW-RAID solution.
Your setup is fairly small, eg. you would probably do just fine with a four-disk RAID-5/10 for spool and mailboxes. This is where SW RAID is worth considering. Granted, for 20+ disk systems, HW RAID may well be a better way to go, eventually combined with SW RAID.
My 0.02 Euro.
Many have posted followups here mentioning that RAID 5 may not be your best avenue. To recap, this is because of the performance overhead associated with the calculation of parity data. Unless you have a reliability issue, RAID 5 is probably something to stay away from. An exception might be hardware RAID, but such solutions are expensive and will still involve a slight performance hit.
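For anyone wondering what the parity calculation actually involves, here is a minimal sketch of XOR parity on byte strings. It is a toy model of the idea, not how any particular controller or software RAID driver implements it, and the function names are my own. The last function shows why a small write costs extra I/O: the old data and old parity have to be read before the new parity can be written.

```python
from functools import reduce


def parity(blocks):
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild(surviving_blocks, parity_block):
    """Recover the one missing data block from the survivors plus parity."""
    return parity(list(surviving_blocks) + [parity_block])


def small_write_parity(old_data, new_data, old_parity):
    """Read-modify-write: new parity = old parity XOR old data XOR new data."""
    return parity([old_parity, old_data, new_data])


if __name__ == "__main__":
    stripe = [b"mail", b"spoo", b"data"]
    p = parity(stripe)
    # Pretend the middle disk died and rebuild its block.
    print(rebuild([stripe[0], stripe[2]], p))
```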
The multi-controller solution is probably best; someone mentioned the Sun StorEDGE product with the Cheetah drives. This is a great piece of gear, and coupled with some really good storage management software (might I suggest Veritas Software's File System/Volume Manager) you'll get a very flexible solution providing the most bang for the least buck. With the Veritas product you can manage the data on the fly over several drives, and monitor & tweak the configuration on the fly while in a production capacity; additionally, the Veritas product provides a journalled filesystem which will allow rapid restarts in the event of a crash and if you have the drives, can be configured to fail over to available spares.
Yes I am a Veritas Consultant =^) but that does not change the fact that this is an excellent product that would probably go a long way towards addressing your issues (which seem more performance oriented than reliability related) on your existing drives. Check out this link for more info: http://www.veritas.com/library/su/fsconceptwp.pdf
Good Luck!
-Videoranger
Heaven offers little comfort like winamp and a big disk full of Dave Matthews MP3s
It's unbelievable how many people are confused over this. It's very simple... there is no place for IDE in a server. This is because SCSI devices are, for the lack of a better word, multi-threaded whereas IDE devices operate in serial. For example, let's say your system is trying to read data and do a write at the same time. With IDE your OS has to issue one command to the controller which passes it to the device and then waits... and waits... and waits for the data (or the acknowledgement) to be returned from the device. With SCSI, the OS tells the controller all the operations it wants to do and the controller looks at them and decides if there is an optimal way of doing the commands. Then it sends all of the requests out and allows each device to complete its task in any order. In other words SCSI operates in parallel while IDE is sequential (or serial). Major performance difference here (unless you are operating under very very light loads such as a desktop system).
I do sys admin for a software company with a mixed Unix-NT environment. We had some terrible experiences with Samba on Unix, and NFS on NT. About a year ago, we purchased an F720 with 100GB, for around $50,000. Now we have another F720 with a 300GB fibre-channel RAID. We talked with other NetApp customers, and they were ecstatic about the reliability of these machines. Although I can't say that the filer was 100% reliable, like we heard from smaller sites, we're VERY satisfied with its performance. In the last year, we've only had 2 occasions with significant (> 10 minutes) downtime. As far as speed is concerned: it's usually faster than our local disks... :-)
One of the best things about it is its simplicity. GUI people use the nice Java applet to control it (it gets better with every release of the OS), and us Unix people have a great command line interface.
If you plan to use the NetApp with lots of clients (about 500 in our case) in a mixed environment, the Network Appliance is probably the most reliable and simple to maintain solution. If you want the fastest RAID array to connect to your mail server, it will simply amaze you.
If your budget allows, go for it!
I think you are a little off base. On a Solaris box, the equation that makes up load can also include blocked processes, i.e. processes waiting on I/O. Yes, it can be network and/or memory, but a single 9GB disk for mail accounts is most likely the problem. The slowest part of any system is the disk subsystem. Unless you are using solid state disks.
Now from all of my research it seemed like NetApp was the way to go. So I pushed and pushed and pushed, and finally we got an F760. (Nothing like going from nothing to the top of the ladder.) And now it is 2.5 months into being a NetApp user. Both the 1 and 2 month anniversaries were marked with a MB dying. I must say it is fast, real fast, but right now the analogy is fast like a race car going towards a wall. Now ease of use, maintenance, etc. on the UNIX side has been pretty carefree for me. The NetApp has been very easy to use, easy to monitor, and easy to set up.
But the NT department, which paid for half of it, is hating life. The NetApp's quota system is straight out of Unix, which is not good for NT, i.e. you are putting quotas on users, groups, or qtrees (think root level directories which are made in a special way). According to the NT gurus, file ownership by individuals in NT is a bad idea, therefore all files are owned by an administrator equivalent. This means you lose user quotas. NT has a different group philosophy than Unix (multiple groups can have access to a single file) so I am guessing the group quotas are out as well. Leaving qtrees, which are sort of ugly. Right now our NT people are looking at taking the loss on the NetApp and giving it to UNIX (fine by me ;) and replacing it with a conventional NT file server.
Another downside for the NT side of things is that the NetApp is configured much like a UNIX box. It uses init and rc files etc etc. Well, from NT land there is a carriage return/line feed issue. All of those files have Unix style line endings. I am not sure if they break if you start using DOS style but I am leery to find out. Which means the Unix side is responsible for all configuration of the NetApp. This is both good and bad. They aren't going to break my stuff, but I have to take on additional labour. Note: The hardware failures were quickly resolved by NetApp, but it still sucked hard.
The NT quota issues are supposed to be resolved in the next major version of the NetApp OS, codenamed Guiness or some such. The NT people IMO haven't fully explored the quota possibilities, instead taking the party line that it's too much work. And it is entirely possible that I have not uncovered all of the problems, and the solutions for those problems, in the time we have had it.
Why is everyone so obsessed with RAID5? It is not the holy grail of disk storage, as one or two others have tried to point out but been flamed for. RAID5 offers great resilience, BUT is not good if performance is also required. Just because your data is striped across multiple volumes to aid recovery, it still only reads from the one volume, and the need to perform the striping on writing makes the system slower. If performance is an issue, and money is not, then RAID1 (mirroring) is the solution (unless your system will allow both, i.e. RAID0+1; IBM RS6000's, my domain, do not).
Writes shouldn't take significantly longer than reads. I work with Fibre Channel, and the throughput numbers I get for raw reads and writes (no file systems) aren't significantly different. If you have a good raid controller, it should be able to keep the drives busy on both reads and writes as long as the file system is writing data in large enough blocks.
I've been trying for three days to find out what kind of memory SIMMs are in a Dell Optiplex 466/L. No conclusive results yet. If you don't believe me, give it a try. And then tell us about how good Dell support is. Buying hardware "customized" for Dell is wasted money when you get to upgrades. I have solid experience with this.
That did help delivery time, but responsiveness was still bad. The next step was to break our RAID5 volume into small mirror sets (RAID0+1). With Email you're constantly writing and reading small files, too many to cache effectively. The head contention involved in random writes and reads was killing our performance, so to minimize it we built the smallest drive arrays possible (one disk big, two disks deep for reliability). This has worked pretty well for us.
In general I'd stay away from any IDE or EIDE disks on a disk-bound server. In addition to the points others have raised about SCSI reliability, the SCSI overlapped IO model and the ability to run more channels allows you to attach and access a larger number of disks. Fibre Channel will get your transfer rates up through the roof, making the SCSI bus speed not-a-problem.
Now if Mozilla will just come out with a faster client....
>1) Software RAID is usually a lot faster than hardware RAID. And for the money you save on the HW controller you could buy faster/more disks.
Since when? I've been working on servers with and without RAID for ten years now, and this is the first time I've EVER seen this claim. Was that a typo? Hardware RAID is much faster usually, as well as more reliable. Yes, it can be harder to set up, but in the end it is well worth it. Remember, you get what you pay for. Any time you use software to do a job that hardware can handle, you are devoting CPU cycles to it. Properly designed RAID controllers offset a ton of processing that would otherwise be done by the host CPU. They don't put RISC processors on RAID controllers just for show :-)
As for SCSI controllers, I'll echo what others here have said. Mylex is one of the best. Not the easiest to config, but by far one of the fastest and most reliable controllers out there.
I recently had to implement a new OpenView server. Our standard is HP-UX. After a lot of looking around, I got an HP AutoRaid 12H (2 96MB RAID controllers, 12 half-height bays, 3 hot swap fans, 3 hot swap power supplies) connected to an HP R390 (capable of dual 360 MHz PA-RISC, max of 3 GB RAM, up to 2 9GB internal drives, Gigabit ethernet, free DVD-ROM and web remote console). It retails for between $40 and $80 grand depending on how far you go with it.
I run a mail server with several hundred thousand users. To reduce disk I/O you need to do 2 things.
Spool: This is a killer, but here is how you fix it. Use the re-mqueue script that comes with sendmail and configure 3 or 4 cron jobs. Move all mail that is older than 1 hour out of your mqueue to a subdirectory like */mqueue/.1/. The second cron job moves anything older than 4 hours out of .1 to .2. The third cron job moves anything older than 1 day out of .2 to .3, etc etc. Then set up cron jobs to run sendmail in queue=process mode, using -O QueueDir to point it at the queue directories */.1/ through */.3/. Run the .1 directory once every 10 minutes, the .2 once every hour, and the .3 once every few hours. Then, to keep your master mqueue dir small, run your normal sendmail with -bd -q1m so it processes the queue once a minute.
Mailboxes: For your mailboxes themselves, especially if you are using BSD mboxes, use RAID 0+1. If you can afford a good fiber disk subsystem get one, but an A5200 with a few drives is like 60 grand! On an UltraSCSI wide bus try using four 9GB Cheetahs as RAID 0+1; that gives you about 18GB usable and will fix your I/O problem. Much more traffic than that on one UltraSCSI bus will max it out and will require more controllers. Veritas will help A LOT as well. Much better than the stock Solaris crap. Best,
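The queue-aging scheme boils down to "the older a message, the slower the queue it lives in". Here is a small sketch of just that bucketing rule. It is not the re-mqueue script and it does not touch any files: the function names are mine, and a real mover has to keep each message's queue files together and respect sendmail's locking.

```python
from typing import Dict

HOUR = 3600

# (minimum age in seconds, subdirectory), oldest tier first.
TIERS = (
    (24 * HOUR, ".3"),  # older than 1 day: retried every few hours
    (4 * HOUR, ".2"),   # older than 4 hours: retried hourly
    (1 * HOUR, ".1"),   # older than 1 hour: retried every 10 minutes
)


def tier_for_age(age_seconds: float) -> str:
    """Return the queue subdirectory for a message of this age ('' = main queue)."""
    for min_age, subdir in TIERS:
        if age_seconds > min_age:
            return subdir
    return ""


def plan_moves(queued_at: Dict[str, float], now: float) -> Dict[str, str]:
    """Map message id -> target subdirectory for messages that should leave
    the main queue. queued_at holds each message's arrival time (epoch seconds)."""
    plan = {}
    for msg_id, arrived in queued_at.items():
        subdir = tier_for_age(now - arrived)
        if subdir:
            plan[msg_id] = subdir
    return plan


if __name__ == "__main__":
    now = 1_000_000.0
    print(plan_moves({"fresh": now - 60, "stuck": now - 5 * HOUR}, now))
```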
Quite right here - the load average IS the number of processes in the run queue (runnable means NOT waiting for I/O), so 10 processes is quite a lot (unless you have 10 CPUs, like me). Processes waiting on network or other I/O don't add to the load average.
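Whatever a given OS counts in the figure, the point about CPU count is easy to check: divide the load average by the number of CPUs before deciding whether it is "a lot". A minimal sketch (the helper name is mine; os.getloadavg is only available on Unix-like systems):

```python
import os


def load_per_cpu(load_avg: float, n_cpus: int) -> float:
    """Normalise a load average by the number of CPUs."""
    if n_cpus < 1:
        raise ValueError("need at least one CPU")
    return load_avg / n_cpus


if __name__ == "__main__":
    one_minute, _, _ = os.getloadavg()  # Unix only
    cpus = os.cpu_count() or 1
    print(f"1-minute load {one_minute:.2f} on {cpus} CPUs "
          f"= {load_per_cpu(one_minute, cpus):.2f} per CPU")
```

A load of 10 on a single-CPU box means nine processes queued behind the running one; on a 10-CPU box it means every CPU has exactly one thing to do.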
I haven't used exim much, but what are the processes doing? Are they all of one type, e.g. SMTP delivery, local delivery, SMTP reception, POP handling? Can you streamline the process table a bit (e.g. limit simultaneous deliveries)?
What version of Solaris do you have? 2.5.1 wasn't tuned for as much as 768M memory by default, so you need to raise 'minfree' in the kernel, to prevent a burst of process forks from eating all the free memory and causing a lot of demand paging (to free up pages for the impending 'cow' mappings (copy on write for new data)).
2.6+ is better with large memory systems - (virtual adrian will adjust these tuning settings for you...)
Split your swap between 2 or more (physical) disks if you are paging heavily. Remember, vmstat's 'paging' figures are misleading: 'page outs' INCLUDES counts for ALL data written out to disk (i.e. file I/O), so if you are writing a lot of data to disk, you should expect/want to see a HIGH value for page outs. swap -l will give you a good idea of whether you are really using your swap...
RAID5 (at least in software) will drastically slow your write activity and generally be a big win for read activity. If you don't need fault tolerance, just use disk striping (RAID 0) for a big win on reads AND writes.
Don't put 2 file systems on the same physical set of disks unless one of them has very very low activity.
Check if exim is spooling big mailboxes into temporary files in '/tmp' - this is a ramdisk by default on Solaris 2.6+, so if a user is manipulating a 500MB pop3 box via /tmp, they have just swallowed 2/3rds of your memory space, and pushed all your running processes out to swap land - kernel time spent paging means more processes waiting on the run-Q, so pushing the load average up...
Fibre channel disk arrays (a la sparc storage array) with a write accelerator cache, volume management software for striping and mirroring (e.g. veritas) and a journaled file system (e.g. vxfs, but NOT journaled ufs) provide a really industrial strength and speed storage system (but you'll pay industrial strength prices!)
Have Fun
SAN is an ill-defined acronym that every vendor defines differently. The idea behind selling SAN is that you have a large centralized storage center that offers its disks/volumes to all connected clients w/o the hassle of administrating a disk subsystem on each server.
The problem is that each vendor implements this differently, and has a different definition of what a SAN should be. None have really addressed the complex issues, instead implementing the kind of hack you describe - NFS with a data channel over FCAL. You still have the problems of NFS to contend with (no reliable locking or consistent transactional guarantees across client and server implementations, etc.). Heck, most vendors are selling FCAL HUBS instead of SWITCHES to accomplish this storage sharing because the switches aren't prepared to do TCP/IP over fiber!
Ideally a SAN would be a well fleshed-out spec that allows massive amounts of storage to be conveniently accessed across a network with all of the guarantees of a local disk. That's how it's being sold. However, right now it's looking like little more than a way to get NFS to run faster.
-Peter
== Just my opinion(s)
Solaris' filesystem prior to the logging filesystem in 2.7 is a dog. I'd highly recommend that you benchmark your performance w/ Veritas' vxfs, or w/ solaris 7 before you buy a raid system.
Also, if you do get a RAID, I'd highly recommend a box that does not get controlled in software, i.e. Solstice DiskSuite or Veritas Volume Manager (I love Veritas' VM, but as a RAID controller it lacks intelligence).
A good external box with hot-swappable drives and a sizeable write-back cache (w/ a battery!) is my favorite way to do this stuff.
== Just my opinion(s)
We have nearly a terabyte of Clariion RAIDs spread amongst 4 different SGI Origin 2000 servers.
For some reason Clariion is the only RAID SGI will slap their name on.
We beat the hell out of those things. In the course of two years we have had one disk failure. The sysadmin took the disk out and simply put the same disk back in! (gave him heck for that as he had spares on-hand). OTOH it has worked fine for the last year.......
I work for a Systems Integrator - nice word for RESELLER! We are a Sun reseller first and foremost, but we are very strong in the NetApp arena. Since I am a geek trapped in the hell of being a sales(wo)man, please forgive me if I sound salesy at all....
Anyway, NetApps are a great solution for multiprotocol storage. One of the drawbacks is that it is network attached and therefore only as fast as your network... which has been a problem for many of our customers. Another HUGE problem is backup. There is only one product that can do it well - a product called BudTool. BudTool is a little guy that some geeks in my company thought up and brought to market, then along came NetApp who asked us to figure out a way to b/u their filers. Out of that venture NDMP was born. BudTool is the only product that makes use of NDMP correctly. That division of my company was recently sold to Legato Systems, who plan to EOL that product. NetApp is now scrambling to find another solution, since they've been recommending BudTool from Jump Street....
Pricing is also an issue. And you were right in saying that they start at around $17K, but that is WITHOUT storage. A good sized storage solution, let's say 1 TB, is going to run you upwards of $100K. Yikes.
There is also a good resource for people who are thinking of deploying a NetApp solution, which is the toasters users group. You can send an e-mail to toasters@mathworks.com and ask to subscribe to the group. You'll get a lot of good feedback on what works, and what doesn't. You'll also get to see the downside to using it (and BudTool). I think there is info about the group at http://teaparty.mathworks.com but I haven't been able to get there in a few..... Check it out. It's definitely worth the trip.
And if you need any quotes I'd love to help you out!!! Just Joking
"Most of my heros won't appear on no stamps..." Chuck D from Fight the Power
What's the trade-off with RAID 3 vs RAID 5?
Some docs I have seen say RAID 3 is faster; any trade-off in reliability?
We've been running a SCSI/IDE RAID system here for some months now. They're actually a pretty decent idea - the array presents itself as a Wide-SCSI device, but drives 6 IDE HDDs over 3 IDE busses. There's 128MB of cache in the box, too, so it feels pretty snappy (although I've got no hard figures on its performance).
The real bonus, of course, is that it's dead cheap, compared to equivalent all-SCSI solutions.
I should probably say that we've only got it running on an NT file server at the moment, so I can't vouch for its performance on a big scary mail server, but it's working well for us. Certainly, it seems to deal OK with everything we want from it (RAID-5, plus a hot-spare). It deals just fine with you disconnecting a drive while it's running, and simply gets on with re-building onto the hot spare (hardly a scientific measure of its usefulness I know, but certainly handy for demonstrating to PHBs why they should like it *grin*).
Sorry if I offend but damn it I hate when people get fed misinformation... and I am grumpy_geek
Hardware is MUCH, MUCH faster than software. We've got boxes here with 2 gig of cache in the RAID controller because we can't spin the disks fast enough (of course, we've got terabytes of data). Hardware will always be faster than software for the sole fact of caching. You may never need it, and you may not do enough I/O to have to wait on disks, but just because you don't use it doesn't mean it's slower.
SCSI disks vs. IDE: you really don't know what you are talking about, do you? How many simultaneous I/O operations can you do on IDE? On IDE you do each operation in a serial fashion, which means one I/O op holds up the rest. That's not a big deal for a workstation, but for any multi-user situation (or, better defined, multiple simultaneous I/O ops) SCSI is required. Warranty? I don't even want to think how you came up with that one, or why you would think it even applies. I'll add my own one here, and this is a biggie... I don't know of any IDE HA solutions; I guess that would be because you can't share drives with IDE. Of course there is hardly any difference between SCSI & IDE.
People can get 15MB/sec off of one drive doing large writes (writing a single 5 gig file), but you will NEVER get that performance using one drive on any type of random access information. Did you really think about what you were saying about putting in a controller PER DRIVE for the UDMA? So to get the same performance as 7 SCSI drives on one controller, I have to add 7 additional controllers into the picture... how many open expansion slots do you have in your box today?
All MTAs that are halfway reliable are disk bound (*not* network bound) - I believe that Wietse has some information on this in the postfix data.
This is because each message is fully committed to disk as it comes in (for exim this means creating, writing, closing and flushing 2 files; other MTAs differ slightly), and then a reliable local delivery costs about the same.
Hence what you need to optimise is the latency of synchronous operations. So I would strongly recommend some form of RAID with NVRAM cache, which means the commit time is memory speed rather than disk seek related.
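To put a number on that commit latency, here is a minimal Python sketch that writes small files and forces each one to stable storage with fsync() before moving on, roughly what an MTA does per message. The function names, message size and message count are assumptions for illustration only; real MTAs write more than one file per message.

```python
import os
import tempfile
import time


def commit_message(spool_dir, name, body):
    """Write one message and force it to stable storage before returning."""
    path = os.path.join(spool_dir, name)
    with open(path, "wb") as f:
        f.write(body)
        f.flush()
        os.fsync(f.fileno())  # the synchronous step the disk has to service
    return path


def mean_commit_seconds(spool_dir, count=50, size=2048):
    """Average wall-clock time per committed message."""
    body = b"x" * size
    start = time.perf_counter()
    for i in range(count):
        commit_message(spool_dir, "msg%05d" % i, body)
    return (time.perf_counter() - start) / count


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as spool:
        print("mean commit time: %.3f ms" % (mean_commit_seconds(spool) * 1000))
```

Running it against a directory on the actual spool filesystem, with and without a write-back cache in the path, should show whether the box is bound by synchronous write latency.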
If you have the money, might I suggest Adaptec's AAA-133U2. I have worked with Mylex 960s and they are pretty solid too, but I still prefer Adaptec. The AAA-133U2 has 3 channels (the AAA-131U2 only has one channel), and just think of the speed ;)
What is the balance between writes and reads on your mail server?
Are you logging syslog locally on the mail server? If so, consider moving syslog logging to a dedicated log box. If you can't do that, consider using the leading-dash feature in /etc/syslog.conf to make syslogd avoid calling fsync() on the mail log for every single logged message.
There are some other possibilities to consider; for example, do your queues contain many deferred items (due to node unreachability or DNS problems)? If so, you are trawling through a large mail queue to process a small number of items (since some of the items are not due to be retried yet [I'm no exim expert]).
If you're using a lot of disk in re-reading the queue, consider using an MTA which has a separate queue for deferred items, and/or a hashed directory structure. The Postfix mailer (by Wietse Venema) fits the bill here. Postfix is particularly good for large queues.
Postfix is also deliberately written to keep filesystem accesses to an absolute minimum for each item of mail (I think you can have as few as 3 disk accesses per item). This really reduces disk loading, especially on systems with synchronous filesystems.
On the RAID side, consider alternatives to RAID5. RAID0+1, for example, is as safe as RAID5, but faster (though it uses slightly more disk drives).
Reliability is the issue when it comes to email and RAID systems. Of course Sun has the edge, so why not stick with Sun software & hardware? The Sun StorEdge A1000 has a caching controller and usually 30-40 gigs per rack; it plugs into your SCSI bus, and you can simply add another dual-channel SCSI card to split the load or add redundancy.
Network Appliance makes an excellent solution. NFS toasters are the way to go in a distributed environment. Say you have customers on shell accounts: you can export the mail directory, mount it via NFS, and access it from the shell servers without throwing more email load on them locally. NFS toasters come in a great looking rackmount appliance case, and how much rackspace you need depends on how much storage you need.
And of course there is StorageTek, which will cost you a pretty penny, but offers Fibre Channel or multiple SCSI channel connections, full redundancy, caching, hotswap and maintenance features.
I'd never stick an IDE solution on a production box. You need something that you can get support and service on, so I'd suggest that you stick with the Sun StorEdge A1000 drive systems for complete compatibility and put it under the same support contract as your UltraSPARC.
AND
As far as email is concerned, you should set up an MX server to cache and forward incoming email. These work really nicely, since you can run RBL or pre-process out spam without killing the actual server that holds and processes email for incoming clients. You have to look at a distributed environment, as email is precious to a lot of people, and a single server machine is not gonna cut it when you're upwards of 20,000 customers doing that much email.
PS. Try out Qmail too :) smaller footprint!
vmstat is mostly useful for seeing CPU bottlenecks and RAM shortages. Use the SE toolkit, or "iostat -xc 10" for disk bottlenecks. Also, go out *right now* and buy "Sun Performance and Tuning" by Adrian Cockcroft, and "Configuration and Capacity Planning for Solaris Servers" by Brian Wong. My inclination on this would be to use Veritas vxfs filesystem and software RAID 0+1, striped within and mirrored across controllers, if it's really a filesystem I/O bottleneck. Heck, if it's just a read bottleneck on a few hot files you might be able to just throw RAM at it...
Those things are NAS, and there is a whole market for them. NetApp is not the only provider (only the best marketeer) of such a solution. Off the top of my head, there are others:
- Auspex Systems
- EMC
- Hitachi Data Systems
- certainly others not so famous
We are currently in the middle of choosing between Auspex, NetApp and HDS. Can anyone provide us with first-hand experience?
My guess is that in this role, performance is not the paramount issue. You're not bopping the heads around like you would in a database application; and even 20MB/sec is going to be a plenty of throughput unless you have banks of ADSL lines. The important issues are reliability and maintainability.
I'm as much of a tinkerer as anybody; for my own use I don't mind spending two bucks of labor to save one buck of investment, because I'm really investing in myself. That said, if I had 13K users depending on me for e-mail, I wouldn't mess around; two days of down time could be fatal for your business.
I'd invest $1.50-$2.00/user in a professional grade solution:
Hardware SCSI raid controller.
Drives on hot swap trays.
Same/next day on-site service contract.
External cabinet that can be swapped over to another computer.
It's been over two years since I spec'd a solution like this one (I'm doing software exclusively these days), so I can't make a specific recommendation for today's hardware. I know that some devices used to come in a separate cabinet and looked like a humungous SCSI drive; they even had their own RJ-11 to hook up to a phone line for remote diagnostics from the vendor's tech support.
If the money to swing this is impossible, then I'd recommend mirroring rather than RAID 5. All these kinds of things are compromises between reliability, cost, convenience and performance. RAID 5 is an excellent overall solution from a performance standpoint; but if you cannot afford this RAID 1 is a good choice. It offers fast reads at the cost of slow writes and survival from failure on either disk. In this application, users won't be affected by slightly slower write times. Since drives are so incredibly cheap these days, I'd say this is a pretty good choice if you are strapped for cash. You could even use IDE drives. If you could afford a second IDE controller, then you could use software mirroring across two different controllers for improved throughput.
One thing I haven't looked into is RAID-2; RAID-2 stripes data at the bit level and adds Hamming-code error correction on dedicated drives. It is seldom used because modern drives (SCSI ones in particular) already do this kind of error correction for you, but it might be worth looking into for IDE RAIDs.
Good luck.
Really what would be great is failover clustering.
Post may contain irony: discontinue use if experiencing mood swings, nausea or elevated blood pressure.
Much of this is probably repeated elsewhere, and much is common sense, but...
1. When was the last time you defragged the drives? Chances are this will reduce thrashing immediately.
2. Add more memory. More cache == less I/O. Double the RAM for a week and see how much better things are...
3. Hardware RAID is the only RAID. In most cases, the overhead of s/w RAID exceeds the I/O performance increase. Plus, the OS (whatever OS) need never know the boot drive is spread across 5 drives in three racks...
4. Hot Swap is a must for a production environment. Nothing beats the warm feeling of yanking a dead drive, slapping in a new one, and watching it get rebuilt on the fly - and the users never know...
5. Any amount of RAID will still fail badly if the PSU dies - always get redundant, hot swap power supplies.
6. The same goes for cabling.
This sig left unintentionally blank.
At least some POP servers are reputed to do stupid stuff like copying a user's whole mailbox to a new file every time a user connects up, looks at headers, or deletes a message. While I don't have specific recommendations, I'd advise auditioning a few different packages to see what kind of I/O load they place on a disk farm. Also, you may be better off spreading your load across multiple (cheap) servers rather than putting all your eggs in one expensive basket.
Also, improperly tuned RAID-5-based systems can be slower than the disks they're built out of because of the need to do read-modify-write cycles to update the parity blocks..
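For anyone who hasn't seen why a small RAID-5 write is expensive, here is a rough Python sketch of the read-modify-write cycle on a toy three-data-block stripe. The block contents and function names are made up for illustration; a real controller works on whole sectors and overlaps the I/O.

```python
from functools import reduce


def xor_blocks(*blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def small_write(stripe, parity, index, new_data):
    """Update one data block via read-modify-write.

    Costs 4 disk I/Os: read old data, read old parity,
    write new data, write new parity.
    """
    old_data = stripe[index]                              # read 1 (old data)
    new_parity = xor_blocks(parity, old_data, new_data)   # needs read 2 (old parity)
    stripe[index] = new_data                              # write 1 (new data)
    return new_parity, 4                                  # write 2 (new parity)


# Toy stripe: three 2-byte data blocks plus their parity.
stripe = [b"\x01\x02", b"\x10\x20", b"\xaa\xbb"]
parity = xor_blocks(*stripe)
parity, ios = small_write(stripe, parity, 1, b"\xff\x00")
```

The same one-block update on a plain disk is a single write, which is where the "slower than the disks it's built from" effect comes from when the cache can't hide it.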
Does anyone know of a FreeBSD HW RAID controller. I can't find clear statements about which cards are supported. I guess I may use Vinum. Can someone help?
www.vikesfan.com's e-mail server is a home-built RAID setup. We used SVEC rack-mount cases, which are really great. The controller is a DPT SmartRAID V, which uses the i960 chip and can take up to 128MB (?) of cache. We put 6 18GB LVD 10krpm Seagate drives into the chassis. We added a second channel to the RAID controller and ran the 1st 3 drives off of the 1st channel, and the 2nd 3 drives off the 2nd channel to ensure ultimate performance. Then we set up the 1st 3 drives as a RAID 0 (stripe, no parity) and the 2nd 3 drives as a RAID 0, and then set up a mirror set between the two RAID 0's. If you need speed, RAID 5 is a bad idea. Doing straight RAID 0 offers great speed, but no reliability. Doing a RAID 10 solution (striping and mirroring) offers the best read throughput because the controller can read off of either stripe set. Our setup gives us 50GB of ultra fast LVD RAID, and we can lose up to 3 drives (as long as they are all in the same stripe set) without incurring downtime or having to restore a backup. Vikesfan.com staff
My choice for a large mail server (30,000+ users) is CMD 5640 dual, hotswap RAID controllers with 256MB cache each, in an external cabinet, with many drives. The controller has 2 host channels and 2 drive channels. I also use Kingston DE300 hotswap trays, which let you put 3 1-inch drives in the space of 2 devices. I would go 6 drives and put 3 on each drive channel. Depending on how much space you need, you can use 9 or 18GB drives. One nice thing about the controller is that it is separate from the system, so if the system crashes, you still have access to the RAID controller to troubleshoot problems. The controllers have serial console ports, so you can access them the same way you would a headless SPARC server. The controllers aren't cheap (~$6600), but well worth it. The Ultra2 I have with dual 400's, 2GB RAM, and this setup with 50GB drives should easily handle 100,000 users. If the price is too high, you can get non-redundant controllers (CMD 54xx series) for a lot less (~$2200 with 32MB cache).
Another thing to think about is what software you are running. I was running qpopper on a server that continually had a load of 15+. After switching to cucipop, the load went to 0.15.
feel free to mail me if you have questions or want more info... I can hook you up with the vendor I order from.
-Randy
(1) SCSI - EIDE - _BAD_IDEA_. I'm not quite sure if you're familiar with SCSI and IDE's physical performance attributes, but if you are experiencing any bottleneck issues whatsoever with SCSI, then IDE, even EIDE, is possibly the worst thing to do in this situation.
(2) You ought to make the point that you're looking for Sun-box stuff, which is *way* more confined than PC RAID. We are running Exchange (no flames) and we use mirrored RAID 5 on two separate controllers - we can have any two drives fail simultaneously with no repercussions. My point is that the PC RAID market seems to have far more choices for you.
(3) RAID on IDE works fine, even great, for a desktop user who pushes the performance envelope - but (E)IDE cannot and will likely never compare to UW-SCSI or U2W-SCSI.
NP
Not raid at all but just an online backup:
dd if=/dev/sda of=/dev/sdb bs=1048576 >> "$LOGFILE" 2>&1
This assumes identical geometries. So buy 2 drives instead of 1. Use it once a week or every night. This has saved my ass countless times. Every box I build gets a dupdrive script containing the dd command above and a spare drive.
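If you'd rather have the same block-by-block copy in a script you can extend (checksums, progress, mail on failure), a minimal Python sketch might look like the following. The function name is made up, and the demo copies between ordinary temp files; pointing it at real devices such as /dev/sda and /dev/sdb is an assumption you would need to test on a scratch box first.

```python
import os
import tempfile

BLOCK_SIZE = 1048576  # same 1 MiB block size as the dd command above


def copy_image(src_path, dst_path, block_size=BLOCK_SIZE):
    """Copy src to dst block by block; return the number of bytes copied."""
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            block = src.read(block_size)
            if not block:
                break
            dst.write(block)
            copied += len(block)
        dst.flush()
        os.fsync(dst.fileno())  # make sure the copy has reached the disk
    return copied


if __name__ == "__main__":
    # Demo on temp files standing in for the two drives.
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "source.img")
        dst = os.path.join(d, "spare.img")
        with open(src, "wb") as f:
            f.write(os.urandom(3 * BLOCK_SIZE // 2))
        print("copied", copy_image(src, dst), "bytes")
```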
Adults are obsolete children. - Dr. Seuss
As lots of people have said, the disks and RAID setup can be a problem. Spend some time with vmstat and iostat and determine where the bottleneck is. If you have a throughput problem, you want more controllers in the mix. If you're spindle bound, you want more disk. However, I didn't see mention of what type of filesystem you're using. I imagine with a mail server that you have thousands of tiny little files spread across only a few directories. For that situation, it's rather critical to use a filesystem that does binary lookups of your metadata (such as Veritas). Use vsar to look up your inode hits and misses, and if the ratio is out of whack, try to break things down to fewer files.
Secondly, it seems you are somewhat behind the curve here: the latest Linux SW RAID is capable of autodetection, and I'd expect commercial alternatives to offer at least that. Today, SW RAID does handle the problem you mention and can handle people moving disks and changing SCSI ID numbers.
And tell me: what is simpler, one sysadmin fixing it all, or one SW sysadmin and another HW sysadmin calling each other? Partitioning off for some weird principle rather than efficiency sounds like the work of pointy-haired bosses.
We run RAID on over 80 development servers and 20 production servers. We run NT and MS SQL 7.0, but also do things like bill generation, which involves a lot of raw file access. Currently our best setup runs like this (we use HP Intel hardware):
We have 2 RAID controllers (each has 3 channels, but you won't need that much for your setup) running a RAID 10 array on each. RAID 10 is about the best performance you can get out of RAID. Basically the idea here is that most RAID controllers are A) slow and B) can only handle so many I/Os per second, and that's always slower than a modern system can handle.
If you don't know what RAID 10 is, basically you have 2 or more mirrored drive sets. Then you stripe across those drive sets. This means you A) need at least 4 drives and B) lose 1/2 of your usable drive space in the mirroring. But this also means you can do 2 separate reads across 2 different sets of striped disks, which is very speedy (in theory, anyway).
So if you spread your spool and mail across 2 RAID controllers running RAID 10, that's probably the best performance you're going to get. 10,000 RPM drives will help a lot too. The only problem: this is also the most expensive way to do it.... ohh well....
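As a rough picture of "stripe across mirrored sets", this Python sketch maps a logical block number to the mirrored pair that holds it. The disk numbering (pair p is disks 2p and 2p+1) and the one-block stripe unit are assumptions chosen to keep the example small; real controllers use larger stripe units and their own layouts.

```python
def raid10_locate(block, pairs):
    """Map a logical block to (pair index, offset in pair, disks holding it).

    Blocks are striped round-robin across the mirrored pairs, and
    both disks of the chosen pair hold a copy of the block.
    """
    pair = block % pairs        # which mirrored set the stripe unit lands on
    offset = block // pairs     # position of the block inside that set
    return pair, offset, (2 * pair, 2 * pair + 1)


def raid10_usable_drives(total_drives):
    """Half the drives hold data; the other half are mirror copies."""
    if total_drives < 4 or total_drives % 2:
        raise ValueError("RAID 10 needs an even number of drives, at least 4")
    return total_drives // 2


if __name__ == "__main__":
    for blk in range(6):
        print(blk, raid10_locate(blk, pairs=2))
```

Because either disk of a pair can answer a read, consecutive blocks can be fetched from different spindles at the same time, which is where the read speed comes from.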
-Tripp
Cost saving tip - the A1000s are Metastor OEM. I buy a 10-drive unit that is very similar to the A1000 direct from Metastor (with a newer version of RAID Manager), with a 5 year warranty, and significantly cheaper (can't remember the difference offhand).
I'm surprised that you got a sales rep to get back to you, considering you're an edu site. We were also looking to invest a large chunk of our budget into Sun disk storage but couldn't get a Sun salesperson to return any of our calls. We even went as far as to tell them that we needed to fill out a purchase order by the end of next week, so they ended up claiming it would be no problem to get back to us by the end of the day. By the end of the week they declared we should just give them a couple more weeks to get back to us. I guess that with the edu discount, Sun sales just figure getting back isn't worth their time. The joke around the dept. now is "Sun put the dot in dot com but we are ee-de-you so will have to go with IBM instead."
you mentioned ide raid but not by name. i've been looking at raidzone's solution. haven't bought yet, but it does all the hot swapping stuff you want *and* is riding the ide cost curve, which is now at 20G/$200.
the interface is neither ide nor scsi, but rather a board in your pci bus.
oh, right, you have an ultra sparc. *LOSE THE ULTRA SPARC*! they are not fast. you're better off running linux or freebsd on an x86 farm or beowulf cluster.
Run a benchmark like postmark on your current system, and compare it with the result on a system like the HP netserver LPr with the 10k rpm drives running linux (say 2.2.10-ac12, or 2.3.x). I think you will be amazed at the difference. You don't need raid, you need a faster OS & filesystem. http://www.netapp.com/tech_library/3022.html
This may all be irrelevant, however. The workload you describe is mostly mail, right? In that kind of workload, you're mostly talking about lots of very small transfers. The advantage you will get from striping is minimal to zero in that case. Furthermore, your filesystem is probably seeing a very high metadata-to-data update ratio. You don't mention what filesystem you're using. Many filesystems will force synchronous metadata updates in order to ensure consistency at critical points. This translates to a large number of mandatory seek and read-modify-write cycles. In a busy workload, the head is often moved to service another request during the modify portion of that cycle, forcing another seek. In modern disks, seeks are performance-killers.
If you're looking for manageability and performance improvements, you may wish to investigate filesystem and soft storage system alternatives, such as Veritas' filesystem product.
You should also consider putting such a filesystem solution on top of any hardware solution you might purchase.
Then either do a RAID 0+1, which is best for redundancy but burns a lot of drives
or
Get a NetApp box or an EMC Celerra filer. These things scream and are the best solution if you want to connect another server with the same or a different platform.
You can go RAID 5, which is good, but don't go software RAID or anything with IDE. Go SCSI or Fibre Channel.
Good luck.
You'll probably want to investigate whether or not it's your disk I/O that's actually causing your problem. If it is (and I know I'm going to look like the antichrist of /. because I recommend this), you may want to look into the Sun Storage Solutions, since you made the right decision to get a Sun in the first place. http://www.sun.com/storage/disk.html The MultiPack (http://www.sun.com/storage/multipack/) works very well. The disk I/O speed is plenty for a fairly heavily used Oracle server we have.
Besides being obviously more expensive, RAID-1/0 is not always faster than RAID-5. When a request from the host writes more than half a stripe's worth of data, RAID-1/0 needs to perform more disk accesses (because every write has to go to both disks in each mirrored pair, while RAID-5 has a single parity unit per stripe). The math is trivial. If you don't believe this, consider the sample case of writing 4 consecutive stripe units on a 5-disk RAID-5: RAID-5 writes the 4 units and the parity computed in-core for them (i.e. 5 disk writes), while RAID-1/0 would need 8 disk writes (writes to 4 mirrored pairs). So, as usual, the answer depends on the workload.
I have a 5 year old 2 gig Barracuda in my normal workstation, also a 9 gig drive, and I also have room for 8 more devices on that SCSI bus. With SCSI you don't pitch the old drives, you just add more.
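The arithmetic in that example can be checked with a few lines of Python. This is only a sketch of the counting argument for a full-stripe write; the function names are made up, and it ignores caching and partial-stripe writes, where RAID-5 has to read before it can update parity.

```python
def raid5_full_stripe_writes(data_units):
    """Full-stripe write: every data unit plus one parity unit computed in memory."""
    return data_units + 1


def raid10_writes(data_units):
    """Every unit goes to both disks of a mirrored pair."""
    return 2 * data_units


if __name__ == "__main__":
    units = 4  # 4 consecutive stripe units on a 5-disk RAID-5
    print("RAID-5  :", raid5_full_stripe_writes(units), "disk writes")  # 5
    print("RAID-1/0:", raid10_writes(units), "disk writes")            # 8
```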
I have an even older drive in my firewall at home. It is an old CDC 300 Meg. Plenty of disk for a linux masquerading firewall with a connection. The firewall PC is an old 386-20 with 16 Meg of ram. Also plenty for the task.
I don't understand why people think that old hardware is completely useless. I like it though: I got both the drive and the PC for my firewall for free because people thought they were too old to be useful.
If you have the money to spend, I recommend talking with Network Appliance. (www.netapp.com) They have some VERY nice storage hardware, and it is everything you wanted. Fast, scalable (up to 1.4 terabytes currently). We have a small 7, 9gig drive solution currently, and it's a dream come true. Fast, reliable, you name it.
There is really not so much that differentiates ATA from SCSI anymore.
;-)
I wouldn't go that far.
Yes, IDE has finally caught on to such things as DMA and busmastering, and throughput on IDE devices is in the same arena as SCSI now. But.
IDE is limited to two devices per bus, and generally requires one IRQ per bus. IDE also has very strict and short cable length limits, and lacks an "external" connector -- you generally can't have an external IDE device (I know it is possible, but the cable restrictions make it very difficult).
There are more kinds of devices (scanners, printers, etc.) available for SCSI than IDE. SCSI is generally more capable in terms of what you can do with it.
IDE controllers tend to be very primitive compared to their SCSI counterparts. Things like bus disconnect, command queuing, scatter-gather, even busmastering are often not available or iffy on IDE controllers. This applies especially to the onboard controllers in many motherboards; the number of shortcuts taken there are incredible.
Likewise, the drive electronics and HDA components in IDE drives are often cheaper than those in SCSI drives. These are all design and engineering issues, not issues with the specification itself, but they exist. The problems stem from the fact that IDE is marketed to be cheap, cheap, cheap, and thus gets a higher incidence of cheap components. It isn't limited to IDE, either -- you can also find cheap SCSI hardware, it is just that there is less of it.
IDE often appears faster in benchmarks, because benchmarks typically try to do operations in bulk on a single device. IDE has a lower command overhead than SCSI, so for such things, IDE will be faster. But when you get into the real world, and have multiple processes trying to access multiple devices at once, that is when IDE stalls, while SCSI keeps on going.
I realize this started off as a discussion about RAID, and that IDE RAID devices are not your typical RAID devices. They usually have one drive per bus, connected to a custom controller that multiplexes them all and presents them to the host as a SCSI interface. But the topic has drifted to more general applications.
Just my 1/4 of a byte.
dragonhawk@iname.microsoft.com
I do not like Microsoft. Remove them from my email address.
While I respect SCSI, I love SCSI, and I admire the DPT SmartArray 4 and 5 cards, I was wondering if anyone has experience with the Promise Fastrack/66 RAID controller? It's a low cost IDE RAID controller that supposedly handles the mirroring in hardware and contains its own BIOS. My question is, will it work with FreeBSD and OpenBSD? I think it would be perfect for lamer systems that are just being used for backups or logs... dual 20 gig IDEs fully mirrored. However, I can't get a solid answer from anyone on it. Will it work outside of Windows? If it looks like a single drive to the system, I don't see why it wouldn't. But no one can say for sure, YES it worked or NO it didn't. So here is my chance to ask here.
I'm a Sys admin. for a mid-size company. Read the bit on the toasters archive about how Yahoo got burnt with the NetApp toasters. I've been visited by a couple of NetApp slimeballs. NetApps are Pricey and you don't really get what you pay for. I'd rather go with Server attached storage and have a lot more choices and flexibility than tie in with a single vendor who can screw you in the most critical moment.
This is a quote from an Eric Allman interview on sendmail.net:
Are there features in sendmail that people should be aware of but aren't?
Oh, there are probably dozens of them. One that comes to mind, a very simple one, is the fallback MX option, which lets you redirect mail that has failed the first time to another location. It essentially acts as a lowest possible priority MX record for all hosts. For example, if you've got a mail system that's got a lot of traffic going through it, you have another machine that you dedicate to the slow mail, the stuff that didn't go through the first time, where presumably you're less concerned about how quickly it goes because the other end's being slow. So you set your initial connection timeout to something low - five seconds, ten seconds, whatever's right for your site - and you set the fallback MX on your main site to this fallback host. That way the mail that's going to go through quickly just goes fsssssssst right through your main server, while the stuff that's going to be slow (because the other end is either slow to connect or down) goes off to this other machine and doesn't clog up the main machine. It turns out to be just an amazing win. And these days the price of a PC box running FreeBSD or Linux is close enough to zero that it might as well be zero, so it's not really a problem to do it.
Sure as hell beats backing up an NFS share off of a different machine (gack).
We're planning on deploying 2 database servers accessing data off of one external disk array. The second would be a failover server, so they shouldn't be concurrently accessing the same data/partition, but could. I know multiple boxes can access a single disk array through one scsi bus, but everybody always talks about them using different partitions. Can you have 2 boxes access one partition on SCSI? Fiber?
What form of RAID would be best? 5? 0+1? I almost wish I could do a 5+1 - stripe with parity, mirrored. I know that's a little paranoid, but so am I... :-)
We're looking into the Gateway and Dell disk arrays. Has anyone heard good/bad about these? They have a max of 8 disks; what would be the best configuration?
Thanks,
Jeremy
"Opinions are like assholes; everyone's got one..."
This works pretty well for heavy load. Adding more users? Add another machine and stir...
We're not the only people that do it this way - I believe there was a paper submitted at one of the LISA conferences that describes just such a setup.
at CSC.. I'm not an employee, just someone who bought a 4-tape DAT autoloader for $269 from them and is quite happy with it..
Your Working Boy,
I can give you some input on #1. The main (and only) advantage to using IDE over SCSI is price. I have a 70GB (4x17.2GB Maxtor) UDMA Raid0 running on a server at home. It cost me only about $700 to build it. It is running on a Promise Ultra66 controller. I have run raid on it under both Linux and WinNT and it works great. Disk performance is actually very impressive, much faster than a normal IDE drive, but that is to be expected when you stripe 4 UDMA drives. It is still nowhere even close to the speed of a good SCSI Raid setup.
I'm always amused at the large number of people who immediately think that because you are placing IDE/UDMA drives in a Raid configuration it will cause the drives to die quickly. That's bullshit. Granted, the SCSI drives will last you a hell of a lot longer, but IDE won't keel over and die just because it is Raided and under a high performance load. Most IDE drives will last at least for the length of their warranty period. Make sure you get the 'SMART' enabled drives and some monitoring software to give you a heads up if the drives begin to exhibit signs of failure.
If you want reliability and speed and are willing to pay for it, use SCSI. If you want large amounts of space and average speed at a decent price use UDMA. My needs run to cheap space and lots of it, and so far the UDMA solution has worked well for me.
I wouldn't recommend anything but a SCSI solution to you for any situation where you are looking for high performance fault tolerant systems. In your case I would go with option number two in your post above, and option three only if you are really, really worried about losing your data.
PS - This is running as a software Raid0, there is no hardware present. I have seen a number of benchmarks (some from Ars-Technica, don't have the link) that claim the performance of the Promise raid controllers is exactly the same as a software raid. I'm not sure if their competitors have this problem, or even if they have any competitors in this area.
Hell is being intelligent in a world full of idiots.
Since this question is dealing with performance, I'd like to get pointers to fast backup solutions that don't require down time. We have a couple hundred gig and backups are taking hours. (some of this is from raw (informix) partitions).
We are looking at an app that basically does a dd to tape but requires the system to be down.
thx
For precious data, I agree that Raid-0 is insane.
but for temporary data that doesn't need to be 100% sure to be kept, that can be a good solution.
For a news server spool, if it crashes, replace the disk and the standard NNTP messages will refill the news spool shortly (depending on the connection speed)...
Of course OS disk and what users send and other data that is more important should not be on Raid-0...
The reason for this is that RAID-1 uses 1:1 mirroring of a 2-drive set while RAID-5 uses rotating parity in which parity information is distributed across all drives.
With regard to space, using RAID-1, your usable yield (what shows up in df) is half of the total disk space put into it. With RAID-5, parity info is spread throughout all the drives and costs you one drive's worth of space. E.g., I have a RAID-5 using four 4GB drives, which gives me 12GB of usable space. With 0+1 on this configuration, it would be 8GB usable.
As for speed, both RAID-1 and RAID-5 allow you to read from multiple disks at once (which, of course, is a win). For writes, a drive pair in a RAID-1 will take about as long as a write to a single drive. On RAID-5, however, a small write takes longer because (afaik) the controller has to read the old data and old parity, compute the new parity, and then write both back, which costs extra disk I/O as well as CPU time.
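The capacity figures above are easy to reproduce. Here is a small Python sketch; the function name and the level labels are just illustrative, and it assumes identical drives and ignores formatting overhead.

```python
def usable_gb(level, drives, size_gb):
    """Usable capacity for a few common RAID levels, identical drives assumed."""
    if level == "0":
        return drives * size_gb           # pure striping, no redundancy
    if level in ("1", "0+1", "10"):
        return drives * size_gb / 2       # half the space goes to mirror copies
    if level == "5":
        return (drives - 1) * size_gb     # one drive's worth of parity
    raise ValueError("unsupported RAID level: %r" % level)


if __name__ == "__main__":
    print(usable_gb("5", 4, 4))     # four 4GB drives in RAID-5
    print(usable_gb("0+1", 4, 4))   # the same drives as 0+1
```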
A decent little overview is at DPT's site (sadly, only in PDF) at http://www.dpt.com/pdf/understand_raid.pdf
You might want to look at some of the clustering solutions. Fibre Channel hard drives with intelligent RAID controllers would be a good solution. Unisys makes a fairly nice box that has 10 drives with two controllers (does failover); we've got 10 18 gig drives on it with no problems (or lack of speed).
We've deployed several here. We use them on mainframe boxes as well as Unix. RAID-5 we have 80G to 180G units in place right today over our network. The mainframes hit these guys HARD. I highly recommend 256M cache and a RISC based controller. Good performance for the buck! We use http://www.excelcdrom.com primarily as our supplier
raid configurations tend to imply huge virtual drives. huge drives need a loooong time for a filesystem check (once i had a >3h one with a 72GB drive/raid 5). therefore i would highly recommend a log structured filesystem!!!!!!
...
the gdt controller (http://www.icp-vortex.com/) works fine with linux (and of course any other operating system, linux tools for i386/alpha available).
about raid modes: security: mirror, one drive security: raid 5, speed: striping - these are the common uses, but the choice, depends on your needs
CU
SCSI drives have a 5 year warranty
:)
IDE drives are 3 year (at least, I haven't seen 5yrs on an IDE drive)
When a drive dies, usually the disk can still spin, so is it the electronics that are the real problem?
Maybe someone who is willing to risk their drives (and any warranties) and has an IDE and a SCSI drive of the same model could swap the circuit boards between them. I did this on two dead Maxtor drives once (slightly different models, same drive casing) and ended up with one working hdd.
Do your best, hope for the best, suspect the worst.
Just curious if anyone has worked with the disk arrays made by Dynamic Network Factory (or any similar products by other manufacturers?)
They say they use Ultra DMA drives, and connect to your machine via SCSI. Seems like a good way to put the I in RAID - assuming the product is as good as it looks.
Most important - DON'T USE RAID 5. It's not right for that application. RAID-5 assumes read-mostly, and is aimed at things like user home directories and app software; it is slower at writing large amounts of traffic than a single disk.
:-)
:-)
Take the time to understand how different RAID types ("levels") work and what is needed. RAID-1 is obvious but is space-inefficient (50% usable capacity) and doesn't solve the performance issue without adding striping (aka RAID-0) too.
RAID-3 may work well if you can get the stripe size down to a single write for the filesystem, e.g. 4+1 discs, 512 byte disc block, 2K array stripe and 2K filesystem block. Beware that many packaged arrays are software optimised for RAID-5 and / or RAID-0+1 and suck at RAID-3.
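The stripe arithmetic in that RAID-3 example is easy to check. A small sketch (the function name is made up for illustration):

```python
def full_stripe_bytes(data_disks, disk_block_bytes):
    """Size of one full stripe: one block from every data disk (parity excluded)."""
    return data_disks * disk_block_bytes


stripe = full_stripe_bytes(data_disks=4, disk_block_bytes=512)  # the 4+1 layout
fs_block = 2048                                                 # 2K filesystem block

print(stripe)              # 2048
print(stripe == fs_block)  # True
```

When the two sizes match, each filesystem block write covers exactly one full stripe, so the parity can be computed from the new data alone.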
Sounds as though your price point rules out many of the midrange and high end toys that have been bandied about. Forget about EMC.
There are a number of cheap SCSI to SCSI and SCSI to IDE standalone RAID boxes going round, and also PCI to SCSI or PCI to IDE cards for internal mount in server PCs. They're closer to your capacity needs (start at sub-50GB, sub-30MB/sec).
IDE vs SCSI for the drives is not that important up to 7,200 rpm, but will tell with 10,000 rpm units. The bandwidth from the RAID controller to the host is more important, so make it Ultra-wide or PCI.
From past experience, Sun StorageArrays (or whatever they are called now) were a bit behind the technology curve; in 1996 they were still using the host OS for software RAID support, and upgrading Solaris meant hacking the array. They are all OEM anyway. Go to a storage expert instead, but one cheaper than EMC.
Clariion are good for plug and forget, but may not have something down in that price range. However, performance on low-end models, even FC to FC, is not stunning. The 5700 series is (was?) overall good value, but requires FibreChannel attach.
The HP AutoRAID is a sweet, fast drive. It caches all data in a RAM drive. When it has time, it writes it out as RAID 0+1. It is striped automatically across all the drives. After the drives are half full, it converts the older data to RAID 5. It always keeps a small part of the drives at RAID 0+1, even when it's 100% full, for speed.
You can think of ram as cache for the Raid 0+1 parts of the disks. Then, Raid 0+1 is cache for the Raid 5 parts of the disks.
It doesn't need a lot of tweaking. If it gets slower because it's too full, just add another drive. It converts the newer or most used data back to RAID 0+1. This can be done in the middle of the day, without downtime.
If you fill it up with smaller disks and you need more disk space, just pull out a 4 gig disk and stick in an 18 gig. Let it rebuild. Pull out another 4, replace it with another 18. Again, middle of the day and no downtime.
I love this drive, man. It's fast, and you don't have to manage it. I didn't say it was cheap, I just said it is worth it.
The original poster mentioned considering a RAID level 5 array to try to speed up access. However, level 5 can actually slow down access times. It increases throughput, but throughput is usually not the culprit. Access time is. To get faster access times, use a mirrored array (level 1), where multiple disks all carry identical information. Read access times are dramatically improved, because each disk has to service just a fraction of the overall read requests!! In such an array, reads don't involve all disks, only writes do. Therefore, doubling the number of disks in a mirrored array theoretically doubles the number of read transactions that can be done per second. Real-world results vary, but are dramatically better than with a single disk. If the disk is getting a lot of small read transactions per second, rather than a few very large ones, then a mirrored array is the way to go, not striped!
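A toy model of that claim, assuming reads are handed out round-robin and every mirror is equally fast. The 100 reads/sec figure is a made-up number for illustration, not a measurement, and real controllers use smarter scheduling.

```python
def dispatch_reads(num_requests, num_mirrors):
    """Hand out read requests round-robin; return how many each disk serviced."""
    counts = [0] * num_mirrors
    for request in range(num_requests):
        counts[request % num_mirrors] += 1
    return counts


def theoretical_read_rate(single_disk_reads_per_sec, num_mirrors):
    """Upper bound on reads/sec: every mirror can serve a different request."""
    return single_disk_reads_per_sec * num_mirrors


print(dispatch_reads(1000, 2))        # [500, 500]
print(theoretical_read_rate(100, 2))  # 200
print(theoretical_read_rate(100, 4))  # 400
```

Writes get no such benefit, since every mirror has to take every write.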
When I worked at an ISP a while back (+-30k users), we did the following:
A whole bunch of POP servers each storing different accounts (i.e 3 machines each with 0.333 of the users on them). A proxy then redirected the user to the correct machine when he checked his e-mail. The machines were P133's (this was long ago), with mirrored hd's. For incoming mail, use multiple MX entries with the same priority.
This offered great scalability, but if a POP box fell over, 1/3 of the users would have lost their mail. Nowadays, I would therefore modify the setup to something like this:
A whole whack (about 4) of servers sharing the data between them using CODA (or something similar). Round robin DNS set all of them as POP servers, sharing the load nicely. Also multiple MX entries made them all incoming mail servers.
This setup is very very scalable, and reliable too. In fact, you could have a horde of CODA servers sharing the data in the background, and then set up your POP servers as clients to distribute the load even more.
Our mail server's spool bottleneck turned out to be a result of the fact that clients reading mail was a write intensive task. The clients were NFSv2 linux machines running mostly pine. Solution was memory between the system and the disk (same as a RAID's cache). Seemed to solve the problem and made the drive much quieter too. This was for a small group of about 700 accounts.
Useful commands for studying this include iostat and nfsstat.
I've also been looking at a SCSI/IDE RAID. One I'm considering is called RAID FlyerII.
EOF
Interesting. My case is that exact same brand, except it's a mid tower.. I got a weird square-U-shaped bracket thing with it, looks like it would be perfect for holding a hard drive. But I can't for the life of me figure out where it mounts! =) It's not above the PS, I've got that one in place. Also, on mine, the ATX format punch-sheet that goes in the back of the case was cut incorrectly! They punched all the port outlines, then rotated the thing 180 degrees and punched all the tabs and holes in the outside so it fits in position... Had to take about 3 hours and a hacksaw to fit it in place. grr. Lime
Just use software RAID. I've got two 9G SCSI drives hooked up using RAID0 which gives me a nice 18G partition. Ok, no hot swap but it's free. I'm using RedHat 6.1, which to my surprise and delight, came with software RAID support already compiled into the kernel. It was embarrassingly easy to get going.
After much shopping, questions, advice and temporary insanity, we decided to go for a new Linux box to handle the mail. Apparently, the load wasn't only coming from disk i/o wait; the kernel was using 70% cpu. We chose a Dual PIII/500 setup on an Asus P3B-DS, 512M ECC SDRAM (less than before, but prices are so high right now, and we figure processes should end sooner on this box), Intel Pro/100, Seagate Barracuda for system, six Seagate Cheetahs for spool and mail storage, and a Mylex eXtremeRAID 1100 (w/ the 233MHz i960).
It was configured with 5 spindles in RAID 5, with 1 as a hot spare, and then partitioned in half. I'm confident this badarse controller can keep up on the writes, with minimal performance hit. Preliminary results with bonnie are inconclusive, since it's working with one huge file, rather than thousands of small files. If write performance lags once it goes online (this Sunday am), we'll split it into 0+1.
Exim, QPOP, and IMAPD were hax0red to use a double-hashed directory structure, i.e. "spin" would reside in /var/mail/s/.p/spin (the dot was required for those who have a single-character username). This should eliminate any overhead that ext2fs may have with large directories.
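For anyone wanting to try the same layout, here is a rough sketch of the path mapping. Only the "spin" example comes from the post; the function name and the handling of one-character names are my guess at how it works (the second level collapses to a bare ".", which the filesystem treats as the first-level directory itself).

```python
import posixpath

MAIL_ROOT = "/var/mail"


def mailbox_path(username):
    """Map a username to <root>/<1st char>/.<2nd char>/<username>."""
    if not username:
        raise ValueError("username must not be empty")
    first = username[0]
    # Slicing instead of indexing so a one-character name yields just "."
    second = "." + username[1:2]
    return posixpath.join(MAIL_ROOT, first, second, username)


print(mailbox_path("spin"))  # /var/mail/s/.p/spin
print(mailbox_path("x"))     # /var/mail/x/./x
```

With lowercase alphanumeric usernames that gives at most a few dozen entries per directory level instead of one directory with every mailbox in it.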
Thanks for all your advice, keep it coming. If you're a gamer, check out http://www.xmission.com/quake
-Kevin Blackham Xmission Internet Salt Lake City, UT
extensively, connecting to both FC Fabrics and hubs. There are several drivers available for SCSI FC for the QLA2200 which sells for about $400.
Erling Nygaard in my group has written a Fibre Channel HOWTO which can be found at: http://www.globalfilesystem.org/howtos/fibrecha
Matthew O'Keefe
I see even classic Slashdot is now pretty much unusable on dial up anymore.
qmail performs very well here.
Statistics:
Average successful deliveries per day: ~900,000
Average successful deliveries per second: ~10
Average total delivery attempts per day: ~1,100,000
Facts:
CPU: 2x PIII at 550MHz
RAM: 512MB
Disks: Two Quantum disks (9GB and 18GB), Fast-20 bus
NIC: Intel EtherExpress Pro100
O/S: Linux 2.2.13 running qmail-1.03 with some patches, up to 800 parallel remote deliveries
System load is 3 at peak times. I don't think exim would improve that noticeably. The point? qmail scales very well; if you need more speed, throw hardware at the problem. In our system, the only bottleneck is the network.
First, qpopper is slow as hell, because it copies the user's mbox before working on it. Using cucipop in a 14,000 user environment decreased the load from 25 to 6. See cucipop's freshmeat entry.
Second, you should choose a better hash function. Usernames are hardly evenly distributed, so hashing on their first letters will put many users in the same few directories. I'd go with MD5 here, because it is quite fast and its output is distributed very well.
Some solutions with source can be seen on Erez Zadok's research web pages.
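A sketch of what an MD5-based layout could look like, using Python's hashlib. The two-level, two-hex-digit bucket scheme and the function names are assumptions for illustration, not something any of the mail daemons mentioned here ship with.

```python
import hashlib


def hashed_dirs(username, levels=2):
    """Directory names taken from successive hex-digit pairs of the MD5 digest."""
    digest = hashlib.md5(username.encode("utf-8")).hexdigest()
    return [digest[2 * i:2 * i + 2] for i in range(levels)]


def md5_mailbox_path(username, root="/var/mail"):
    """Path of the form <root>/<hh>/<hh>/<username>."""
    return "/".join([root, *hashed_dirs(username), username])


print(md5_mailbox_path("spin"))
print(md5_mailbox_path("spam"))  # similar names should land in unrelated buckets
```

Each level has 256 possible buckets regardless of how the usernames themselves are distributed. The trade-off is that you can no longer guess the path by looking at the name.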
Nevertheless it does look like RAID 10 has a high fun factor
Some preliminary work and proposals are under way for a special RAID 10 that allows this but nothing is ready for testing yet. I would however expect commercial solutions to have this and more, after all they do charge a pretty penny for it, yes?
My understanding was that some fs's will perform some actions to avoid some fragmentation.
:)
A colleague of mine recommends doing a complete backup/reformat/restore cycle every 2 months or so on partitions that see a great deal of editing/extension of files - on a partition in use since '93 I expect this would give a radical reduction in thrashing . . .
It also gives you a chance to test your backup procedures
This sig left unintentionally blank.
What is more common, however, is the use of ASIC and FPGA and that has a greater potential for improving the speed than general purpose processors of yesteryear.
I expected HW RAID to cut down on main bus traffic, especially PCI bus traffic on the motherboard but now it seems those gains are rather small.