Pros & Cons of Different RAID Solutions

just a small note about scsi vs. ide by jemfinch · 1999-11-18 14:19 · Score: 1

AFAIK, the only difference between a SCSI hard drive and an IDE hard drive is one little controller chip on the drive. So the reliability of IDE drives, mechanically, should be identical to that of SCSI drives.

Jeremy

--
Looking for a Python IRC bot?

Re:just a small note about scsi vs. ide by dieman · 1999-11-18 14:34 · Score: 1

HAH!

I'm going to laugh at you! SCSI drives have better failure reporting and have some "dead space" set aside in hardware for failure recovery. Also, the difference between the SCSI and IDE models is huge: with IDE the CPU does more; with SCSI the hardware does more.
Oh well...

--
-- dieman - Scott Dier
Re:just a small note about scsi vs. ide by bug1 · 1999-11-18 14:42 · Score: 1

Mechanical reliability shouldn't be any different between a SCSI drive and an IDE drive if they operate at the same speed (RPM).
Re:just a small note about scsi vs. ide by Falsch+Freiheit · 1999-11-18 14:58 · Score: 1

True.

However, SCSI drives reserve dead space and move the contents of bad sectors to a reserved sector and remap the bad sector to point at the previously reserved sector.

IOW: SCSI drives hide physical defects on the media from you, where IDE drives require the OS to deal with the problem.
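A minimal sketch of the remapping described above, assuming a toy model in which the drive keeps a small pool of spare sectors past the end of the user-visible area plus a table that redirects retired sectors to spares. The class `RemappingDisk` and its methods are hypothetical; real drive firmware is far more involved and is not exposed to the host this way.

```python
class RemappingDisk:
    """Hypothetical model: a drive that hides bad sectors behind spare sectors."""

    def __init__(self, user_sectors, spare_sectors):
        self.user_sectors = user_sectors
        # Spares sit past the end of the area the OS is allowed to address.
        self.spares = list(range(user_sectors, user_sectors + spare_sectors))
        self.remap = {}  # logical sector -> physical spare sector

    def mark_bad(self, lba):
        """Retire a failing sector by pointing it at the next free spare."""
        if lba in self.remap:
            return self.remap[lba]
        if not self.spares:
            # Only now does the defect become the OS's problem.
            raise OSError("spare pool exhausted")
        self.remap[lba] = self.spares.pop(0)
        return self.remap[lba]

    def physical(self, lba):
        """Translate the sector number the OS asked for into the one actually used."""
        if not 0 <= lba < self.user_sectors:
            raise ValueError("sector out of range")
        return self.remap.get(lba, lba)


disk = RemappingDisk(user_sectors=1000, spare_sectors=2)
disk.mark_bad(42)
print(disk.physical(42))  # 1000: sector 42 now lives in the first spare
print(disk.physical(43))  # 43: untouched sectors map to themselves
```

The OS keeps addressing sector 42 as before; only the drive knows the data moved, which is why the defect stays invisible until the spare pool runs out.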
Re:just a small note about scsi vs. ide by Ya+Mother · 1999-11-18 15:13 · Score: 1

A big difference is that SCSI drives have separate heads for reading and writing and thus can do both at once, while IDE drives have to do one or the other, effectively making a very busy SCSI drive twice as fast as a very busy IDE drive.
Re:just a small note about scsi vs. ide by Vidar+Hokstad · 1999-11-18 15:13 · Score: 1

Reread the suggestion. It's about using an IDE RAID controller that bypasses most of the IDE stuff, but that works with IDE drives.
Re:just a small note about scsi vs. ide by Anonymous Coward · 1999-11-18 16:31 · Score: 0

My SCSI drives have a 5-year warranty. Find an IDE drive vendor that will do that.
Re:just a small note about scsi vs. ide by Holger · 1999-11-18 16:36 · Score: 2

There is really not so much that differentiates ATA from SCSI anymore. ATA (formerly known as IDE) drives have been remapping bad blocks transparently for years, they have been doing DMA for nearly as long, and some drives even came in ATA and SCSI versions (IBM DCAA/DCAS for one), where only the interface board was different and absolutely everything else was equal.

There is even a usable external ATA RAID subsystem out there, manufactured by Arena. They use the same i960 that is used on high end SCSI RAID controllers and deliver decent performance with cheap drives. (Remember: The I in RAID once meant inexpensive)

Of course, in a server, you want reliable drives. But that has next to nothing to do with the interface. UDMA is very reliable as far as the interface data transfer is concerned, I would rate it even higher than SCSI in this regard (proper CRC vs. ordinary parity). The quality of the disk mechanism is another thing, but with IDE drives being so cheap, you could afford to upgrade the things so quickly that they never get a chance to fail at work. Or you could just buy two big ATA drives for less than one SCSI drive and do RAID1.
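A minimal sketch of the RAID1 idea in the last sentence, assuming a hypothetical in-memory `Mirror` class where each member disk is modelled as a dict mapping sector numbers to data: every write goes to all healthy members, and any healthy member can serve a read, so losing one drive loses no data.

```python
class Mirror:
    """Hypothetical toy RAID1: each member is a dict of sector -> data."""

    def __init__(self, members=2):
        self.disks = [{} for _ in range(members)]
        self.failed = set()  # indexes of members that have died

    def write(self, lba, data):
        # Mirroring: the same data goes to every surviving member.
        for i, disk in enumerate(self.disks):
            if i not in self.failed:
                disk[lba] = data

    def read(self, lba):
        # Any surviving member can satisfy the read.
        for i, disk in enumerate(self.disks):
            if i not in self.failed:
                return disk[lba]
        raise OSError("all mirror members have failed")

    def fail(self, index):
        self.failed.add(index)


m = Mirror()
m.write(0, b"boot")
m.fail(0)          # first drive dies
print(m.read(0))   # b'boot', served by the second drive
```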

For the record: recent ATA drives really scream. Look at these bonnie results from my workstation (dual P2, 128 MB RAM, Red Hat 6.1, kernel 2.2.13, test run on a 2 GB / partition that is 50% full):

              -------Sequential Output-------- ---Sequential Input-- --Random--
              -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
          512 18196 97.7 23648 22.5 10807 19.1 19702 84.2 23128  6.9 129.9  2.0

The drive is a 20 GB Seagate ST320430A which sells for less than 400 DM around here. Remember: These are not artificial results on an empty filesystem. This is my real root partition which is used daily.
Re:just a small note about scsi vs. ide by amorsen · 1999-11-18 16:51 · Score: 1

Both SCSI and IDE drives these days hide defects. There is no technical reason why IDE drives are less reliable than SCSI drives. However, SCSI has lost the workstation market completely, so now only servers use SCSI. The data on server disks is generally more expensive to replace, and downtime there is more troublesome, so people are prepared to pay for quality. Some manufacturers have made two versions of mechanically identical drives, one IDE and one SCSI. Quantum Fireball is an example. That practice seems to be ending now that there is no such thing as a low end SCSI drive.

Anyway, there are certainly not two heads on each platter, as suggested by another poster. At one time Seagate made Barracuda drives that were able to read data off two platters in parallel. They dropped it in the later Barracudas when the increase in data density made it possible to make faster drives without this feature.

Another issue is that IDE drives are usually optimized to withstand getting started and stopped again and again by powersaving, whereas SCSI drives are optimized to run continuously for years.

Benny

--
Finally! A year of moderation! Ready for 2019?
Re:just a small note about scsi vs. ide by NecroMancer · 1999-11-18 17:04 · Score: 1

Actually, with SCSI, you can have inter-device transfers (without intervention of the CPU or the DMA controller) and can access several devices on the same SCSI bus at the same time, which you cannot do with EIDE (you have to end the dialog between the driver and the device before accessing another EIDE device). I don't know if I made myself clear, but in any case there are many webpages out there that explain the differences.
Re:just a small note about scsi vs. ide by miku · 1999-11-18 17:12 · Score: 1

If you are planning to build RAID 5 arrays, the physical limits of IDE might become an issue.

I don't know if it's reasonable to plug RAID 5 array disks in as IDE slaves, but I would go for SCSI if you are building big RAID 5 arrays.

With five Ultra2 fast and wide SCSI drives in a RAID 5 array (software RAID 5 in Linux), I have seen reports of 40 MB/s read and write throughput.

And if you have the dough, buy two controllers, put a RAID 5 array on each, and stripe across them.
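A rough sketch of the capacity arithmetic behind the two-controller suggestion (one RAID 5 array per controller, striped together, a layout often called RAID 50), assuming equal-sized disks and ignoring formatting overhead. RAID 5 gives up one disk's worth of space per array to parity; the stripe on top adds the arrays together and hands out chunks round-robin. The function names and the 9 GB disk size are hypothetical.

```python
def raid5_usable(disks, disk_gb):
    """RAID 5 keeps one disk's worth of parity, so usable = (n - 1) * size."""
    if disks < 3:
        raise ValueError("RAID 5 needs at least three disks")
    return (disks - 1) * disk_gb


def raid50_usable(arrays, disks_per_array, disk_gb):
    """Striping RAID 5 arrays together (RAID 50) adds their capacities."""
    return arrays * raid5_usable(disks_per_array, disk_gb)


def raid50_locate(chunk, arrays):
    """Round-robin striping: which array holds a chunk, and at which offset."""
    return chunk % arrays, chunk // arrays


print(raid50_usable(2, 5, 9))  # two controllers, five 9 GB disks each: 72
print(raid50_locate(5, 2))     # chunk 5 lands on array 1, as its chunk 2
```

Consecutive chunks alternate between the two arrays, so a large sequential transfer keeps both controllers busy at once.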

--miku
Re:just a small note about scsi vs. ide by TyFoN · 1999-11-18 17:13 · Score: 1

When I use my IDE disk I can't even move the mouse. When I use the SCSI disk my computer doesn't
seem to notice (MP3s don't stop, and I can still move my
mouse). So I wouldn't even consider IDE for my workstation anymore, even if SCSI costs 500-1000 NOK (US $80-140).
Re:just a small note about scsi vs. ide by arivanov · 1999-11-18 17:43 · Score: 1

Laugh at yourself. IDE drives have had space set aside since the days of the first Seagate 89M-130M drive series.

So the man is right - no difference in reliability whatsoever.

I think the real issue is that they are hitting not a drive bottleneck but a UFS filesystem bottleneck, so they should either abandon Solaris or buy the file-server and reliable-filesystem extensions for Solaris from (I forget the company's name).

So even if they upgrade to RAID, they are not going to get anywhere.

Also, on the topic of RAID: there are very good external boxen using proprietary solutions for IDE hot-swap that present a single U2W or better SCSI interface to the host. They are rackmountable, and they cost about 4000-5000 fully populated with 13-17 GB EIDE drives.

--
Baker's Law: Misery no longer loves company. Nowadays it insists on it
http://www.sigsegv.cx/
Re:just a small note about scsi vs. ide by Holger · 1999-11-18 17:45 · Score: 1

TyFoN wrote: "When i use my ide disk i can't even move the mouse."

You fail to mention which chipset and transfer
mode you are using. And there _are_ SCSI hosts that do a lot worse than recent ATA interfaces (the cheap ISA-Adaptecs that come bundled with scanners and ZIPs for example).

I once was a SCSI advocate, too. Then came the Intel PIIX3 and multiword DMA mode 2; nowadays I am using a PIIX4 and UDMA33. I have _never_ had my system go slow on me with DMA ATA drives, much less the mouse pointer stop moving.

There is just one special case: Swapping to ATA drives can put more of a load on the system under certain circumstances (because I haven't seen ATA drivers use command queueing yet - it's already specified in the ATA spec, though), but you don't really want to be swapping in the first place. If your system does that constantly, you should have gone for more RAM instead of that pricey SCSI drive.
Re:just a small note about scsi vs. ide by Anonymous Coward · 1999-11-18 18:25 · Score: 0

Dude, turn DMA on!
Re:just a small note about scsi vs. ide by mprinkey · 1999-11-18 18:40 · Score: 1

My SCSI drives have a 5-year warranty.

...and name a hard drive that you will want to be using in five years. My three-year-old hard drives that are just slipping out of warranty are 1.2 GB and 1.6 GB drives. Who cares!
Re:just a small note about scsi vs. ide by sbryant · 1999-11-18 18:54 · Score: 1

No - he is in fact right. I don't know about all manufacturers, but certainly IBM drives use the same hardware - the only real difference being the content of a firmware chip (and the cable connector, I guess).

This doesn't mean that all the drive's features would be available for both SCSI and EIDE, and it doesn't stop them charging loads more either.

-- Steve
Re:just a small note about scsi vs. ide by Anonymous Coward · 1999-11-18 19:42 · Score: 0

The other difference is that SCSI is a protocol that handles multi-tasking much better than IDE, which is why most servers use SCSI controllers.
Re:just a small note about scsi vs. ide by Bin · 1999-11-18 19:54 · Score: 1

Separate read/write heads? No, not in the sense you are thinking of. The only reason for separate heads is that one head can't do both jobs because it is physically unable to do the other's job.

For example, think of a read head able to read the ever-shrinking area of a single bit on the surface of the platter (bear in mind that tricks are used to work out the real state of a bit; you don't need heads able to read a bit on a stationary platter). Attempting to use that head to write to the disk may well destroy it, so you now need another head to write with.

The two heads are on the same arm and can't operate at the same time (there's no point; you know what you're writing ;), so you get no two-head advantage, just more cost for the head. The gain, of course, is increased storage density.

Bryn
--
Or words to that effect ...
Re:just a small note about scsi vs. ide by Anonymous Coward · 1999-11-18 19:57 · Score: 0

I believe that many of the newer Maxtor drives have just such a warranty, though I doubt you will want to use such a rotating-media storage device in five years.
Re:just a small note about scsi vs. ide by Anonymous Coward · 1999-11-18 20:24 · Score: 0

Is VERITAS who you're thinking of?

Anonymous 'cause I work for 'em.
Re:just a small note about scsi vs. ide by heh2k · 1999-11-18 20:37 · Score: 1

I think you're confused about MR heads. When using an MR read head, you still need an inductive thin-film write head, so any drive that has MR heads has to have a separate write head. Also, you cannot read and write at the same time; the heads have different offsets at different angles. In other words, if you're reading a track, the write head will be off-center, and if you're writing a track, the read head will be off-center.

(P.S. Please correct me if any of this is wrong. I'm not a storage device expert; I just read a lot about it.)
Re:just a small note about scsi vs. ide by Noer · 1999-11-18 21:04 · Score: 1

You're exactly right: there's little mechanical difference between SCSI and IDE drives, but the controller chip on the board is all the difference. Never mind that IDE uses the host computer's CPU (making it a poor choice for anything but a desktop PC); in the controller the poster mentioned, IDE is bypassed, implying that there is a separate host controller chip.

However, the SCSI *protocol* has something very important for servers. It's a feature called 'block lockout' as I recall, and it becomes important only when you have a server with multiple disk requests going on at once. Let's say you have several reads and writes going on at once (as you will on a server). SCSI is smart enough not to jump back and forth from block to block in sequence; it does the more efficient thing and tries to read as sequentially as possible. So while IDE can be as fast or faster than many SCSI solutions for a single data stream, when you have multiple data streams going on, IDE can't hold a candle.
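The reordering described above is closely related to what is usually discussed as tagged command queuing (the drive accepts several outstanding requests at once) combined with elevator or SCAN scheduling (serving those requests in one sweep across the platter instead of in arrival order). A minimal sketch of the elevator idea, assuming request positions are plain block numbers and that head travel is proportional to the distance between them; the functions are illustrative, not any drive's actual firmware.

```python
def head_travel(start, order):
    """Total distance the head moves serving requests in the given order."""
    total, pos = 0, start
    for block in order:
        total += abs(block - pos)
        pos = block
    return total


def elevator_order(start, requests):
    """SCAN-style ordering: sweep upward from the head, then sweep back down."""
    upward = sorted(r for r in requests if r >= start)
    downward = sorted((r for r in requests if r < start), reverse=True)
    return upward + downward


queue = [10, 90, 20, 80, 30, 70]  # arrival order of pending requests
print(head_travel(50, queue))                      # FIFO: 340
print(head_travel(50, elevator_order(50, queue)))  # elevator: 120
```

With only one request outstanding at a time there is nothing to reorder, which is consistent with the point that the gap only shows up with multiple simultaneous streams.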

This is why it's foolish to use IDE drives in a high-capacity server.

(You often won't notice the difference that much in Win9x or MacOS anyway, because both of those OSes are so inefficient at task-switching.)

I don't know what the FireWire protocol allows (though it has the very cool feature of isochronous support), but I'd hope FireWire has some of these features too (though they'd only become useful once there are native FireWire hard drives, rather than the current crop of IDE drives with FW-IDE bridge chips that you can buy now).

--
-- "Those who cast the votes decide nothing. Those who count the votes decide everything." -Joseph Stalin
Re:just a small note about scsi vs. ide by HarveyOpolis · 1999-11-18 21:27 · Score: 1

Maxtor and Seagate offer three years, I believe. IBM is five; I'm pretty sure of that.

--
- Hugh Buchanan
- Userfriendly.com
Re:just a small note about scsi vs. ide by HarveyOpolis · 1999-11-18 21:32 · Score: 1

Upgrade your P100! Or replace it.

My workstation has three 9 gig SCSI drives. I have two 20.4 gig Ultra33 IDE drives as well.

I don't notice a difference between the two at all. I do a lot of large database work which swaps between the drives... and like many other programmers, I enjoy some MP3s or a quick game of Doom while I wait for a database rebuild or something like that.

The only machine I have that gets an MP3 skip when I do a hard drive copy is my PII-266 laptop. It has an Intel 82371AB/EB PCI IDE controller. I think it's the controller's problem... possibly the drive is slow too.

--
- Hugh Buchanan
- Userfriendly.com
Re:just a small note about scsi vs. ide by Anonymous Coward · 1999-11-18 21:56 · Score: 0

NO! Both IDE and SCSI use spare sectors. That's why you rarely see bad sectors from the OS on modern drives. With the old ESDI and ST-506 drives, it was common to see some bad sectors after about a year of use (esp. with those stepper motors causing all the drift). CPU usage is also a non-issue; bus-mastering controllers are available on both architectures. Where IDE falls down is that it needs a separate bus for each drive: multiple drives on a single channel must "take turns". Also, the very fastest drives are still SCSI; there are no 10K RPM IDE drives.
Re:just a small note about scsi vs. ide by Anonymous Coward · 1999-11-18 22:25 · Score: 0

> AFAIK, The only difference between a scsi hard drive and an ide hard drive is one little controller chip on the drive. So the reliability of the ide drives, mechanically, should be identical to that of the scsi drives.

Yeah, but that one little controller chip is pretty darn important. It means the disk itself can take most of the processing load for reading/writing to the disk, and that the disks can coordinate transfers between each other with minimal talking to the system. So with SCSI you end up using far less CPU for anything disk-intensive. For anything involving lots of disk access (such as database access, or any ISP-related activities), forget IDE. IDE's great for the desktop, but if you're already getting loads of 10 and 20, it tells me you need to offload as much of the disk processing as possible. Make sure your RAID is done through a hardware controller too... forget software RAID. Same problem.

hitchhiker
Re:just a small note about scsi vs. ide by Anonymous Coward · 1999-11-19 00:15 · Score: 0

Want to know the real power of SCSI?

I have a SCSI CD-ROM and a SCSI CD-R, and when I burn a CD I can play Quake, or Half-Life, or defrag, or do whatever I want, and it still burns fine (with very few exceptions). I even had my computer lock up hard once; I couldn't even move my mouse, and when I pressed the Num Lock key the light didn't change. This all happened while I was burning a CD. I left the computer sitting for a few minutes (the CD was halfway done burning) because I noticed the CD and burner lights were still flashing.

Guess what? The CD finished burning perfectly. BFD, you say? Well, when your server gets CPU-bound and your load goes above 10, do you want it to take your hard drive performance with it? No? Well, with IDE it will; with SCSI the hard drives will continue to work just as well as at 0 load.

That is why I use SCSI, and why I am spending $32,000 (actually $1,000 a month for a lease) with Penguin Computing for a huge server with a 108 GB SCSI raid array.
Re:just a small note about scsi vs. ide by Anonymous Coward · 1999-11-19 02:14 · Score: 0

What physical limit are you referring to? I have a rackmount Arena with six 36 GB IDE drives in it. It runs RAID 5, is hooked up to the server with UW SCSI-3, and gives me approx. 175 GB of formatted space. I don't run a hot spare (I want to use the darn thing instead of just leaving it there), but you can if you want to. It supports hot-swap and automatic rebuild. The whole thing, including drives, cost me less than $5000, and so far (I've only had it for about a month) it's been performing extremely well (knock on wood). The unit also sports dual hot-swappable power supplies. I find the build quality very good and will order more of these units.

Mikael
mikaelo@images.com
Re:just a small note about scsi vs. ide by Anonymous Coward · 1999-11-19 02:46 · Score: 0

Sorry, but there is a big difference. Check the specs on the drives: the warranty is longer, the MTBF is longer, and the hard error rates are usually about an order of magnitude lower. IDE drives are a commodity product; they don't have to be as reliable.

Sun by sPaKr · 1999-11-18 14:23 · Score: 1

I had a similar problem. I went with the Sun StorEdge A1000. It's just a great piece of hardware. I got twelve 18 GB drives, 10,000 RPM Seagate Cheetahs, in two RAID 5 clusters with one hot spare. I needed to get a differential SCSI adapter, as they don't come standard on Ultra 2s. Wow, is it fast. I can move gigabytes in what seems like seconds. It's a night-and-day improvement over a JBOD box. A bit pricey: about 17K after our 50% edu discount. It's all SCSI-to-SCSI, with hot-swap disks and hot-swap power supplies. When you're running Solaris, nothing beats Sun hardware... it just works.

Re:Sun by bgp4 · 1999-11-18 15:27 · Score: 1

I used an A1000 for a while on the back end of a UE3500. It was a terrible piece of equipment as far as I was concerned. It broke... a lot. And there aren't any useful diagnostics that the box gives out, just blinky lights. There is no out-of-band notification to be had. If it breaks, you have to physically inspect the box, and even then you still may not know what the real problem is until you replace just about everything. BTW: the internals of the box are basically an Intel PC (it's got a 486 chip on the main board).

--
I'm down with that, as it were
Re:Sun by _damnit_ · 1999-11-18 15:52 · Score: 1

DANGER! Conflict of interest. Sun is my employer

I'm sorry about your experience. However, I support Sun's internal hardware and I have not seen abnormal failure rates on the beasts. Sure, disks go bad - they have moving parts. I support loads of A1000s and they work great. As to diagnostics, that is a sore point for me as well. There's nothing really at the OBP level to test the array. They do come with software that is minimally useful however.

It may be overkill, but I much prefer the A5x00s. All around though the hardware from Sun is VERY good.

_damnit_

--
It's my job to freeze you. -- Logan's Run
Re:Sun by Anonymous Coward · 1999-11-18 15:59 · Score: 1

I agree. Although, I used to work for Sun. ;-)

I am currently contracting to a major shop setting up ISPs, and we're using E250s with A1000s in the rear for data. I've been to 4 different sites in the world with this setup, and it's just not failed so far, as long as you put a terminator on it. :-)

The RAID Manager software is good for setup, but nothing else. I agree there's nothing for diagnostics on it, but I've never had any failure on the device, except when I kicked one and 2 disks popped loose. But the disks were fine after that.

I wouldn't go for an A5x00 on an Ultra 2, just because a diff SCSI card is much easier on the system than putting fibre in there and having more possibilities(?) of crap to wade through. It is overkill. :-)

james
Re:Sun by hollow_man · 1999-11-18 19:24 · Score: 1

Well, I agree with pro-SUN posters (no I don't work for SUN, although I had a job offer from them 3 weeks ago :P).
I've installed 3 A1000's over the last couple of weeks, ranging from the minimally specced ones (50Gb RAID5) to a fully loaded one (8x 36.4Gb)

Although RAIDmanager is only marginally useful and you have to make sure your /etc/nodename doesn't contain your FQDN, it's still one impressive piece of kit.

--
Full Time Idiot and Miserable Sod
Nothing is real but the pain
Re:Sun by Anonymous Coward · 1999-11-18 19:43 · Score: 0

I just purchased the same product, only I got the 72 GB rackmount version. It was a little over $10k, plus $1k for a differential controller. This was after a 20% discount for being a Sun Developer Network member (free signup for membership).
Re:Sun by Anonymous Coward · 1999-11-18 20:33 · Score: 0

I concur: I have an A1000 (Ultra 1 head) with a database striped across it in a RAID 5 manner. Very nice performance and reliability.

More spindles, more simultanious reads by deranged+unix+nut · 1999-11-18 14:28 · Score: 1

You might also consider just adding multiple scsi controllers and have as many drives as possible.

With each additional drive, you can access another unique piece of data simultaneously. While RAID is nice and helps solve reliability and performance problems, it isn't the only solution.

It is a technique that newsgroup server admins used to use, and probably still do.

Re:More spindles, more simultanious reads by jemhddar · 1999-11-18 15:22 · Score: 5

My RAID experience comes from NT software RAID and using AMI MegaRAID controllers. For performance, the following things are important:

PCI Bus-- The fastest controller/drives won't make a difference if the PCI bus can't get data to the drives fast enough. Look at what else you are running, and consider upgrading memory/processor as another person said.

Stripe Size-- In a hardware raid setup the controller will write to one hard drive for xxx kb before switching to the next hard drive. You want to figure out what size 'chunks' of data the OS will send to the controller. Netware uses a 64k block size, which means large file reads/writes will be sent from the OS to controller in 64k pieces. If your stripe size is set to 8k, and you have 6 hard drives in a raid 5 array, look at the following situation.
drive1 - 8k total=8k
drive2 - 8k total=16k
drive3 - 8k total=24k
drive4 - 8k total=32k
drive5 - 8k total=40k
Now it's time to calculate parity. This requires the controller to read data from drive1, 2, 3, 4, and 5, calculate the parity using an XOR algorithm, and then write the parity:
drive6 - 8k parity
drive1 - 8k total=48k
drive2 - 8k total=56k
drive3 - 8k total=64k
Now it has to calculate and write parity again.

compare this to a stripe size of 64k
drive1 - 64k total=64k
calculate parity, write parity
drive6 - 64k parity
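A minimal sketch of the two calculations in the walkthrough above: how many parity updates a single 64k request triggers for a given stripe size, and how XOR parity lets a lost block be rebuilt. It assumes the request starts at the beginning of a stripe and, like the walkthrough, that each stripe holds five data units (six drives, one unit of parity per stripe). Real RAID 5 rotates parity across the drives, which does not change these counts. The function names are hypothetical.

```python
from functools import reduce


def xor_parity(blocks):
    """Parity block = byte-wise XOR of equal-length data blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild(surviving_blocks, parity):
    """Recover the one missing data block: XOR the parity with the survivors."""
    return xor_parity(list(surviving_blocks) + [parity])


def parity_updates(request_kb, stripe_kb, data_drives):
    """Stripes (hence parity computations) touched by one stripe-aligned request."""
    units = -(-request_kb // stripe_kb)   # ceiling: stripe units written
    return -(-units // data_drives)       # ceiling: stripes those units span


print(parity_updates(64, 8, 5))   # 8 units over 5 data drives -> 2
print(parity_updates(64, 64, 5))  # 1 unit -> 1

a, b, c = b"\x01\x02", b"\x04\x08", b"\x10\x20"
p = xor_parity([a, b, c])
assert rebuild([a, c], p) == b  # drive holding b "failed"; its data comes back
```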

Having a poorly configured stripe size can cause a huge performance problem. NT and NetWare (current versions) both optimize their disk writes to 64k. YES! I know the block size in NT is 4k, but the OS still optimizes disk requests to 64k chunks for performance reasons. I'm not sure about the various *nixes; can someone else answer that? Some people have the notion that writing smaller amounts of data to multiple hard drives is somehow faster. Hard drive maximum transfer rates are based on controller->HDD cache. A 64k or 8k write isn't going to fill up the cache on the controller, and a single 64k write will take less time on the controller, fewer commands will need to be issued, and performance will be better overall.

An anecdote about this.
Copying a 1.5 gig file from a workstation to a server with the stripe size at 8k took about 40min, with the stripe size at 64k it took 6min

Another consideration is how much cache the controller has and what it's used for. The AMI MegaRAID controller has 3 types of cache: write, read, and I/O. Write cache allows for lazy writes, which can improve performance. Read cache will allow the controller to read ahead, hopefully improving performance. I/O cache (and I2O cards) allow the controller to take some of the work off of the processor, improving overall system performance.

Some controllers come with multiple channels. The AMI MegaRAID Series 438 controller has 3 different SCSI channels on it. IIRC each channel can transfer up to 80 MB/s. This is similar to the idea of putting hard drives on different SCSI controllers, except that I've never seen an implementation that allows a RAID array to span multiple controllers.

The above info IS NOT ACCURATE for RAID 0, RAID 1, or RAID 3, those levels have different rules. You should consult the OS vendor, documentation, and Database vendor for specific settings to optimize the controller.

Re:More spindles, more simultanious reads by vilms · 1999-11-18 17:45 · Score: 1

Or "simultaneous". Whatever.

Fortunately, on the RAID systems I have experienced (AMI MegaRAID and some of the Mylex offerings), the 64k block is an "idiot" setting; whew! Personally, I've found both Mylex and AMI to offer good products with *reasonable* support (Mylex went above and beyond the call, when they really had no obligation to weigh in with assistance), and I'll be going back for more of that when my budget allows.

As for these shitcart IDE arrays: I know they're cheap and offer bundles of storage (and the difference between one of these and a SCSI job may be a nice holiday somewhere), but I have had a very bad experience with one (well, actually, two and then three) units that couldn't do the job they were meant to. My bad luck, or is this a despicable case of a "RAID-style toy: not meant for serious use"?

Finally, don't RAID 0 unless you can RAID 10!
Re:More spindles, more simultanious reads by flatrock · 1999-11-18 18:56 · Score: 1

1.5 GB in 6 min is only a little over 4 MB/s. Something is seriously wrong with that number. A single 10,000 RPM drive can sustain 18-22 MB/s.
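The arithmetic behind both figures in the stripe-size anecdote, as a quick sketch (assuming 1 GB = 1024 MB; counting 1000 MB per GB instead puts the 6-minute case at about 4.2 MB/s). The helper name is hypothetical.

```python
def throughput_mb_s(size_gb, minutes, mb_per_gb=1024):
    """Average transfer rate in MB/s for size_gb gigabytes moved in `minutes`."""
    return size_gb * mb_per_gb / (minutes * 60)


print(round(throughput_mb_s(1.5, 6), 2))   # 64k stripe: 4.27
print(round(throughput_mb_s(1.5, 40), 2))  # 8k stripe: 0.64
```

Even the "fast" case is well under what a single modern drive can stream, which suggests something other than the disks (network, bus, or OS tuning) was the limit.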
Re:More spindles, more simultanious reads by Anonymous Coward · 1999-11-18 20:23 · Score: 0

That's a reasonable number on a 100 Mb/sec ethernet assuming it's not on an isolated segment where you would expect it to be much higher.
Re:More spindles, more simultanious reads by jemhddar · 1999-11-19 08:07 · Score: 2

There is a great deal more information involved: part was the saturation of the PCI bus causing the slowdown, part was OS tuning, part was hardware configuration. We were using, IIRC, 7200 or 5400 RPM Ultra SCSI drives (not Ultra2). The point was to show it makes a big difference, though.


Check to be 100% sure drives are the problem by vectro · 1999-11-18 14:36 · Score: 5

Before you go out and purchase an expensive RAID solution (of any kind), make sure this is really the problem. The vmstat command will quickly make it apparent what kind of I/O is happening, and further analysis might tell you more about what kind of disk accesses are happening.

In many cases, adding more memory or CPU can make a bigger difference than more/faster hard drives, if the problem is that the cache is too small, or paging activity too much. Also check your CPU load and make sure it is nowhere near 100% - if so, time to get a 2nd CPU.

Also, avoid software RAID implementations like the plague. They will slow down your system and provide questionable reliability. You should also try to find cards that have redundant SCSI controllers onboard, and support redundant cabling. This way if the cable, plug, or SCSI bus fails for some reason you will not be SOL.

Finally, be sure that the majority of your disk accesses are reads. RAID will slow down writes, sometimes drastically so. If the majority of your disk accesses are writes, then tuning your kernel to flush dirty buffers less often may make a good difference.
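Why RAID slows writes can be sketched with the common rule of thumb for small random writes: a RAID 0 write costs one disk I/O, a RAID 1 write two (one per mirror), and a RAID 5 write four (read old data, read old parity, write new data, write new parity). The sketch below assumes every disk delivers the same number of random I/Os per second and ignores controller write caching and full-stripe writes, both of which can hide much of the penalty. The function name and the numbers in the usage lines are made up for illustration.

```python
# Disk I/Os needed per small random write (common rule of thumb).
WRITE_PENALTY = {
    "raid0": 1,  # write the block
    "raid1": 2,  # write the block to both mirrors
    "raid5": 4,  # read old data + old parity, write new data + new parity
}


def effective_iops(disks, iops_per_disk, read_fraction, level):
    """Host-visible I/Os per second for a given read/write mix."""
    if not 0.0 <= read_fraction <= 1.0:
        raise ValueError("read_fraction must be between 0 and 1")
    raw = disks * iops_per_disk          # what the spindles can do in total
    write_fraction = 1.0 - read_fraction
    # Each host read costs one disk I/O; each host write costs the penalty.
    return raw / (read_fraction + WRITE_PENALTY[level] * write_fraction)


for mix in (1.0, 0.5, 0.0):
    print(mix, effective_iops(6, 100, mix, "raid5"))
```

With six hypothetical 100-I/O-per-second disks in RAID 5, an all-read load gets 600 I/Os per second, a 50/50 mix 240, and an all-write load 150, which is why a write-heavy server may be better served by other tuning first.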

Re:Check to be 100% sure drives are the problem by Anonymous Coward · 1999-11-18 22:51 · Score: 0

Maybe you should make sure your drives work if there's no air around, in case Earth's atmosphere fails. Geesh. Sometimes too much redundancy doesn't make sense!!!

Dell Powervault by Beguile · 1999-11-18 14:37 · Score: 1

You may want to look at a Dell PowerVault as a possible solution. Check out Dell's website for details. They are VERY reliable and VERY fast, not to mention Dell has the best support in the industry.

Re:Dell Powervault by Longing · 1999-11-18 14:48 · Score: 1

You mean the Network Appliance Filer that Dell resells and calls a 'PowerVault'? Yes, they're very nice. But Dell doesn't make them, and 3rd party support is rarely as good as getting support directly from the manufacturer (at least for those manufacturers that also sell their products directly to the public.)

That, and they start at around $50k for 100GB, which isn't even local storage - it's network storage. (Choose CIFS, NFS, HTTP, or whatever else they support.)

Not that these aren't great boxes - we have one and are about to get a second one. But they're pricey and not as fast as local storage - which I believe is what this guy is looking for.
Re:Dell Powervault by Beguile · 1999-11-18 15:39 · Score: 1

Good point. You are right about everything except for the support. Dell's tech support is number one in the world and has been for some time now. Nothing you say will EVER convince me otherwise.

On a side note another good solution (except that it's not external) would be a Dell Poweredge server. I'm currently running a Dell Poweredge server with Linux and RAID 5 and it works quite well.

...and yes, I'm biased, I work at Dell.... in support... :-)
Re:Dell Powervault by Beguile · 1999-11-18 15:41 · Score: 1

...Oh, and did I mention that you can order a Poweredge with Linux factory installed?!?
Re:Dell Powervault by Anonymous Coward · 1999-11-18 15:52 · Score: 0

Where's your fun? Roll your own.
Re:Dell Powervault by Lord+Darwin · 1999-11-18 16:33 · Score: 1

LOL, i see you were sending that message from work! LOL! =]
Re:Dell Powervault by Anonymous Coward · 1999-11-18 18:44 · Score: 0

> But they're pricey and not as fast as local storage
Excuse me? The whole point of the NetApps is to be faster than local storage, and they are, as long as your network is fast enough.
Guess why they're used for newsservers all over the place? They plain and simple beat local storage hands down when it comes to I/O performance.
Re:Dell Powervault by Salamander · 1999-11-18 21:07 · Score: 3
>the whole point of the NetApps is to be faster than local storage. and they are, as long as your network is fast enough.

I think network-attached storage is a fine idea and the "right solution" for many things, but I just have to add a rebuttal here anyway.

Network-attached storage is faster than local storage if your network (including the protocol stack) is fast enough and your local-storage subsystem (including its own separate protocol stack) is slow enough. That's a totally useless claim. It's like saying that a train is faster than a car, leaving out the part about the train being an unloaded bullet-train engine on an empty track and the car being a Yugo stuck in New York traffic.

In actual fact, the raw bandwidth of modern storage interconnects (e.g. UW SCSI, FC) is higher than that of most network interconnects (e.g. 100baseT) for which the adapter cost is similar. In addition, the protocols used for storage (e.g. SCSI, the various layers of FC) are more suited toward that task - duh - than are the protocols used for networking (e.g. TCP/IP). There is no reason in hell that it should be faster to use network interconnects and protocols to access your storage than to use storage-specific interconnects and protocols.

Why might it appear that network-attached storage performs better? I can think of at least three reasons right off the top of my head:
- Many computers are "unbalanced". They are misdesigned or misconfigured so that they have a lack of direct-to-storage capability coupled with an excess of network capability. This may actually make NAS the correct solution for that environment but is irrelevant when considering the overall merits of the two approaches.
- Network-attached storage devices often benefit by having much more cache than direct-attached storage devices. If you took that same amount of cache and applied it to the direct-attach devices, the NAS boxes wouldn't look so good.
- The caching strategies used for NAS - i.e. those in NFSv3 - sacrifice consistency for speed, while direct-attach systems are held to a higher consistency standard. Everyone who has tried to use NFS for something where data consistency or up-to-date modification times matter - even something like "make" - has probably cursed NFS already over this. Some NFS vendors make things even worse by failing even to meet the NFS requirements. Sun's own Solaris NFS client, for example, doesn't always flush data when it's supposed to. If you added all the appropriate sync() operations and fixed the NFS implementations so that your NAS solution was really doing the same thing as your direct-attach solution, you might see some different performance comparisons. Note, though, that for many applications the NFS tradeoff and hence the NAS solution is pretty reasonable.
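To make "adding the appropriate sync() operations" concrete, here is a minimal sketch, assuming a POSIX system and Python's `os` module, of writing a file so that it is on stable storage before the call returns: `fsync()` the file, then `fsync()` its directory so the new directory entry survives a crash too. The function name `durable_write` is hypothetical, and whether a given NFS client honors these calls faithfully is exactly the consistency question raised in the point above.

```python
import os

def durable_write(path: str, data: bytes) -> None:
    """Write data to path and force it to stable storage before returning.

    Assumes a POSIX system where a directory can be opened and fsync'd.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:  # os.write may write fewer bytes than requested
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)  # flush this file's data and metadata to the device
    finally:
        os.close(fd)
    # Sync the containing directory so the (possibly new) entry is durable as well.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Every such call costs a round trip to the disk (or, over NFS, to the server), which is why skipping them makes benchmarks look better.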
At this point I should disclose my own biases. First, I work for EMC. That's not by choice - the company I was working for got bought out - and I'm often not thrilled about it, but the pay is good. In particular, I don't buy into all of EMC's arrogant "storage is the center of the universe and the Symmetrix is the ultimate storage device" attitude, and I heartily dislike our own Celerra NAS product even though it blows the doors off NetApp in terms of performance and scalability. Secondly, my professional areas of interest include distributed, cluster, and SAN filesystems, so I of course have some fairly strong opinions on such matters. That said...

I think that once we start seeing true, mature, multi-platform shared-storage filesystems, NAS will start to seem much less appealing. Why pay for NAS when you can just add software to your existing hardware investment and get all the sharing with almost all the performance of local access? Now all we need is a decent implementation of such a filesystem.
--
Slashdot - News for Herds. Stuff that Splatters.
Re:Dell Powervault by Anonymous Coward · 1999-11-18 21:30 · Score: 0

There are or have been quite a few attempts to implement such filesystems. Veritas has been promising one for 7-8 years or so, but seems no closer to delivering one than when they started. CrossStore - or whatever cutesy misspelling of their name they use - has announced such a thing, and is working with SNIA on a standard. There are at least a couple more - Mountain Gate? - but the only shipping product in this space getting much commercial attention is Mercury's SANergy (formerly Suite Fusion).

That's unfortunate, because SANergy is a piece of crap. I tested an earlier version, and it failed even the most basic single-client data-validity tests in under a minute. The product's architect, Chris Stakuis, has admitted that it doesn't even attempt to address multi-client data consistency issues. The UNIX and NT versions are not fully interoperable. The UNIX version is an LD_PRELOAD library hack, which means it can't deal with statically linked executables (they ship their own versions of cp and other system programs) or mmap or fork/exec and there's no cache sharing between processes on the same machine. This is a product that should never have been shipped, and that might very well through its own sheer pungency sour people on the whole idea of shared-storage filesystems. That would be sad.

It should be obvious by now why I'm posting this anonymously. Instead of innovating or producing a quality product, they decided to make money via the legal route. They filed for and were granted US Patent 5,950,203 for the basic idea of shared storage with centralized metadata control, even though the idea is neither original nor non-obvious to skilled practitioners of the relevant art. They have already made it clear that they intend to require extortionate license fees for use of this technology they did not develop. Anyone who bitched about Apple et al. charging $0.25/port for Firewire should just about be bursting a blood vessel with outrage over this.

I'd rather go work for MS than think that I didn't do all I could to ensure Mercury's failure. They're just that despicable.

Re:firsten posten bork bork bork? by Dr.+� · 1999-11-18 14:40 · Score: 0

Try the Encheferizer! It's a Fun Thing (tm)! Bork bork bork!

--
Eih bennek, eih blavek

Pricey but attractive by synaptic · 1999-11-18 14:50 · Score: 2

The Network Appliance Filers are really sexy.

The beautiful thing is they use the WAFL filesystem so you can expand your array when you need to without adding big sets of drives.

Granted, I don't have one but I've submitted the proposals and am waiting on financing. The F720 scales to 464GB, is network attached, has journaling (rad), and can benefit your WHOLE network.

Of course, you have to use NFS or SMB though. I've heard they start as low as $17k but usually $30-40k with a bunch of drives but it's difficult to find general prices without hearing the sales pitch.

This paper discusses the testing that the Stanford Linear Accelerator Center performed while evaluating the NetApp filers. It's geared toward Usenet news, but if it can handle that, it can surely handle your mail situation.

Does anyone here have first hand experience good or bad with NetApp Filers? And some word on the pricing?

Re:Pricey but attractive by Longing · 1999-11-18 15:02 · Score: 1

Really Expensive (tm). But they work, and they work well.

Yes, they're network attached. Good for stuff that is going to be used over the network, naturally. Not good if you need -really- fast access to the data from -one- server. They have CIFS, HTTP, NFS, and something else. We use this for all of our UNIX and Windows home dirs - the same data is accessible via either NFS or CIFS, which can be quite convenient at times.

The feature I like the best from their WAFL file system is the snapshot. It's configurable, and can be set to take hourly and nightly snapshots of the entire file system. A user deleted a file? They can go back into their .snapshot directory and retrieve the copy themselves. Sure is a lot easier than having to pull files from a tape, and I don't know anyone who does hourly tape backups. :)
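A user doing that self-service restore is just looking for the most recent snapshot that still holds the file. The sketch below assumes each snapshot appears as a subdirectory of `.snapshot` containing a full image of the tree; the subdirectory names used in the usage example (`hourly.0`, `nightly.0`) and the helper name `newest_snapshot_copy` are illustrative assumptions, not a documented NetApp interface. It uses the copy's modification time as the recency measure; if the file was unchanged across several snapshots the copies are identical, so any of them will do.

```python
import os
from typing import Optional

def newest_snapshot_copy(snapshot_root: str, rel_path: str) -> Optional[str]:
    """Return the most recently modified copy of rel_path found in any
    snapshot directory under snapshot_root, or None if none has it."""
    best_path, best_mtime = None, -1.0
    for name in os.listdir(snapshot_root):
        candidate = os.path.join(snapshot_root, name, rel_path)
        if os.path.isfile(candidate):
            mtime = os.path.getmtime(candidate)
            if mtime > best_mtime:
                best_path, best_mtime = candidate, mtime
    return best_path
```

For example, `newest_snapshot_copy(os.path.expanduser("~/.snapshot"), "mail/inbox")` would return a path the user could then copy back with `shutil.copy2`.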
Re:Pricey but attractive by Anonymous Coward · 1999-11-18 16:42 · Score: 0

I'd have to disagree. I have had experience with F740s that were brought in to replace some Sun StorEdge A5000s. We had the following faults in the first 3 weeks: 1) Dead motherboard. 2) Corrupted filing system such that the unit would dump 512MB core files on boot. 3) Dead disks. 4) Hung Ethernet adapter requiring a reboot to start it again. All in all this caused 15-20 hours of downtime. Network Appliance support is, quite frankly, a joke. The A5000s have their share of problems, but usually this is down to dodgy GBICs, which are easily replaced. One point to note is that the A5000 only does software RAID using Veritas. If you want hardware RAID you need D1000s or A7000. -p
Re:Pricey but attractive by Anonymous Coward · 1999-11-18 21:38 · Score: 0

Hrm... we've got an F720 and have had almost no problems whatsoever. After about 6 months of running we did have one disk fail, though. What amazed me was how fast the spare disk took over, and how helpful the NetApp support was. They FedExed us a new drive at no cost, and it was as simple as sliding it into the NetApp, with no downtime at all to replace it. Maybe you just had very bad luck compared to the rest of us, who all seem to love our NetApps :)
Re:Pricey but attractive by smackthud · 1999-11-19 00:13 · Score: 1

I've been reading some of the threads and thought it most appropriate to comment at this level.

1.) Yes, there are other Network attached storage solutions, however having spent significant time either using or studying them, my opinion is that NetApp is worth the premium.

2.) This is the part that people have trouble with : Network Storage (if implemented properly) is faster than server attached storage.

What many people don't realize is that in your kernel (admittedly I don't have time to waste on NT to be sure it's true there also, but it's true for most *NIX breeds) network traffic gets precedence over SCSI traffic. Yes, this counts for FCAL as well. Your system will wait to even read the local system disk until it reads the network traffic.

If you consider that your biggest problem with a mail system is users **AHEM** er, the number of users making simultaneous requests compounded with the number of routing messages, your solution should include optimizations to get network traffic handled as efficiently as possible. Taking advantage of the network's priority will yield much better throughput than with server-attached solutions, because a user's request for data will be handled prior to the system's own needs. Bottom line... better throughput.

The stigma for network storage has always been the implementation of NFS... consider that usually you are mounting a server-attached disk over NFS. There is little you can do to significantly speed it up beyond the basics.

(SUN is always very quick to point out their own NFS numbers when anyone brings up Network Appliance. Their own numbers stink. They stink for a reason... money. SUN makes near 50% on disk sales and wants you to buy some of their crappy new A5000 disks, which need to be mirrored to be considered Highly Available because they only come with one power cord. Duh.)

3.) WAFL is your best friend. I can back up a 1TB oracle database in 3min. how 'bout you? I do it 4 times a day, and could do incrementals every hour if I choose. Not too bad for a system that needs to be up 100% (it's the internet, baby).

4.) Gbit ethernet can have the same, and sometimes faster, throughput than FCAL. This depends greatly on the switches and GBICs you choose, so do your homework. Add in tweaks like optimizing your TCP packet's MTU size for your average read size and you're screamin. Lastly, put all your NFS traffic onto a private network, dual networks are preferable if you can afford it, and you've got a highly reliable, very scalable solution.

On price: NetApp is expensive. I've done comparisons and there are no two ways about it. I find the best justification is the total cost of ownership. Our NetApp solutions always cost us less to run over the life of the system, and that is justification enough for me.

These are my 2Cents.
Re:Pricey but attractive by OldBen · 1999-11-19 03:20 · Score: 1

I've used NetApp as the message store for Sendmail and I love it. The only problem is that now I'm moving to an enterprise-class MTA (we're choosing between SIMS and InterMail), and neither works with NFS-mounted message stores. You should bear that in mind before committing to NAS.

"Home Raid Solutions" by viper21 · 1999-11-18 14:54 · Score: 0

Kind of off topic, but for the past couple of years I've wanted to set up a small RAID setup in my "server" here at home. What are the most reasonable setups that you've seen around? What RAID hardware, and what drives, would you consider best to use just for low-cost educational purposes?

Completely off topic, if you click on my url above, then click on the computers section you can check out the new case I made... it's pretty cool, and everyone has been pretty interested in new cool case designs =)

-S
Scott Ruttencutter

--

We Apprentice Developers and Designers

Re:"Home Raid Solutions" by Anonymous Coward · 1999-11-18 15:20 · Score: 0

Unless there is important data on your server, why bother with RAID? I started up a web server through an ADSL connection with a single EIDE drive. Of course my only reason for doing so was to learn Linux admin. If it was to go down, who cares. It was a learning experience. I remember college... Not sure of your requirements, though.
Re:"Home Raid Solutions" by viper21 · 1999-11-18 15:47 · Score: 1

Basically, I Just want something to Play With (tm).

Just looking for a way to play with raid on a home system. As you put it, if it were to go down, who cares =) I'd rather make mistakes now while I can afford them.

I see you guys like the case on my page =)

-S
Scott Ruttencutter

--

We Apprentice Developers and Designers
Re:"Home Raid Solutions" by Joe_NoOne · 1999-11-18 21:18 · Score: 1

I've seen in the computergate catalog (www.computergate.com) that they have a card that mirrors your hard drive to another (RAID 1) for $85, or a 4-drive IDE RAID card for $120. I think Promise makes some IDE RAID cards under $200 also...
Re:"Home Raid Solutions" by mircea · 1999-11-19 04:30 · Score: 1

That's exactly the reason that made me play with the Linux software raid. Basically I've revived an old 486dx2-50, and stuffed it with several HDs, both IDE and SCSI (from dead Mac IIcx's), that I've combined in a raid0 array. Running fine for several months now.
If you're interested in details about the setup, mail me after removing the obvious :)

RAID 0 + 1 would be faster than RAID 5 by Longing · 1999-11-18 14:56 · Score: 1

It doesn't sound like you need a lot of space if you're currently doing well with 9GB and 7GB. Get a pair of 18GB drives for the spools and a pair of 18GB drives for storage, and you should be set.

RAID 0+1 is a lot faster than RAID 5. Its disadvantage is that it's more expensive because you have to buy 100% more disk than storage, as opposed to 20-33% more for RAID 5.
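The overhead figures follow directly from how each level spends disks. A minimal sketch (function names are hypothetical): RAID 0+1 stores every block twice, so half the raw space is usable; RAID 5 gives up one disk's worth of space per array to parity, so the extra disk bought is 1/(N-1) of usable space. That is 33% for a 4-disk set and 20% for a 6-disk set, which is where the 20-33% range comes from. Hot spares are not counted here; they are pure extra on top.

```python
def raid01_usable(disks: int, disk_gb: float) -> float:
    """Usable capacity of RAID 0+1: half the disks hold mirror copies."""
    if disks < 4 or disks % 2:
        raise ValueError("RAID 0+1 needs an even number of disks, at least 4")
    return disks // 2 * disk_gb

def raid5_usable(disks: int, disk_gb: float) -> float:
    """Usable capacity of RAID 5: one disk's worth of space holds parity."""
    if disks < 3:
        raise ValueError("RAID 5 needs at least 3 disks")
    return (disks - 1) * disk_gb

def extra_disk_pct(raw_gb: float, usable_gb: float) -> float:
    """Extra raw disk bought, as a percentage of usable storage."""
    return 100.0 * (raw_gb - usable_gb) / usable_gb

for n in (4, 6):
    print("RAID 0+1", n, raid01_usable(n, 18), extra_disk_pct(n * 18, raid01_usable(n, 18)))
    print("RAID 5  ", n, raid5_usable(n, 18), extra_disk_pct(n * 18, raid5_usable(n, 18)))
```

For example, a 7-disk RAID 5 set (say, 8 bays with one kept as a hot spare) yields 6 disks' worth of data.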

As far as which controller to use... Sun now rebrands DPT controllers, but they're PCI and you're stuck on SBus, so I don't know.

Good luck

Re:RAID 0 + 1 would be faster than RAID 5 by Anonymous Coward · 1999-11-18 17:02 · Score: 0

Recently installed RAID 0+1 solution at work.

Config for SYSTEM:
- 6pcs 9GB Seagate Cheetah 10000rpm disks
- Dual channel RAID controller with 64MB cache
- Three drives per channel, internal installation.
- Configured as RAID5 with hot-spare

Config for DATA:
- 12pcs 9GB Seagate Cheetah 10000rpm disks
- 2pcs Dual channel RAID controller with 64M cache
- 4pcs external SCSI enclosures with hot-swappable drive bays, each enclosure equipped with three disks
- Configured as RAID 0+1, three disks per channel: two channels have two active disks plus one spare, the other two have three active disks and no spare.

Disks are installed in a toy called a Compaq ProLiant: quad Xeon 500MHz (or 550, don't remember) with 4GB RAM. Running WinNT, connected to multiple switches using two Ethernet cards, each card with four 100Mbit/s FDX connectors. Also connects to our older LAN thru a Token-Ring switch using two 16Mbit/s FDX TR cards.

BTW, this little toy (or should I say beast) is pretty fast for an NT machine, and no crashes so far.

Yes, it was kinda expensive too. :)
Re:RAID 0 + 1 would be faster than RAID 5 by Znork · 1999-11-18 18:43 · Score: 2

Not entirely true. RAID 0+1 is faster for writing, but RAID 5 is usually, depending on configuration, faster to much faster for reading (you have more platters to simultaneously read from, and calculating checksums isn't necessary for reads).

Some array types (notably HP that I know of) will dynamically rearrange data storage between RAID 0+1 and RAID5 to optimize speed and space.
Re:RAID 0 + 1 would be faster than RAID 5 by Anonymous Coward · 1999-11-18 21:14 · Score: 0

I think you're confusing 0+1 with 1. RAID 1 is just mirrored, so you only have two drives, and your comment is valid for that. RAID 0 is striped, you spread your data across as many drives as you want, so you're fast but have no redundancy. RAID 0+1 is like RAID 0, but each of the drives in your RAID 0 array is actually a mirrored drive. If you want you can use twenty drives, set up in mirrored pairs and stripe your data across all ten mirrored pairs. This will be extremely fast for both reading and writing.
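The layout described here (striping across mirrored pairs, which many vendors call RAID 1+0 or RAID 10) boils down to a simple address calculation. A minimal sketch, assuming disks numbered so that pair p is disks 2p and 2p+1, and a stripe unit measured in blocks; both conventions and the function name are illustrative assumptions, since real controllers choose their own.

```python
from typing import Tuple

def map_block(lba: int, pairs: int, stripe_blocks: int) -> Tuple[int, int, int]:
    """Map a logical block to (disk_a, disk_b, physical_block) in a stripe of mirrors."""
    stripe_unit = lba // stripe_blocks  # which stripe unit the block falls in
    pair = stripe_unit % pairs          # stripe units rotate across the mirrored pairs
    row = stripe_unit // pairs          # number of complete stripes before this one
    physical = row * stripe_blocks + lba % stripe_blocks
    return 2 * pair, 2 * pair + 1, physical

# Twenty drives as ten mirrored pairs, 16-block stripe units:
print(map_block(16, pairs=10, stripe_blocks=16))   # second pair, start of the disk
print(map_block(165, pairs=10, stripe_blocks=16))  # wraps back to the first pair
```

Every logical block lives on exactly two disks, which is what lets reads go to either copy while writes must hit both.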
Re:RAID 0 + 1 would be faster than RAID 5 by jdz · 1999-11-19 02:25 · Score: 1

With typical workloads, it's the other way around. That is, reads are faster with 0+1, but writes are faster with RAID-5.
The reason for this is mostly the seek times. Each block in a read request can be satisfied by either component in a mirrored pair. For small requests, this means that the request can be routed to the drive with the shorter queue, shorter seek distance, or whatever other heuristic the controller wants to use. Good techniques for doing this have been around for years (cf papers by Gray, Bitton, et al).
Writing a block to a 0+1 config is more work, however. The write cannot be declared complete and committed to stable store until both drives have been updated, which means that the end-to-end I/O stall time is dictated by the larger of the two I/O times for each component of the mirror pair.
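The read/write asymmetry of a mirrored pair can be sketched in a few lines. The routing heuristic below (shorter queue first, then shorter seek distance) is a toy illustration of the idea described above, not the specific algorithms from the papers cited; `Drive` and its fields are hypothetical, with head position modeled as a single cylinder number.

```python
from dataclasses import dataclass

@dataclass
class Drive:
    name: str
    queue_depth: int    # I/Os already outstanding at this drive
    head_position: int  # current cylinder (toy model)

def pick_read_drive(a: Drive, b: Drive, cylinder: int) -> Drive:
    """Send a read to whichever copy looks cheaper: shorter queue first,
    then shorter seek distance as the tie-breaker."""
    return min((a, b), key=lambda d: (d.queue_depth, abs(d.head_position - cylinder)))

def mirrored_write_time(time_a_ms: float, time_b_ms: float) -> float:
    """A mirrored write completes only when both copies are on stable storage."""
    return max(time_a_ms, time_b_ms)
```

Reads get to take the better of two drives, while writes always pay for the worse one; that is the core of the argument above.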
When updating a RAID-5, at worst all data columns in the I/O range must be updated, along with the parity column in each stripe. Logically, this means that each I/O is either RMW of each affected data and parity column (small-write case), or a W of each affected data column, a R of each unaffected data column in a stripe with an affected data column, and a W of each parity column in a stripe with an affected data column (large write). The RMW cycles can often be masked with good caching, especially when I/O sequences or subsequences are sequential. Even the suboptimal caching in current drives (track buffering) catches a lot of this. If the overall workload involves a lot of seeks, this will degenerate more rapidly than the 0+1 config, however.
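The small-write (RMW) case relies on XOR parity being updatable incrementally: new parity = old parity XOR old data XOR new data. That costs two reads (old data, old parity) and two writes (new data, new parity) no matter how wide the stripe is. A minimal sketch with hypothetical function names, treating blocks as equal-length byte strings:

```python
from functools import reduce

def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equal-length blocks (RAID-5 parity of a stripe)."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))

def small_write_parity(old_data: bytes, new_data: bytes, old_parity: bytes) -> bytes:
    """Read-modify-write parity update for a single changed data column."""
    return xor_blocks(old_parity, old_data, new_data)

stripe = [b"\x01\x02", b"\x10\x20", b"\xaa\x55"]  # three data columns
parity = xor_blocks(*stripe)
new = b"\xff\x00"                                 # overwrite the middle column
print(small_write_parity(stripe[1], new, parity) == xor_blocks(stripe[0], new, stripe[2]))
```

The large-write case described above corresponds to calling `xor_blocks` directly over the new data columns plus the unaffected old ones, which avoids reading the old parity.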
The big problem is that most workloads don't expose enough concurrency to the controllers to keep them busy, so even though the "system" appears busy, the array controller is idle most of the time. It's even worse for the disks. Modern drives really want 20+ I/Os outstanding at the disk concurrently. Without that, they can't do much in the way of clever arm scheduling. The same thing goes for the array controller and its overall planning/scheduling- with very little work exposed to it at any given time, it doesn't have very much information on which to base good decisions.
Re:RAID 0 + 1 would be faster than RAID 5 by Znork · 1999-11-20 19:45 · Score: 1

Hmmm, you may be right; in some configurations you may actually be faster writing to RAID 5. It would depend on whether the actual bottleneck in writing lies in the seek times or in the bus-to-platter transfer rate. That would be heavily affected by the type of load. Oh, well, guess it eventually falls under the old have-to-test-to-be-sure category.

Couple things by Falsch+Freiheit · 1999-11-18 14:56 · Score: 5

First off, it's not clear from your post how heavily loaded the drives really are.

In particular: load is a measure of how many processes are using or waiting for a resource (such as disk I/O, CPU or network I/O). On a busy mail server that's completely adequate for the job, I'd expect to often see a high load average due to the number of processes that are waiting on the network. That is, due to the number of processes waiting for slow network connections to places halfway around the world.

All you mention is the load averages and a fairly non-specific measure of drives that are "cranking away constantly". If the drives were being used at a current constant 10% of available I/O, they'd tend to "crank constantly" even if they could be hit much harder. (still, given that losing email is considered bad by customers, a RAID 5 solution seems like a good idea anyways and leaves you room to grow and handle sudden increases in email from the holidays or spammers or gradual expansion of business)

As to IDE vs. SCSI -- never go with straight IDE on a server. SCSI has the ability to lie to the OS and silently move data from sectors that have gone bad into sectors reserved for that purpose. Sure, it slows down access to that particular block of data, but it's a lot easier than the OS having to deal with failures directly. However, I'm completely unfamiliar with the strange SCSI - EIDE setup that you're describing -- if it treats them as just physical media and provided the SCSI interface itself, it may be able to do that particular SCSI trick, as well. Physically, SCSI drives and EIDE drives are identical -- as in, you can find the *exact* same drive from certain manufacturers, only one has SCSI and the other EIDE. Reliability of the physical media is the same, IOW. In a normal configuration, *apparent* physical reliability is higher for SCSI due to wonderfully useful trickery.
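The "lying to the OS" trick is essentially a lookup table kept by the drive. Below is a toy model, not real firmware: the class name, the single spare pool, and the assumption that a failing sector's contents are still readable when it gets relocated are all simplifications for illustration. The host keeps using the same logical block number, and only sees an error once the spare pool runs out.

```python
class RemappingDisk:
    """Toy model of drive-level sector sparing: a defect table maps a bad
    logical block to a spare physical sector, invisibly to the host."""

    def __init__(self, data_sectors: int, spare_sectors: int):
        self.data_sectors = data_sectors
        self.storage = {}  # physical sector -> contents
        self.remap = {}    # bad logical block -> spare physical sector
        self.spares = list(range(data_sectors, data_sectors + spare_sectors))

    def _physical(self, lba: int) -> int:
        if not 0 <= lba < self.data_sectors:
            raise IndexError("LBA out of range")
        return self.remap.get(lba, lba)

    def read(self, lba: int) -> bytes:
        return self.storage.get(self._physical(lba), b"")

    def write(self, lba: int, data: bytes) -> None:
        self.storage[self._physical(lba)] = data

    def mark_bad(self, lba: int) -> None:
        """Relocate lba to a spare sector; the OS only notices once spares run out."""
        if not self.spares:
            raise OSError("spare pool exhausted; defect now visible to the OS")
        old = self._physical(lba)
        spare = self.spares.pop(0)
        # Toy assumption: the old contents could still be read and are moved over.
        self.storage[spare] = self.storage.pop(old, b"")
        self.remap[lba] = spare
```

The extra indirection is why a remapped block can be slower to reach: its data now lives in a spare area away from its neighbors.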

I don't recall the exact model numbers, but I've seen pretty good results with Mylex RAID controllers before. (more along the lines of database stuff than what you're talking about -- somewhat different needs, but not all *that* different, I suppose.)

I can't see putting two partitions on one RAID device as making a lot of sense -- since things are striped you'd end up running into contention issues.

IOW: I'd guess that option #3 would be the fastest -- it's also probably the most expensive.

If I were you, I'd check more carefully to determine how much of the currently available disk I/O is actually being used... If the budget allows it, the dual-channel RAID solution sounds pretty good. You might want to go with two single-channel RAID cards instead -- makes it easier to stock a backup card in case a card decides to die. Try and get something with hot-swappable drives, too. It makes the RAID stuff so much more useful.

Also, I don't know the details of your setup (of course), but seriously consider breaking the mail serving task into separate pieces and run it on separate machines.

You have:
1) incoming email
2) outgoing email
3) email from customers
4) email customers pick up (POP)

It sounds like you have one machine handling all of these. Consider breaking these tasks onto separate boxes. (If you've made the mistake of telling customers the same hostname for #3 and #4, i.e. mail.isp.net instead of mail.isp.net and pop.isp.net, it might be impossible to split those two tasks away from each other.)

You can have a setup such as:
outgoing1 through outgoingN all behind the single name of "outgoing" that internal machines are told to send email to that they don't know how to deal with
mail1 through mailN all behind "mail" that customers are told to have as their outgoing mail server. In particular, it should blindly send off email it doesn't know how to deal with to outgoing.
pop (harder to break into separate machines, but possible)
incoming1 through incomingN with MX records pointing at them for your domain.
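Putting incoming1 through incomingN behind MX records spreads load because of how sending MTAs choose among them: lowest preference value first, and hosts with equal preference in random order (RFC 5321, section 5.1). A minimal sketch of that ordering; the hostnames in the usage example are hypothetical, and real MTAs layer retries and address lookups on top of this.

```python
import random
from typing import List, Tuple

def mx_delivery_order(records: List[Tuple[int, str]], rng=random) -> List[str]:
    """Order MX hosts the way a sending MTA tries them: lowest preference
    first, with equal-preference hosts shuffled to spread the load."""
    by_pref = {}
    for pref, host in records:
        by_pref.setdefault(pref, []).append(host)
    order = []
    for pref in sorted(by_pref):
        hosts = list(by_pref[pref])
        rng.shuffle(hosts)
        order.extend(hosts)
    return order

records = [(10, "incoming1.isp.net"), (10, "incoming2.isp.net"), (20, "backup.isp.net")]
print(mx_delivery_order(records))
```

Giving all the incoming boxes the same preference is what makes remote senders share the load between them; a higher value reserves a host as a fallback.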

Now, breaking into that many machines is probably silly. Moving outgoing to one machine and everything else to a second machine (and possibly mailing lists off to a third machine) may make a *lot* of sense though. Don't get tied into the idea of a monolithic machine to accomplish everything related to a particular task -- eventually it's much more expensive than many cheaper boxes to handle the same task.

Re:Couple things by SendBot · 1999-11-18 15:35 · Score: 1

It sounds like you have one machine handling all of these. Breaking these tasks onto separate boxes (If you've made the mistake of telling customers the same thing for #3 and #4 (ie, mail.isp.net instead of mail.isp.net and pop.isp.net) it might be impossible to split those two tasks away from each other)
I suppose you could spam everyone and tell them to change that, and then have your router redirect that port to the appropriate machine for the people who forget.
Re:Couple things by Forward+The+Light+Br · 1999-11-18 16:07 · Score: 2

port forward to a dedicated POP server... it's not so bad ;-)
We are all in the gutter, but some of us are looking at the stars --Oscar Wilde

--

Grrr. my nick is "Forward the Light Brigade"...
Re:Couple things by kijiki · 1999-11-18 16:22 · Score: 3

In particular: load is a measure of how many processes are using or waiting for a resource (such as disk I/O, CPU or network I/O). On a busy mail server that's completely adequate for the job, I'd expect to often see a high load average due to the number of processes that are waiting on the network. That is, due to the number of processes waiting for slow network connections to places halfway around the world.

Correct me if I'm wrong, but isn't the load the average number of processes in the run queue? This would mean that processes that are blocked on the network or disk would be in the sleep (wait) queue, and not counted in the load average.

In this case, a load of 20 means 20 processes are ready to run, which is not so good.
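The "average" in load average is an exponentially damped moving average of that queue length. A minimal sketch of the traditional Unix scheme, assuming a sample every 5 seconds and a 1-, 5- or 15-minute period; function names are hypothetical. Which processes count as "runnable" varies by OS: Linux, for instance, also counts processes in uninterruptible (typically disk) wait, but not ones sleeping on a network socket. Dividing by the number of CPUs gives a rough per-CPU figure.

```python
import math

SAMPLE_INTERVAL = 5.0  # seconds between samples, as in the traditional scheme

def update_load(load: float, runnable: int, period_s: float) -> float:
    """One step of an exponentially damped average of the run-queue length."""
    decay = math.exp(-SAMPLE_INTERVAL / period_s)
    return load * decay + runnable * (1.0 - decay)

def simulate(samples, period_s: float = 60.0) -> float:
    """Feed a sequence of run-queue lengths through the averager, starting from 0."""
    load = 0.0
    for n in samples:
        load = update_load(load, n, period_s)
    return load

# 20 runnable processes for one minute, starting from idle: the 1-minute
# figure reaches about 63% of 20, i.e. 20 * (1 - 1/e).
print(round(simulate([20] * 12), 2))
```

So a sustained 1-minute load of 20 really does mean about 20 processes were ready to run (or, on Linux, stuck in disk wait) on average, not 20 processes merely existing.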
Re:Couple things by Anonymous Coward · 1999-11-18 18:30 · Score: 0

> I suppose you could spam everyone and tell them
> to change that, and then have your router
> redirect that port to the appropriate
> machine for the people who forget.

Forget it. I can tell you from experience that trying to rename a server is a tech support nightmare. The average customer of an ISP has no idea how to reconfigure their mail reader, and many (most?) ISP's distribute custom browser versions with the mail servers locked in anyway.

As for the router thing, there may not be a router between the customer and the mail server. In the case of the network I administer, our terminal servers are directly connected to the same ethernet the mail server's on.

The best solution is to design your network correctly from the ground up. Use a separate hostname for EVERYTHING or you'll probably regret it later.
Re:Couple things by Anonymous Coward · 1999-11-18 20:21 · Score: 1

The run Q length's rate of draining also depends on how many CPUs there are on the box. A machine with a run Q of 10, but with 64 CPUs, indicates that there are 74 active processes on it, which indicates that it is not breathing real hard.

Don't forget, in the above situation, the uptime load would indicate 74 and somebody would freak out and go reboot our E10K.
Re:Couple things by tzanger · 1999-11-18 21:07 · Score: 1

never go with straight IDE on a server. SCSI has the ability to lie to the OS and silently move data from sectors that have gone bad into sectors reserved for that purpose. Sure, it slows down access to that particular block of data, but it's a lot easier than the OS having to deal with failures directly

All new IDE drives (in the past 3 years, I think) have this feature. They'll remap without the OS knowing until the extra sectors are full; THEN they'll start letting you know...

That's the reason I never accept a drive with even 1 bad sector. 'cause if I can see 'em, there's already too much damage.

As for controllers... I've used DPT RAID controllers... started with the old ISA 2044U, I believe it was, with a pair of 4G Seagate Barracudas... just recently moved to a pair of 9.1G WD U2 drives on a DPT Century controller with 2 channels (one for the U2 drives, and the other channel for the slow SCSI-1 devices). The only trouble I have (haven't talked to DPT yet) is that the Seagate tape backup (Travan 4G) seems flaky on the new controller... the CD-RW on the same channel works just great... All in all I'm very happy with DPT.
Re:Couple things by Anonymous Coward · 1999-11-19 02:06 · Score: 0

One advantage of SCSI drives -- you can low-level format them periodically to clear out the relocated sector table (a low-level format will lock the bad sectors out, clear the relocated sector table, and verify the whole disk so newly grown bad sectors are found). This extends the life of a disk somewhat. You can try low-level formatting a RAID array, but usually, it takes the SCSI command, and ignores the format, or just does a big-ass read.
Re:Couple things by tzanger · 1999-11-19 04:46 · Score: 1

One advantage of SCSI drives -- you can low-level format them periodically to clear out the relocated sector table (a low-level format will lock the bad sectors out, clear the relocated sector table, and verify the whole disk so newly grown bad sectors are found).

Do you have any other data or references on this? I am very interested in learning more...
Re:Couple things by Anonymous Coward · 1999-11-19 06:11 · Score: 0

I thought it was the ready-to-run queue, so your load would be 10 if you've got 64 running and 10 ready-to-run. Not 74.
Re:Couple things by Rambo · 1999-11-19 10:27 · Score: 1

As to IDE vs. SCSI -- never go with straight IDE on a server. SCSI has the ability to lie to the OS and silently move data from sectors that have gone bad into sectors reserved for that purpose.

I'd like to point out that this is not necessarily true. It's more a function of the quality of the drive in general, rather than IDE vs SCSI. IDE drives do indeed have the ability to do just what you mentioned; both drives contain processors to run the drive and, if necessary, transparently map out bad sectors. To quote from Western Digital's IDE drive feature page:

"Embedded error detection and repair features that automatically detect, isolate and repair problem areas that may develop over the extended use of the drive..."

Fibre Channel RAID by thesteveco · 1999-11-18 14:58 · Score: 5

We've just spent 2 weeks at my office researching the different solutions available to us for implementing the most reliable and scalable solution available today. Our needs differ a bit from yours, as we're looking to put many machines on a network for load distribution, yet they all need to speak to the same data on a single repository. This holy grail is known as a SAN, or Storage Area Network.

Our solution is going to be a single cabinet RAID (level 5 for accessing smaller files) with a "hot spare" that will rebuild a crashed disk on the fly. This being a standard cabinet we'll have 8 disks, of which the capacity of 6 will be data (one parity (term used loosely as parity is striped on RAID-5), and one spare).

The disks are Seagate's 10,000 RPM Cheetahs, the most commonly recommended units among all the vendors we've talked to, and the controller is a multi-channel U2W with a fibre interface to a Q-Logic PCI adapter.

The total system is going to run just over $15,000. This sounds like a lot, but pricing lower end systems isn't too much cheaper and you'll never get 24-hour turnaround on failed parts (if they're even available). This seems like overkill for a single system, but by adding a fibre hub later we can use the single system for many many machines once a file controller (dedicated machine) is put into place.

The beauty of SAN is that it operates much like FTP, with a control and a data connection. The control connection occurs over your existing LAN, and the data is transmitted directly over the fibre channel (max rate of 100 MB/s).

Other NAS (Network Attached Storage) models are somewhat cheaper to implement, but performance can never match the fibre, as the "control" and "data" connections (NFS or SMB) both travel across your network.

I apologize for digressing from the straight RAID topic, but I felt obligated to give the /. community something to chew on in return for all that I've learned here.

-Steve

What about the AMI MegaRaid cards? by Wakko+Warner · 1999-11-18 14:58 · Score: 2

I'm thinking of getting one myself. It's supported in Linux, does hardware raid 0, 1, 0/1, 3, 5, 30, and 50. Does anyone have one? Is it decent? Can I trust my data to it? It's $150 on pricewatch, which sounds like a damn good deal for something with its own CPU on board.

- A.P.
--

"One World, one Web, one Program" - Microsoft promotional ad

--
"Remember when the U.S. had a drug problem, and then we declared a War On Drugs, and now you can't buy drugs anymore?"

Re:What about the AMI MegaRaid cards? by jemhddar · 1999-11-18 15:42 · Score: 2

The AMI MegaRaid cards are excellent IMO. Setting up RAID arrays is very clean, and some models use SIMMs for their cache, so you can upgrade the cache easily. Like you mentioned, it has its own processor (an Intel i960), and the newer ones are I2O devices. I2O is a standard in which the card has a processor to offload work from the main system processor. Not all OSes support it, though. I2O also allows for one driver per card, instead of one driver per card, per OS, per OS version.

Re:What about the AMI MegaRaid cards? by faze · 1999-11-18 16:22 · Score: 1

Yes, they are great cards. I bought a few AMI 428s from Onsale a while back. Not to turn /. into eBay, but I have one that I want to get rid of. Pricewatch is down at the moment, so I don't know what they are selling for, but I will let you have one for considerably less.

e-mail me at bm@datapace.com if you are still thinking about getting one.
Re:What about the AMI MegaRaid cards? by Corvar · 1999-11-18 19:15 · Score: 1

I recently built a server with an AMI 428 card in an old hunk of an HP NetServer 5/166 LS2 (dual Pentium 166s). The performance gain over plain SCSI was quite nice. I am running three RAID 1 arrays. But the box crashes at roughly three-week intervals. There is a new driver for the controller which I haven't tried yet, but it doesn't list the mysterious SMP + MegaRaid crashes as resolved. The box is running Debian with a 2.2.10 kernel.
Re:What about the AMI MegaRaid cards? by Anonymous Coward · 1999-11-19 04:07 · Score: 0

I've set up 2 Linux boxes using AMI MegaRAID controllers (438 & 428), with Red Hat 6.1 as the distribution of choice. However, as some of you who have installed it may know, I ran into problems with Red Hat 6.1 right off the bat. First, its kernel doesn't correctly identify more than one logical drive on the MegaRAID controller at boot. Second, if you have another SCSI controller in the system for the system drive, the RAID controller is detected first, which messes up the device file mappings, i.e. sda shows up as the first logical RAID drive. Anyway, the solution to both problems is to download and compile the 2.2.13 kernel. These servers are set up as caching proxy servers and also handle DNS and file sharing. This setup has worked very well for 3 weeks now. On one box I am using an external SCSI enclosure with hot-swap capability attached to a 438-series 3-channel controller with 16MB of cache; the enclosure holds eight 7200rpm IBM U2LVD drives. The other box has a 428 controller with 4MB of cache and five 7200rpm Barracuda drives inside the server.

Suggestions by mosch · 1999-11-18 14:59 · Score: 2

On the IDE vs. SCSI question, be careful. With some drives the difference really is just a chip, but drive manufacturers often use different actuators and such for SCSI drives (because those are more likely to be dropped into a high-stress environment). The MTBF of a drive that expects to store grandma's recipe book is not relevant when it's used in a high-stress server.

I'd suggest a SCSI or Fibre Channel raid array, with some 10,000RPM drives, and lots of cache on the drives and the controller. If you are currently IO-bound, you want to make sure that you remove that bottleneck for at least a couple years. Some sort of external enclosure might be nice if only due to the fact that 10,000RPM hard drives make a LOT of heat, so it keeps things a little less critical. Oh, and of course I'd recommend using RAID-5 for obvious reasons. RAID-0 is faster, but clinically insane.

Re:Suggestions by mckyj57 · 1999-11-18 20:12 · Score: 1

> on the IDE v SCSI be careful. with some drives
> the difference really is just a chip, but often
> drive manufacturers will use different actuators and
> such for SCSI drives (due to the fact that
> they're more likely to be dropped into a
> high-stress environment).

Actually, I cannot believe no one has brought up the real difference between IDE and SCSI:

-- Tagged command queueing

Where IDE drives fall down is their inability to process more than one command at once. Though I am out of the hardware game now, I do run both types of drives, and IDE still is very bad at multi-tasking.

SCSI drives can have multiple commands sent to them; the drive can acknowledge a command and disconnect from the bus, and when the command is complete, the driver services it. You won't notice the difference until a server begins to get hit hard, and then you will really notice it.
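A toy model of why queueing several commands helps: if the drive sees all outstanding requests at once, it can service them in an order that shortens head travel instead of strictly first-come, first-served. This sketch assumes seek cost is simply proportional to cylinder distance and uses a LOOK-style elevator ordering; real drive firmware also optimizes for rotational position, and the cylinder numbers are made up, so treat the result as illustrative only.

```python
def head_travel(start: int, order: list[int]) -> int:
    """Total cylinders crossed when visiting requests in the given order."""
    travel, pos = 0, start
    for cyl in order:
        travel += abs(cyl - pos)
        pos = cyl
    return travel


def elevator_order(start: int, pending: list[int]) -> list[int]:
    """LOOK ordering: sweep upward from the head, then sweep back down."""
    up = sorted(c for c in pending if c >= start)
    down = sorted((c for c in pending if c < start), reverse=True)
    return up + down


# Hypothetical batch of outstanding requests (cylinder numbers).
queue = [980, 20, 510, 35, 960, 500]
head = 490
print(head_travel(head, queue))                        # strict arrival order
print(head_travel(head, elevator_order(head, queue)))  # reordered batch
```

With these numbers, arrival order crosses 3800 cylinders and the reordered batch crosses 1450.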

Actually, if it is mostly a mail machine, I wonder if you have made the best change I ever made to my mail server, a line in syslog.conf:

mail.* -/var/log/maillog

The dash preceding the file name tells syslog not to sync the file after every log entry. I don't mind the minuscule chance of losing a mail log message on a hard disk crash, and the performance difference when processing mail queues is huge.

Re:RAID by aran · 1999-11-18 15:02 · Score: 1

Another solution is to look at the Communigate mail server from http://www.stalker.com
It allows you to cluster your mail service across multiple servers with very little fuss.

your raid dellemadilemma by johnstonwinn · 1999-11-18 15:02 · Score: 1

Don't listen to that crap; SCSI drives are built for industrial use. You get what you pay for. Go for the SCSI setup and you will be glad in the end. As for the RAID 5 config: if you don't have the fault tolerance on (parity stripe) and are just doing it to crank every bit of speed you can out of that box, well, go for it. But what the other guy said about checking the number of writes your RAID is doing sounds like a well-thought-out suggestion. I know people who have professed their love for IDE, but once they get a taste of SCSI, they rarely go back.

try to do some benchmarking before you buy by troutman · 1999-11-18 15:11 · Score: 4

This is only mildly applicable to your question since it isn't for Solaris, but it is all I have to offer.

I spent a fair amount of time looking at RAID 5 solutions this past summer for a client. Both external and internal, for Linux. Tried several different controller card brands and drive configurations, did a lot of reading, and bugged a lot of vendors.

You really should try to test your options and all of the configuration combinations using something like Bonnie, on a machine with a similar configuration to your target server. Make sure that your Bonnie test file size is at least twice physical RAM, to eliminate the effects of RAM and controller caching on the results.
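The "twice physical RAM" rule is easy to get wrong on a big box, so here is a minimal sketch that derives the minimum test-file size in MB from a Linux /proc/meminfo dump. It assumes the MemTotal line is present and reported in kB, as on Linux; the function name is hypothetical, and you still need to add headroom for the controller's own cache.

```python
def min_bonnie_size_mb(meminfo_text: str, factor: int = 2) -> int:
    """Smallest test-file size (MB) that is `factor` times physical RAM.

    Parses the MemTotal line of a Linux /proc/meminfo dump (value in kB)
    and rounds up to a whole megabyte.
    """
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            kb = int(line.split()[1])
            return (kb * factor + 1023) // 1024
    raise ValueError("MemTotal not found")
```

On a Linux box, `min_bonnie_size_mb(open("/proc/meminfo").read())` gives the size in MB to hand to Bonnie.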

I found that using 6 drives in a RAID 5 config was a LOT faster than 5 drives, most of the time. In fact, 3 drives in an array was faster than 5 in some cases. I think it has to do with the way the controller cards were calculating the distributed parity, and perhaps also due to things the driver was doing. 4 drives usually wasn't much better than 3, either.

Stripe sizes for the array can also make a big difference (32k vs. 128k, etc.). Larger stripe sizes are usually better for raw I/O speed, but you may find that for email, a higher number of random seek transactions per second matters more than raw speed.
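A sketch of how the stripe size decides how many disks a single request occupies. It assumes a left-symmetric RAID-5 layout (the default for Linux software RAID; hardware controllers may rotate parity differently) and treats "stripe size" as the per-disk chunk, which is how most controllers use the term. A large request split over many disks gets more bandwidth, while a small read that fits inside one chunk ties up only one disk, leaving the others free to serve other requests. Function names are hypothetical.

```python
def chunk_to_disk(chunk: int, n_disks: int) -> int:
    """Disk holding logical data chunk `chunk` in a left-symmetric RAID-5."""
    stripe, pos = divmod(chunk, n_disks - 1)          # n-1 data chunks per stripe
    parity_disk = (n_disks - 1) - (stripe % n_disks)  # parity moves one disk left per stripe
    return (parity_disk + 1 + pos) % n_disks          # data starts just after parity


def disks_touched(offset: int, length: int, stripe_kb: int, n_disks: int) -> set[int]:
    """Member disks hit by one read of `length` bytes starting at `offset`."""
    chunk = stripe_kb * 1024
    first, last = offset // chunk, (offset + length - 1) // chunk
    return {chunk_to_disk(c, n_disks) for c in range(first, last + 1)}


# One 128 KB read on a 5-disk array:
print(disks_touched(0, 128 * 1024, 32, 5))   # spread over four disks
print(disks_touched(0, 128 * 1024, 128, 5))  # served by a single disk
```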

I did not get a chance to do any hard testing of multiple channel configurations with these cards. I suspect that splitting the I/O onto multiple channels would be a win.

IMHO, you definitely want an i960-based board or system, with the fastest CPU you can get on it. I noticed a significant difference between boards with the 33MHz part vs. the 66MHz part.

FYI for others: for controllers, the AMI MegaRAID (alias Dell's PERC2/SC) just blows chunks. Older non-LVD, non-raid SCSI systems can run rings around it, at least on write speed.

It has been my experience that the write speed on a RAID 5 system is generally only a fraction of the reading speed, like 1/4th to 1/2. For a quick and stupid test, do something like 'time cat /proc/kcore > /tmp/kcore' and do the math for MB/second.
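The same quick write test can be scripted without copying /proc/kcore. A minimal sketch: it writes a scratch file of zeros to a directory on the array, calls fsync so the timing covers getting the data out of the OS page cache, and reports MB/second. The function name and default sizes are hypothetical; a controller with write-back cache can still flatter small runs, so use a size well beyond that cache.

```python
import os
import tempfile
import time


def write_throughput_mb_s(directory: str, size_mb: int = 64, block_kb: int = 64) -> float:
    """Write size_mb of zeros to a scratch file in `directory`, fsync, return MB/s."""
    block = b"\0" * (block_kb * 1024)
    blocks = size_mb * 1024 // block_kb
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(blocks):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())  # force the data to the disk/controller
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)  # clean up the scratch file
    return size_mb / elapsed
```

Point `directory` at a filesystem on the array under test and compare the result with a read test of the same size.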

Oh, and my current favorite card is the DPT Millenium V controller; I've been using it in several systems in various places for the last 3 or 4 months. Here are some Bonnie results for a system with a DPT and six 7200 RPM drives, all on the same (internal) channel, Linux kernel 2.2.10, dual P3 500MHz:

     -------Sequential Output-------- ---Sequential Input-- --Random--
     -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
  MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
1024  7637 97.5 16743 15.2  9561 19.4  8384 98.3 52923 36.2 583.2  9.0

Re:try to do some benchmarking before you buy by Anonymous Coward · 1999-11-18 16:15 · Score: 0

You must be joking somewhere, subtly. DPT has the advantage that it is stable, known to work, and reliable. It is also very, very slow, as the linux-raid mailing list has shown time and time again in comparisons with other cards such as the ICP-Vortex. And for goodness' sake, the i960 is way behind the edge these days; to be credible you need several such processors working together, dedicated ASICs, or StrongARM-based systems. Second, your benchmark looks dubious: do you really believe you get 583 seeks per second??? Presumably you have more than 1 GB of RAM and forgot that you *MUST* use a test file LARGER than RAM, otherwise you end up benchmarking your RAM. Read up on the Benchmarking HOWTO and the Multi Disk HOWTO and see all the pitfalls you have fallen into.
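The seeks-per-second skepticism can be checked on the back of an envelope: a random read costs roughly one average seek plus half a rotation. This sketch assumes an 8.5 ms average seek for a 7200 RPM drive (a typical datasheet figure of the era, not one from the thread) and ignores transfer time and command overhead, so it is a rough ceiling, not a prediction.

```python
def random_iops(rpm: float, avg_seek_ms: float) -> float:
    """Rough random I/Os per second for one drive: 1 / (seek + half a rotation)."""
    half_rotation_ms = 0.5 * 60_000 / rpm  # average rotational latency
    return 1000 / (avg_seek_ms + half_rotation_ms)


per_drive = random_iops(7200, 8.5)  # assumed 8.5 ms average seek
print(round(per_drive, 1))          # roughly 79 per drive
print(round(6 * per_drive))         # six drives, all busy at once
```

By this estimate six such drives top out somewhere under 500 random reads per second even when perfectly parallel, so a figure well above that hints (without proving) that some seeks were served from cache.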
Re:try to do some benchmarking before you buy by ericfitz · 1999-11-19 05:48 · Score: 1

3 disk RAID is faster than 5 disk RAID because of latency: the higher the number of disks in the array, the higher the probability that one of them will be at the maximum possible latency (i.e. it rotated past the data immediately before the request).
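The latency argument can be made quantitative under a simple model: spindles are not synchronized, an operation that spans the whole stripe must wait for every member disk, and each disk's rotational latency is independent and uniform over one rotation. The expected wait is then the maximum of n uniform values, which is n/(n+1) of a rotation, so it creeps toward a full rotation as disks are added. This is an idealized sketch that ignores seeks and controller behaviour; the function name is hypothetical.

```python
def expected_worst_latency_ms(n_disks: int, rpm: float) -> float:
    """Expected slowest rotational latency among n unsynchronized disks.

    Each disk's latency is modelled as uniform on [0, T) with T one full
    rotation; the expected maximum of n such values is n/(n+1) * T.
    """
    rotation_ms = 60_000 / rpm
    return n_disks / (n_disks + 1) * rotation_ms


for n in (1, 3, 5):
    print(n, round(expected_worst_latency_ms(n, 7200), 2))
```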

I also love the DPT SmartRAID V Series (I use a Decade in my workstation). As the last poster said, though, be sure to set your stripe size appropriately for your application.

something isnt right by ZxCv · 1999-11-18 15:11 · Score: 3

Our setup has about 31,000 users constantly checking and sending email, and runs RH 6.1 on a dual PII/333 with 128MB of RAM and a 9GB UW SCSI disk. I haven't seen a load higher than 0.75 since that machine became the mail server... maybe something about how your mail server is set up is bogging it down tremendously.

--

Perl - $Just @when->$you ${thought} s/yn/tax/ &couldn\'t %get $worse;

Re:something isnt right by Anonymous Coward · 1999-11-19 01:34 · Score: 0

Something isn't right, all right... From what I understand, you're running a mail server with a 9.0GB disk for 31,000 users, which gives each user a mailbox of a mere 300 KB on average. Knowing average users, your mail server would run out of disk space in a matter of hours... Either that, or you made a typo. IMHO, that is.
Re:something isnt right by mubes · 1999-11-20 09:44 · Score: 1

This really is wobbling off-topic now.... I had exactly this kind of problem on one of our servers. Turned out to be some users weren't deleting POP3 mail once they'd read it and the particular pop3 we were using (a Linux one) appeared to need to read the whole mailfile into core. ...might be worth a looksee. DAVE dmarples@iee.org

raid 5 by sql*kitten · 1999-11-18 15:15 · Score: 1

You've got to look at the disk access characteristics of mail servers. In many cases, you'll find that you have lots of writes, and a comparable number of reads. RAID5 works best in situations where you need space, redundancy and reads, but if you want good write performance, you need to sacrifice space and go with a RAID 0+1 solution, also known as a mirrored stripe set.
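The write penalty comes from the usual I/O count for a small random write. On RAID-5 a write smaller than a stripe is a read-modify-write: read the old data and old parity, then write the new data and new parity, four disk I/Os in all. On RAID 0+1 it is just the two mirror writes. This sketch assumes every write is smaller than a stripe and ignores controller write cache; the 100 I/Os per second per disk in the usage lines is illustrative, and the function name is hypothetical.

```python
def random_write_iops(level: str, n_disks: int, disk_iops: float) -> float:
    """Aggregate small random-write rate, given back-end disk I/Os per write.

    raid5:  read old data + read old parity + write data + write parity = 4
    raid01: write both mirror copies = 2
    """
    ios_per_write = {"raid5": 4, "raid01": 2}[level]
    return n_disks * disk_iops / ios_per_write


print(random_write_iops("raid5", 6, 100))   # 150.0
print(random_write_iops("raid01", 6, 100))  # 300.0
```

Same six disks, twice the small-write rate, at the cost of usable space.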

I would recommend a Sun MultiPack with Solstice DiskSuite for management.

Load Ave 10 need not mean an IO Bottleneck. by noidd · 1999-11-18 15:15 · Score: 2

Load average is defined as the number of processes sitting on the run queue. This need not indicate a disk IO bottleneck.

I would be surprised if any exim system was having more of a bottleneck to disk than it was to network. Your disks are faster than your network and exim is pretty light on un-required disk access.

The larger the network bottleneck (and by network I mean end-to-end with your customer, not just your links), the longer processes are going to hang around.

More processes, more paging, less cacheing. Less cacheing, more IO. More paging, more IO.

Probably teaching grandma to suck eggs, but you do have your swap space on a separate device, don't you ;)

The more exim processes that hang around longer, the more processes for the CPU to switch around. The more switching, the more likely you are to see paging.

If the processes hang around longer, they take up more memory which reduces the cache-size available.

Exim has several files which it accesses frequently, mainly the retry databases and its configuration. These should stay in memory permanently.

Bottom Line:

I do, however, suggest that you don't consider moving a single server to RAID. If you have a server that you want to move to RAID for efficiency purposes, your design is wrong and you should be building a scalable system.

Red

Re:Load Ave 10 need not mean an IO Bottleneck. by ricojansen · 1999-11-18 16:30 · Score: 1

It is important to really find out whether the disks are the problem.
I suggest you examine your system carefully to see what is actually happening. Besides using vmstat, iostat and friends, you can get a software package by Adrian Cockcroft that includes a 'virtual adrian' which points out all the bad spots in the system.
It can be found here: SE toolkit
Re:Load Ave 10 need not mean an IO Bottleneck. by remande · 1999-11-18 20:56 · Score: 2

Load average is defined as the number of processes sitting on the run queue. This need not indicate a disk IO bottleneck.
Indeed, a high load average indicates that there is no I/O bottleneck, and a low load average may indicate an I/O bottleneck.
The run queue holds only those processes that the kernel thinks can constructively use CPU cycles. Once a process asks the kernel to access an I/O device, the kernel decides whether the device is currently available. If not, the process gets kicked off the run queue until the device becomes available again.
Thus, if you have a lot of processes hitting the same device, an I/O bottleneck would actually drop the load, as there are fewer processes able to use the processor.

--
--The basis of all love is respect
Re:Load Ave 10 need not mean an IO Bottleneck. by treat · 1999-11-19 02:28 · Score: 1

Thus, if you have a lot of processes hitting the same device, an I/O bottleneck would actually drop the load, as there are fewer processes able to use the processor.
Sorry, but processes waiting on disk IO do count against the load average. They're the ones that vmstat and top show as blocking.
Processes blocked on the network do not count against the load average.
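On Linux this is easy to check directly: the kernel's load average counts tasks that are runnable (state R) plus tasks in uninterruptible sleep (state D), which is typically where processes waiting on disk I/O sit, while network waits are ordinary interruptible sleep. A minimal sketch that tallies both from /proc; it gives a snapshot rather than the damped 1/5/15-minute averages, the function names are hypothetical, and other kernels (the original question was about Solaris) may account for load differently.

```python
import os


def proc_state(stat_line: str) -> str:
    """State letter from a /proc/<pid>/stat line.

    The command name is in parentheses and may itself contain spaces or
    ')', so split on the last ')' rather than on whitespace.
    """
    return stat_line.rpartition(")")[2].split()[0]


def count_load_states(proc_dir: str = "/proc") -> tuple[int, int]:
    """Return (runnable 'R', uninterruptible 'D') process counts on Linux."""
    running = blocked = 0
    for pid in os.listdir(proc_dir):
        if not pid.isdigit():
            continue
        try:
            with open(os.path.join(proc_dir, pid, "stat")) as f:
                state = proc_state(f.read())
        except OSError:
            continue  # the process exited while we were looking
        if state == "R":
            running += 1
        elif state == "D":
            blocked += 1
    return running, blocked
```

A large and persistent D count during a load spike points at the disks; a high load with few D-state processes points elsewhere.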

SCSI RAID by JSG · 1999-11-18 15:20 · Score: 2

Personally speaking for a load of this magnitude SCSI is the only solution.

Don't even think of software RAID.

For some background on SCSI itself try http://www.scsifaq.org

There are many types of RAID. Levels 0-5 are the "standard" ones, but there are several newer ones, e.g. level 10, which attempts to address throughput issues. Your actual space requirements don't seem outrageous, so level 5 would be reasonably cost-effective.

Another thing you will probably want is hot swapping. Once you've had a box tell you a drive is dead, you've removed it and popped a new one in without taking the box down, you will not want anything else.

On the IDE vs SCSI debate, whilst IDE is fast it seems to me that under continuous load SCSI gives better throughput.

As others have pointed out, a 'designed' server rather than a "roll your own" box would make sense. Compaq Proliants make excellent Linux machines. The SMART arrays are very good and support RAID up to level 5. You can fit a lot of disks in the drive cages as well. They are a little pricey but of good quality and reliability. We have rather a lot of them running NetWare. I get to use the older kit to run my funny Open Source stuff ...

A suggestion might be:
Proliant 1600, 2 x 600MHz processors, SMART 3200 with 64MB cache, 5 drive slots: 81GB available after RAID 5 on 18GB 1" drives (that's Ultra-2 SCSI). Supports up to 1GB RAM (has 128MB by default). There is also an on-board SCSI interface for CD-ROM etc. This comes in at about GBP 9,000.

Re:SCSI RAID by Anonymous Coward · 1999-11-18 16:19 · Score: 1

Don't even think of software RAID.
I'd like you to back up this claim (if you can).
You see, serious outfits like Deja do in fact use Linux software RAID and get it to work, rather well too. Does zero-point-two-five-percent disk-related downtime sound OK to you? It does to them.
Re:SCSI RAID by PapaZit · 1999-11-18 20:40 · Score: 1

The biggest problem with software RAID is that it often relies on fallible humans to brush and feed it.

When you're a sysadmin, and your big machine has a disk crash, life gets hectic. The last thing you want to be thinking about is which device name and partition you want to restore to and from -- and you sure as hell don't want to mix them up. Hardware RAID handles all of this for you.

In large organizations, too, there are sometimes separate hardware and software people. Life's a whole lot simpler if you don't have to coordinate each step between groups. With hardware RAID, operations hears the beeping RAID box (or the sysadmin calls operations), operations grabs a spare, tells the sysadmin, then swaps the disks. With software RAID, the sysadmin gets a warning (if he has things set up correctly), tells operations to swap disks (if there's hot-swappable hardware) or power the machine down (if there isn't), has operations swap the disks, then the sysadmin manually restores the disks.

--
Forward, retransmit, or republish anything I say here. Just don't misquote me.

What about the software side? by dne · 1999-11-18 15:22 · Score: 1

I'm not familiar with Exim, but aren't there more efficient solutions?

Although my experiences have been with much smaller configurations, qmail reportedly handles loads of this magnitude on lesser hardware.

Re:What about the software side? by spodpit · 1999-11-18 17:04 · Score: 1

> I'm not familiar with Exim, but aren't there more efficient solutions?

Probably not... a certain (very) large UK ISP I know quite a lot about uses Exim on its email system because it's more secure than sendmail (but then, what isn't?) and more efficient than qmail (see below).

qmail starts up a separate process for every email it delivers, whereas Exim starts a separate process for each batch of email it delivers. On a lightly loaded system the point is probably moot, but on systems like the one we are discussing, it quite probably isn't!

Acouple of notes.... by Anonymous Coward · 1999-11-18 15:24 · Score: 0

Wow... a couple of good suggestions in here, actually. I'm a Sun/HP/AIX UNIX guy by trade, so don't expect a Linux/Intel type answer. (Although, Linux @#$%'ing rocks....)

There are a couple of ways you can go. You don't really have major performance needs, so I'd suggest going with a Sun MultiPack (6-drive housing) on a dedicated U2W card, and then choosing 9G/10KRPM drives in RAID 0 if you want performance, or 18G/7200 or 10KRPM drives for density in RAID 0+1 or RAID 5, using SDS, or preferably VxFS if you want to ante up the cash.

Now, this might not work in this case, but just to throw it out there: tonight I noticed a MAJOR shift in Sun storage array pricing. The price of the A5200 (check it out at www.sun.com), a fibre-linked, totally redundant 22-drive solution, has dropped for its 9G/10KRPM models from $95k down to $71k. This is temporary pricing until they clear inventory, because they are now offering 38G/10KRPM drives in the same model for $68K!!! That's 1/2 a TERA!!! And it scales, by chaining A5x00 boxes via fibre or via a fibre hub. If you can afford it, this is the way to go.

Stay away from NetApp and EMC; they might be "sexy", but TCO is OOC (out of control). And I'll have to agree with the "stay away from IDE solutions" bandwagon... the biggest mistake Sun ever made (other than using PCI) was putting IDE drives in UltraSPARCs; the drives KILL the performance of the Ultra workstations. Good luck.

Re:Acouple of notes.... by Anonymous Coward · 1999-11-18 22:08 · Score: 0

Funny... my experience with Sun has been exactly the opposite. I work in a huge multinational company that has large mid- and high-end machine requirements. We have many Sun E6 and E10 machines. All of them were delivered with A5XXX Photon arrays, which quickly died in numerous ways. Sun arrays suck!! They are overpriced crap. We have since moved all of our storage (read: tens of millions of dollars' worth) off Sun arrays and onto Clariion arrays. Night and day difference. Clariion makes a much better product for the same or less cost. We will never use Sun arrays again, and if we had an alternative to their hardware we'd use that too!

SSA-Managers list by Anonymous Coward · 1999-11-18 15:26 · Score: 0

Post your question to this list. It focuses on the use of Sun disk arrays. While this might not be your 'final solution', you are likely to get an intelligent answer from people operating array systems on Sun hardware.

To subscribe, send 'subscribe ssa-managers' in the message body to majordomo@Eng.Auburn.EDU

It is not likely that you need to kick out the Ultra 2 or add more buses; you probably just need more spindles. I don't know Exim, so I would not want to comment further.

Have a look at the article archive on www.sunworld.com; there has been a recent series of decent articles on RAID systems. And make sure you are not bottlenecking elsewhere in the system.

Other authoritative sources are Adrian Cockcroft's book on Sun Performance Tuning and Wong's book on Sun Capacity Planning (both Sun texts).

RAID Setup by chown · 1999-11-18 15:30 · Score: 1

I used to run a large mail server at a fairly big ISP that will remain nameless, and I'd suggest you consider a RAID-10 solution. We were experiencing disk bottleneck problems, and this really helped. Basically, RAID-10 splits the disk I/O half and half over multiple drives, combining standard mirroring and striping. This is a simplified explanation, but that's the basic idea.
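To make the "half and half" idea concrete, here is a sketch of one common RAID-10 layout: disks grouped into mirror pairs, with data chunks striped round-robin across the pairs. The pairing scheme (disks 0+1, 2+3, and so on) is an assumption for illustration; controllers and software RAID implementations lay mirrors out in different ways, and the function name is hypothetical.

```python
def raid10_targets(chunk: int, n_disks: int) -> tuple[int, int]:
    """Both disks holding logical chunk `chunk` in a stripe of mirror pairs.

    Disks (0,1), (2,3), ... form mirror pairs; consecutive chunks land on
    different pairs, and either member of a pair can serve a read.
    """
    if n_disks % 2:
        raise ValueError("RAID-10 needs an even number of disks")
    pair = chunk % (n_disks // 2)
    return 2 * pair, 2 * pair + 1


for c in range(4):
    print(c, raid10_targets(c, 4))
```

Writes go to both disks of a pair; reads can be split between them, which is where most of the relief for a busy mail spool comes from.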

Procedure by jocks · 1999-11-18 15:34 · Score: 2

First, try iostat -D -l (number of disks + 2) 5 to get percentage utilisation at 5-second intervals.

This is my favourite tool for disk analysis. Second, go to http://www.sun.com/sun-on-net/performance, read what you feel is important, but above all download the SE toolkit.

Run zoom.se to get a professional analysis of your system. Run virtual_adrian.se to get a virtual professional to tune your box.

I recommend you do this BEFORE spending any money. I have an E3000 with 2GB of RAM and 2% processor utilisation because nobody checked the system properly.

If it is your disks, I recommend Sun kit (even though it is expensive) and RAID 5. Don't worry about people telling you it's slower; compared to a thrashing single spindle it is extremely fast and, just as importantly, reliable. Tinker and learn!

NetApps... Pricey but really quite nice by Anonymous Coward · 1999-11-18 15:38 · Score: 0

I work at a large European ISP. We maintain mail systems for a somewhat larger user base using Network Appliance Filers. We use the filer for stored mail and the local mailers' RAID arrays for spool. This would be an expensive solution for you to take, but it's very, very scalable, and lots of other data needs can be taken care of by the same filer. Have a volume for your mail, oh, and one for logs, one for news, one for webspace... If you run out of 100BaseT, put in a quad 10/100 card. One ultra-cute feature is that you can have an NT box and a Unix box talking (read/write) to the same data. We don't (NT - eww!!), but you could. So you can spread the (yes, rather high) cost across lots of different services. They give good performance and are mostly very stable; we have several with uptimes over 100 days, having wrapped their NFS-ops counter some time ago... :> I would recommend them if you can spare the cash; think $50k for one that will take care of most of your data storage needs. And if at a later time you want more space, buy a couple more shelves. Management is pretty easy too, and it is pretty damned hard to get them to actually lose anything.

drives by noy · 1999-11-18 15:44 · Score: 1

If this is a server, don't go with IDE. You are a business looking for *safety* of the data as well as performance, and should be willing to fork over the extra 20 to 100 percent it takes for SCSI...

As for controllers, I say Mylex, or the high-end adapter of your choice; I would beef it up to 128 megs of RAM in any case...

As for the drives, go 10,000 RPM; the difference in access times will help you out, and I think that is much more important in your case than transfer rate... For an ISP, I would only ever buy IBM or Seagate drives, reputable workhorses that they are...

For great cases and setups, I honestly recommend macgurus.com. They specialize in Mac stuff, but a SCSI tower is a SCSI tower, and they will build it with good components at a reasonable price to whatever specs you need... (no, I don't work for them)...

more spindles by Anonymous Coward · 1999-11-18 15:50 · Score: 0

I agree with others that something in the setup sucks to be getting such bad performance.

Measure first, change later

I think the problem is the contention on the single spindle for mail. Ouch!

Next, RAID 5 sucks for small writes. Don't even bother getting RAID unless you need the redundancy. But do get more spindles: a bunch (4-5) of 5400rpm 2GB drives is going to perform much better than one high-speed, high-capacity drive.

PS. Actually, if you're going to use a lot of drives, then you probably will want RAID, because a failure becomes more likely with more drives. But do it with mirroring (RAID 1 or RAID 0/1).
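The PS can be put in numbers with the usual independence assumption: if each drive fails in a given period with probability p, the chance that at least one of n drives fails is 1 - (1 - p)^n. The 5% per-drive figure below is purely illustrative, not a measured rate, and real failures can be correlated (same batch, same enclosure), which this simple model ignores.

```python
def p_any_failure(p_single: float, n_drives: int) -> float:
    """Chance that at least one of n drives fails, assuming independent failures."""
    return 1 - (1 - p_single) ** n_drives


for n in (1, 5):
    print(n, round(p_any_failure(0.05, n), 3))  # 0.05 is an illustrative rate
```

Going from one drive to five takes the chance of some failure from 5% to roughly 23% under these assumptions.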

OB__s__VIO__i__US__t__LY__g by Anonymous Coward · 1999-11-18 15:50 · Score: 0

If the question is about IDE RAID 5 solutions over SCSI (ardie-ar-ar)... then this is a felonious debate from the outset... try DEC RA3000s on for size/speed... with upcoming Linux support, they should be something to consider...

mail configuration by lucky+luck · 1999-11-18 15:57 · Score: 5

Hi,
A couple of years ago we had the same problem, until I discovered that all our mailboxes were in one mail spool directory. This was a huge bottleneck, and after adapting qpopper and configuring sendmail to use a split mail spool directory, the load came down to 1. (A split mail spool is /mail/a, /mail/b, /mail/c, ..., and all users whose names begin with an a are placed in /mail/a, etc.)

Check the above first, before you buy hardware.
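The first-letter scheme is simple to express; whatever rule you pick, the MTA and the POP server must map usernames to paths the same way. A minimal sketch, assuming lowercase bucket names and a catch-all "other" directory for names that don't start with a letter (both hypothetical choices, not from the post):

```python
import posixpath


def spool_path(user: str, root: str = "/mail") -> str:
    """Mailbox path in a spool split by the first letter of the username.

    e.g. alice -> /mail/a/alice. Names not starting with an ASCII letter
    go into a catch-all 'other' directory (an assumption for illustration).
    """
    first = user[:1].lower()
    bucket = first if first.isascii() and first.isalpha() else "other"
    return posixpath.join(root, bucket, user)
```

First letters are far from evenly distributed, so a busy site may want two levels (e.g. /mail/a/l/alice) to keep every directory small.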

Re:mail configuration by little+alfalfa · 1999-11-18 23:34 · Score: 1

This can definitely help a lot. A directory with lots and lots of files in it (I mean in the tens of thousands) can be quite a bottleneck. I experienced this at a previous job where we had a development group that had over 150000 files in a single directory. With lots of processes trying to access that directory at the same time, performance would just flat out suck. When there are multiple processes all trying to do a stat() on a directory with 150000 files in it, be prepared to wait!
A good test would be to try two things. With the machine live, go to the directory where all the mail folders live and do an ls. If you're waiting more than a few seconds for the ls to start listing the directory, you've got a problem. If the ls lists files for more than a few seconds, you've got way too many files in that one directory. Try splitting it up.
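The same check can be scripted so it is easy to repeat at different times of the day. A minimal sketch (the function name is hypothetical): it counts the entries in a directory and times one full read of it. Unlike ls, it neither sorts nor stats each entry, so it mostly measures reading the directory itself; a slow result on a busy spool is still a strong hint that the directory is too big.

```python
import os
import time


def scan_directory(path: str) -> tuple[int, float]:
    """Return (number of entries, seconds taken to read them all) for `path`."""
    start = time.perf_counter()
    with os.scandir(path) as entries:
        count = sum(1 for _ in entries)
    return count, time.perf_counter() - start
```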
You might also want to try using lsof on that directory so that you can get a feel of how many processes (people) are trying to access their mail spools at the same time. You might also want to try it at different times of the day.
Don't let this stop you from going out and buying a RAID solution. I bought a JBOD with six 36GB drives and a Mylex ExtremeRAID 1100 card for our company file server, and I've never had a problem since. That card is definitely a winner in my book.

Good Luck!

Usefull RAID info by Anonymous Coward · 1999-11-18 16:05 · Score: 0

Check out http://www.penguincomputing.com/RAID.html for some useful info on RAID

Best RAID? by Anonymous+Freak · 1999-11-18 16:09 · Score: 1

Well, after dealing with many different brands of RAID controllers, I have found that DPT's Millenium series tend to be the best. The card takes care of everything, and they're available in 64-bit flavors with 3 onboard U2 channels, or 2 Fibre channels.

Mylex are good if you're looking for a cheaper solution, or Adaptec for dirt cheap. But, if you're looking for the absolute fastest possible solution, it would be Fibre Channel Quantum Atlas 10k's on a 64-bit DPT Millenium Fibre controller in a RAID 0+1 configuration. With a 10 drive setup (equal to the total capacity of 5 of the drives) you could easily reach 100MB/s. Of course, that's gonna cost you a pretty penny.

--
Another non-functioning site was "uncertainty.microsoft.com."
The purpose of that site was not known.

I just did this... my experiences by abulafia · 1999-11-18 16:12 · Score: 5

Our mail server is currently handling about 1M messages a day. IO became a serious issue. We're still using sendmail, and I'm not going to give it up (we know it, we have custom builds for strange applications, it works). As others have noted, load average doesn't mean much here: I have some machines with a load average of 4 that are actually idle and fine, and others at .2 that need tuning. Ignore it and concentrate on what matters.

Assuming IO matters, I am putting my full faith (and job) in Mylex controllers. I love them. I only have one in production, but am about to deploy 5 more, and we'll come in at about 600G managed by them. They just work. The DAC960SXi I have in production (for 7 months now) has been flawless, delivering wire speed doing RAID 5 without any effort after initial config (which is a bit annoying, to be sure).

My production system using it is doing far too many things - mail, staging server, enterprise backup. This is changing - lack of time and historical accident made it that way. The point is that the Mylex handles it with no grief.

If you're building these, be aware that Mylex external controllers need to be mounted in a box with "internal" style connectors. For good RAID cases, check out http://www.storagepath.com/ - they are what I'm using. They look low rent, but the boxes are nice (if a bit expensive).

Down to specifics. For a mail-only machine doing the sort of volume you're talking about, I'd deploy a dual-processor box with three SCSI buses (one for spool, two for mbox/system access; system access is pretty cheap in comparison) attached to two hardware RAID setups. Volume allowing, I'd go RAID 5 for spool (with 18G disks, that's ~65G of spool) with hot spares. For mboxes, I'd do 0+1, for as much space as needed: stripe disks on independent controllers, mirrored to each other. Striped mirrors can grow as you need them to (RAID 5 can't, easily). You don't want to lose anyone's mail. Hot spares for each.

Assuming 100G of mboxes, that's a total of 17 18G disks. Add three Mylex DAC960SXis and (initially) 3 rack-mount cases, and that comes to roughly 24K.

Availability beyond the disks is a different question, and it gets platform-specific. I do mainly Solaris now, so I can't say much about Linux for this. Mylex controllers can do dual-active/dual-host configurations, but things get more complex, and a summary here doesn't make sense.

Other options like A1000s (Sun specific) and Netapps require different approaches - they're very different beasts. We have all of the above, and treat them very differently. We'll buy them all again - they're all decent - but are good at different things.

If you can, buy raw Mylex controllers through a reseller like TechData or similar - you'll save a lot.

Hope this helps some.

-j

--
I forget what 8 was for.

Evaluating RAIDs by Grimwiz · 1999-11-18 16:13 · Score: 4

The first thing about a hardware raid controller is that it hides failures from the operating system. With software RAID you have to manually carry out all sorts of tasks, and I'm sure we've all heard of the engineer who mirrored the new blank disk on top of the one remaining data disk of a mirror.
Units such as the Sun A1000 and Baydel connect via SCSI, and you just watch for an orange light; even the part-time cleaner could pull out the correct disk, replace it and have the system back and running without the OS noticing. StorageWorks and Clariion (EMC) do the same but over Fibre Channel. SCSI units tend to top out at 40MB/s; Fibre Channel theoretically tops out at 200MB/s (two 100MB/s loops), but since I only had a maximum of 30 x 18GB disks to play with, the disks were the bottleneck. Monster multi-SCSI machines like EMC's and IBM's can achieve whatever bandwidth you want by multiplexing SCSI connections.
We've evaluated software RAID, hardware RAID over SCSI, and hardware RAID over Fibre Channel from EMC, IBM, Sun and Compaq (StorageWorks), and in our opinion a good smart RAID controller with two data channels and load-balancing software is impossible to beat.
For speed, stripe (0) mirrors together (1) in RAID 0+1. This allows reads at double speed, because each mirrored disk can handle a request separately, and slightly sped-up writes, because you can write to the RAID controller's NV cache and carry on doing your work whilst the controller takes care of putting the data on the media.
This of course has only a 50% data efficiency.
Using Raid 3 or 5 you lose one disk in a rank for parity, raid 6 (used by Network Appliances) uses two disks for parity but has wider ranks of disks. This often means that sequential reads are fast, because a request for data wakes up all the disks in the rank, but therefore the whole rank can only handle one request at a time. Writes are slower because you have to read a stripe of data, calculate parity and write the whole stripe back again.
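
A minimal sketch of the parity arithmetic behind RAID 3/4/5, assuming simple byte-wise XOR parity over one stripe (real controllers work on whole sectors, and RAID 5 rotates the parity block across disks). It also shows the common small-write shortcut: instead of re-reading the whole stripe, a controller can read just the old data block and the old parity, which is still two extra reads per write. The 4-byte "blocks" are made up for illustration.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def stripe_parity(data_blocks: list[bytes]) -> bytes:
    """Parity for a full stripe: XOR of every data block."""
    return xor_blocks(*data_blocks)


def small_write_parity(old_parity: bytes, old_data: bytes, new_data: bytes) -> bytes:
    """Read-modify-write: new parity from old parity, old data and new data only."""
    return xor_blocks(old_parity, old_data, new_data)


def rebuild(surviving_blocks: list[bytes], parity: bytes) -> bytes:
    """Reconstruct a lost data block from the surviving blocks plus parity."""
    return xor_blocks(parity, *surviving_blocks)


stripe = [b"mail", b"spoo", b"l-01", b"data"]  # 4 data disks, tiny made-up blocks
parity = stripe_parity(stripe)

# Update block 2 without reading blocks 0, 1 and 3.
new_block = b"l-02"
parity = small_write_parity(parity, stripe[2], new_block)
stripe[2] = new_block
assert parity == stripe_parity(stripe)

# Lose disk 1 and rebuild it from the others.
assert rebuild([stripe[0], stripe[2], stripe[3]], parity) == b"spoo"
```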
RAID5 is really good for data which doesn't have to be the absolute fastest.
Whilst we were doing performance tests, we measured a linear increase in speed up to 20 disks (in transactions/second), and there is a definite art in making sure that you spread the load over all the disks available so that a single disk doesn't get thrashed to death.
In conclusion? Well, that depends on your OS.
For me, for a PC-based system I would choose a hardware RAID system with a SCSI connection that lets me choose the LUN sizes. 5 disks in a RAID5 configuration will only waste 1 disk in capacity. If you're finding your mail spool is being thrashed, then I would build a 10-disk 0+1 RAID and stripe the mail area across them, using the rest of the area for home areas or web areas or something else which has large storage requirements but doesn't get hit hard.
Oops, this assumes that this REALLY is your problem; a lot of disk problems go away by adding more memory to the machine... I assume you have measured this by tracking the outstanding I/O queue.

--
-- Don't believe everything you read, hear or think

Re:Evaluating RAIDs by Emnar · 1999-11-19 00:03 · Score: 1

> Using Raid 3 or 5 you lose one disk in a rank for parity, raid 6 (used by Network Appliances)

Network Appliance machines use RAID-4, not RAID-6.
Re:Evaluating RAIDs by The+Finn · 1999-11-19 01:00 · Score: 1

This guy's right... Netapps are RAID-4. (Or at least, the older ones are. They might be using some kind of double-parity RAID-4ish thing now...) According to the CMU RAIDframe papers, RAID-6 is similar to RAID-5, but with two (or more) parities split across multiple drives.

The dedicated parity disk(s) aren't a bottleneck with the Netapp, since you've got all that nvram -- things won't actually get written out to disk until there's a full stripe to write.
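
To see why buffering until there's a full stripe helps, count disk operations under a simple model (an assumption that ignores read caching and controller tricks): a lone small write costs four I/Os (read old data, read old parity, write new data, write new parity), two of which hit the parity disk, while a buffered full stripe needs one write per data disk plus a single parity write and no reads. That is also why a dedicated parity disk would be a hot spot without the NVRAM. The stripe widths below are hypothetical.

```python
def small_write_ios(blocks: int) -> tuple[int, int]:
    """Each block written on its own: read old data + old parity, write new data + new parity.
    Returns (total I/Os, I/Os that hit the parity disk)."""
    return 4 * blocks, 2 * blocks


def full_stripe_ios(data_disks: int) -> tuple[int, int]:
    """Whole stripe buffered in NVRAM first: write every data block plus one
    freshly computed parity block, with no reads at all."""
    return data_disks + 1, 1


for n in (4, 7, 13):  # hypothetical stripe widths
    total_small, parity_small = small_write_ios(n)
    total_full, parity_full = full_stripe_ios(n)
    print(f"{n} data disks: {total_small} I/Os ({parity_small} on parity) one block at a time, "
          f"{total_full} I/Os ({parity_full} on parity) as one full stripe")
```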

Everybody here seems to be jumping on the hardware RAID bandwagon. Why? The "software RAID has too much overhead" argument is bogus -- parity calculations are cheap compared to the amount of time it takes to actually write stuff to disk. Modern busses like SCSI aren't terribly CPU-intensive, either. And more often than not, it's I/O that's the bottleneck -- not CPU.

I also have to reiterate what other people have been saying: just because the drives are flashing all the time doesn't mean those disks are saturated.

Does Solaris do softupdates? Does Sun offer a Prestoserve (NVRAM) card? Especially with lots of files that are changed rapidly (which happens on a mail server), both softupdates and Prestoserve could keep a lot of that I/O from even hitting the disk in the first place...

If you do want to go RAID, I'd suggest RAID0+1. (mirror the individual stripes, instead of striping the mirrors.) It eats disks, but gives you the potential of losing half your drives without hurting anything. Split the mirrors across two cabinets with two separate SCSI busses, and if one whole cabinet dies, you still come out smelling like roses.
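
The difference between mirroring stripes (0+1, as described here) and striping mirrors (often called 1+0 or RAID 10) shows up when you count which failure combinations are fatal. This sketch assumes a hypothetical 8-disk array split 4/4 across two cabinets, with each 1+0 mirror pair having one member in each cabinet, so both layouts survive losing a whole cabinet. It then enumerates every combination of 2, 3 and 4 failed disks.

```python
from itertools import combinations

DISKS = 8  # hypothetical array: disks 0-3 in cabinet 1, disks 4-7 in cabinet 2


def survives_0_plus_1(failed: set[int]) -> bool:
    """RAID 0+1: stripe set A = disks 0-3 mirrors stripe set B = disks 4-7.
    One failure takes out its whole stripe set, so data survives only if one set is untouched."""
    half = DISKS // 2
    a_ok = not any(d < half for d in failed)
    b_ok = not any(d >= half for d in failed)
    return a_ok or b_ok


def survives_1_plus_0(failed: set[int]) -> bool:
    """RAID 1+0: mirror pairs (0,4), (1,5), (2,6), (3,7) striped together.
    Data survives as long as no pair loses both members."""
    half = DISKS // 2
    return not any(d in failed and d + half in failed for d in range(half))


for k in (2, 3, 4):
    combos = [set(c) for c in combinations(range(DISKS), k)]
    ok_01 = sum(survives_0_plus_1(f) for f in combos)
    ok_10 = sum(survives_1_plus_0(f) for f in combos)
    print(f"{k} failed: 0+1 survives {ok_01}/{len(combos)}, 1+0 survives {ok_10}/{len(combos)}")
# 2 failed: 0+1 survives 12/28, 1+0 survives 24/28
# 3 failed: 0+1 survives 8/56, 1+0 survives 32/56
# 4 failed: 0+1 survives 2/70, 1+0 survives 16/70
```

The two 4-failure cases that 0+1 survives are exactly the "one whole cabinet dies" cases; any two failures split across the cabinets are fatal for 0+1.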

--
NetBSD: the cathedral vs the bizzare.
Re:Evaluating RAIDs by otis+wildflower · 1999-11-19 01:09 · Score: 2

> Writes are slower because you have to read a stripe of data, calculate parity and write the whole stripe back again.

Kinda why you want gobs of battery-backed RAID controller cache memory... (and a UPS, and clean power... ;)

Your Working Boy,
Re:Evaluating RAIDs by abulafia · 1999-11-25 08:16 · Score: 1

Hardware raid just handles things. Nuff said.

0+1 isn't always the answer. Your analysis about losing half the disks is the best case - if everything dies in one cabinet, you're OK. What about a double disk failure in the same cabinet? It happens. RAID 5 mirrored, or R5+transactions, is sound sleeping.

In any case, _back it up._ RAID doesn't mean you can forgo backups. I know this sounds paranoid, but RAID fails sometimes, even when you don't want it to. Having a backup _will_ save your job.

--
I forget what 8 was for.

Mailbox format can definitely affect performance by Tabbycat · 1999-11-18 16:20 · Score: 1

The original posting doesn't say if the server is running pop/imap, and thus if it is used as the final delivery point for those 10,000 users.

If it is, then the hashing of the mailbox path that lucky luck mentioned is worth investigating. Also worth investigating are alternative mailbox formats. If you're using mbox format, then I'm not surprised there's a problem if you have a large number of users (and/or reasonably large mailboxes).

There has been some discussion about these issues on the exim-users mailing list. I read it via egroups.

Ready made docs already available on the net by Anonymous Coward · 1999-11-18 16:29 · Score: 0

First of all you need a closer analysis, do you for instance use .overview files? The idea of RAID5 for News is a good one, especially for sites with many disks. There is however a case for mixed RAID use, multiple SCSI host adapters, multiple hosts etc.

Rather than pontificate here, I'll direct you to some rather comprehensive documentation in the form of the Multi Disk HOWTO. It is part of the Linux Documentation Project, but don't let that fool you: the HOWTO has examples of SunOS servers, practical implementations, clustering and more. It looks like what you are looking for.

There are guides, principles, a guided method and examples of several implementations. And if you need more, you could try mailing the author.

Ack by Anonymous Coward · 1999-11-18 16:30 · Score: 0

Did someone say RAID 0? RAID 0 isn't even a real RAID solution. You just make one big-ass partition across multiple hard disks and if any one fails, you lose everything. Personally, if money was no object, I'd go for disk duplexing... twice as many controllers, twice as many drives, easy *and* fun! The best solution, however, is probably RAID 5... unless 6, 7, 10, or 53 has something that I don't know about. Where the hell did "53" come from? Freaks.

Clariion fibre-channel RAID box by weave · 1999-11-18 16:33 · Score: 1

Be sure to check out the www.clariion.com web page for information on their fibre-channel external RAID units. These units can be managed separately through their own console connection and support redundant everything, including I/O controllers. They support Sun and Solaris. The box also supports hot spares, so if one disk fails, another is automatically bound into the RAID group and rebuilt. With just a single hot spare, you'd have to lose two disks before risking data loss.

As for me, I'm considering their lower-end SCSI boxes connected to a high-end Intel server running Linux, seeing as I have $52,000 to spend this year! (yippee). The idea is to put all the money where the valuables are (the data) and use commodity hardware and open source software to drive it. The OS would boot from the internal HD, and all data and local customizations (i.e., /usr/local) would be on the external RAID box. If a CPU box fails, unplug it from the array, plug in a spare CPU box, reboot. Minimal downtime due to hardware problems. I can then repair or replace the busted CPU box at my leisure.

For Linux jockeys, there are efforts to bring Fibre Channel drivers to Linux. Be sure to look at the work at Worcester Polytech for info.

Re:Clariion fibre-channel RAID box by osmanb · 1999-11-19 12:31 · Score: 1

Yes indeed! First: I used to work for Clariion, doing software development for the FibreChannel box controller software. I fell in love with the hardware there, and they really do have products that span the whole spectrum. (7-Up is a small, 7 disk array with a single controller board, but which does write caching by backing up unfinished writes to a PCMCIA flash memory card. Good stuff.) As someone else said, they've since been bought by EMC, so I'm not sure how available any of their stuff is.

Other people pointed out that you might want to switch to RAID 3 for this application. I can assure you that the Clariion software is EXTENSIVELY tested and optimized for RAID 3. SGI uses Clariion arrays for video storage/streaming/editing work, so bandwidth (optimized by RAID 3) is their number one concern.

Of course, FibreChannel is JustPlainCool(tm). Good luck with whatever you choose...

-Brian Osman

Seconded! Re: netapps by Dom2 · 1999-11-18 16:35 · Score: 1

When I worked at Demon, the netapps were one of the most reliable pieces of machinery that I administered. Whilst you might think that network attached storage can be a performance problem, in practice it worked very well indeed.

You do, however, need to be aware of how to make your application play well over NFS. Exim is actually reasonable at this. Qmail is good at storing mailboxes on NFS thanks to its Maildir technology, but the mail queue *needs* to be on a local disk... I'm not sure about postfix or sendmail (bletch).

Unfortunately, I can't remember the command to make the individual LEDs on the disks blink, which is one of the best remote diagnostic features ever. :-)

-Dom

Re:Seconded! Re: netapps by Anonymous Coward · 1999-11-18 20:16 · Score: 0

shelfchk will turn on all the red lights on all of the Fibre Channel as well as SCSI drives. There are rc toggle commands to do individual drives.

Wrong, wrong, wrong by jocks · 1999-11-18 16:48 · Score: 4

I accept that you will need to test to make sure that the disks are not the problem but you will need to do it the right way.

Firstly, vmstat tells you very little about disk I/O. What it is good for is the processes. Look at the output from vmstat 5, for example. The first three columns are r, b and w: threads in the run queue, blocked threads (waiting on I/O, paging and so on) and swapped-out processes. If there are blocked processes, look at WHY they are blocked. Use top to get the I/O wait information. If there is a lot of I/O wait, then look at the disks. Use iostat -D to get percentage utilisation of the disks. If there is a lot of disk wait, then you may need to either add more disks or spread the load.
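
A minimal sketch of that first step, assuming Solaris-style `vmstat` output where the second header line starts with `r b w` (Linux `vmstat` uses a different layout, and the sample figures below are made up). It pulls out the `b` column and skips the first sample, since vmstat's first line reports averages since boot rather than the current interval. The "at least one blocked process on average" threshold is an arbitrary choice for illustration.

```python
SAMPLE = """\
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr s0 s1 s2 s3   in   sy   cs us sy id
 0 0 0 812344 40312  2  15  4  1  1  0  0  3  1  0  0  410  980  300  5  4 91
 1 6 0 790112 38120  0   3  0  0  0  0  0 96  2  0  0  620 1450  410  6 12 82
 0 5 0 790040 38004  0   1  0  0  0  0  0 94  1  0  0  598 1390  395  5 11 84
"""


def blocked_processes(vmstat_output: str) -> list[int]:
    """Return the 'b' (blocked) count from each interval sample.

    The first data row is skipped because vmstat reports it as an average
    since boot rather than for the last interval."""
    rows = [line.split() for line in vmstat_output.splitlines() if line.strip()]
    header = next(i for i, fields in enumerate(rows) if fields[:3] == ["r", "b", "w"])
    col = rows[header].index("b")
    samples = [fields for fields in rows[header + 1:] if fields[0].isdigit()]
    return [int(fields[col]) for fields in samples[1:]]


blocked = blocked_processes(SAMPLE)
print(blocked)  # [6, 5]
if blocked and sum(blocked) / len(blocked) >= 1:  # arbitrary threshold
    print("processes are regularly blocked: check I/O wait, then iostat -D")
```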

It is interesting to note the relative speeds of devices:
If the CPU takes 3 seconds to do a job, then:
Level 1 cache takes 10 seconds
Level 2 cache takes 1 minute
Memory takes 10 minutes
Disk takes 7.7 months
Network takes 6.5 years
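
The ratios in this analogy hold whatever the scale, but turning them back into real latencies means picking one. The sketch below assumes one analogy second stands for one real nanosecond (so the 3-second CPU job is roughly one cycle of a ~333 MHz CPU); that mapping is an assumption, not something stated above. Under it, the implied disk access is about 20 ms, the network about 0.2 s, and one disk access costs as much as roughly 34,000 memory accesses.

```python
MINUTE = 60
MONTH = 30.44 * 24 * 3600   # average month length in seconds
YEAR = 365.25 * 24 * 3600

# Times from the analogy above, in analogy-seconds.
analogy = {
    "CPU": 3,
    "Level 1 cache": 10,
    "Level 2 cache": 1 * MINUTE,
    "Memory": 10 * MINUTE,
    "Disk": 7.7 * MONTH,
    "Network": 6.5 * YEAR,
}

NS_PER_ANALOGY_SECOND = 1.0  # assumption: 1 analogy second = 1 real nanosecond

for name, seconds in analogy.items():
    real_ms = seconds * NS_PER_ANALOGY_SECOND / 1e6
    ratio = seconds / analogy["CPU"]
    print(f"{name:<14} {real_ms:>12.6f} ms  {ratio:>14,.0f} x CPU")

print(f"one disk access ~ {analogy['Disk'] / analogy['Memory']:,.0f} memory accesses")
```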

Get stuff off your disks better! Monitor your cache hit rate to get information on efficiency. Use vmstat or sar or stuff from the SE toolkit. Get the SE toolkit from http://www.sun.com/sun-on/net/performance. Run zoom.se to monitor your system. Run virtual_adrian.se to tune your system. Use the right tools and don't just add more memory: identify the bottleneck, fix the bottleneck, re-test, and repeat until the performance is satisfactory.

Re:Wrong, wrong, wrong by Anonymous Coward · 1999-11-18 20:08 · Score: 0

You are correct that you should be doing analysis of your system to identify bottlenecks and then work to eliminate them. However, this should include the application level as well.

I can't speak for how one would go about doing application-specific tuning for things specific to Exim, but I find that many MTAs share the same problems. I'd suggest going to the Sendmail Performance Tuning for Large Sites paper that I wrote and presented at SANE'98, and see what of those problems (and solutions) might be applicable to your situation.

It is entirely possible that you could end up tuning the system performance enough that you don't even need to buy any additional hardware at all, just change the configuration of the software and OS you already have.

That said, if you've done all this and you still have problems, you probably do need to buy some new hardware. If that new hardware you need to buy turns out to be disk storage, Sun will be glad to sell you a StorEdge system that will implement RAID levels 0, 1, and 5 in hardware.

This would probably be the simplest and easiest solution to implement on a Sun machine, since the people who sold and installed the machine originally can help you with the expansion. However, it's likely to be rather expensive from a price/performance perspective. Note that Sun OEMs their hardware from SymBios/LSI Logic, and you can buy higher-end equipment direct. See http://www.metastor.com/ for more info.

If even the MetaStor hardware isn't enough for you, then you might want to consider vendors such as EMC and comparable units from Hitachi Data Systems. For my part, the HDS equipment can have a larger cache (up to 16GB in some units), can segment the cache so that different hosts get their own dedicated slice (which EMC can't do), and overall seems to simply be more intelligently implemented.

If you were in Europe, I would suggest looking at Comparex, since they are the licensed HDS VAR for this region.

Probably Not I/O That's the Problem by Anonymous Coward · 1999-11-18 16:59 · Score: 1

As others have already mentioned, you should really look into tracking down where the problems are before you go and spend $$$ on a new RAID system.

A few things that may help:
1) Our POP mail server (~1000 users) running on an old Sun Solaris machine (LX) was having problems because of the number of NIS lookups that were going on. System CPU was up near 75% constantly, I/O waits near 0, and load was also very high. Solution: make the mail server a NIS slave as opposed to a NIS client. That reduced load by 20% immediately. Same goes for DNS lookups.

2) Make sure you're not writing/reading to/from NFS mounted fs.

3) Install the recommended Solaris patches - these can make a big difference. Try installing Virtual Adrian, and see what it recommends.

4) Don't buy EIDE, for all the reasons mentioned previously. For lots of simultaneous hits, SCSI outperforms EIDE every time.

5) Consider fibre channel disk arrays from Sun - expensive, but nice, especially the new A5200. It gives you 22 spindles as opposed to the 14 in the A5100.

6) Ignore the guys talking about s/w RAID solutions being a BIG slowdown. Sure, h/w RAID 5 is much faster than the s/w equivalent, but when it comes to RAID 0+1 there ain't a lot of difference. Not only that, BUT s/w RAID systems tend to be much easier to configure and maintain, w/o a doubt - check out Veritas Volume Manager (love it!); even the free DiskSuite (with the Sun Solaris server version) is better than any h/w RAID configuration I've seen.

7) I would bet my next salary that adding a RAID system to your mail server will increase performance by less than 15%.

Oh, and I've been managing enterprise level Sun systems now for 8 years, so I'm not just a Linux geek who has read too much ;)

Hope this helps.

This may have helped me as wel... by jdube · 1999-11-18 17:00 · Score: 1

This is my HD:
Filesystem            Size  Used Avail Use% Mounted on
/dev/hda1             486M  358M  102M  78% /
/dev/hda2             3.8G  2.7G  909M  75% /usr
/dev/hda3             964M  501M  413M  55% /home
/dev/hda5              99M   20k   94M   0% /tmp
and that's AFTER cleaning out... before, I had / at 100%, /usr at 100%, and /home at 100%. I have a 4.3 gig HD lying around which I had FreeBSD on for a while (been thinking about putting BeOS on it), but I may use this idea and go for it.

--
If you think you know what the hell is really going on you're probably full of shit.
jdube is who I am.

RAID is not the answer to your problem by joost · 1999-11-18 17:58 · Score: 1

I would not use RAID for the problem you're describing. You're most probably better off splitting the box into several others.

For example, try using a fallback mailhost for outgoing mail (fallback_mx in Sendmail). That way messages that cannot be delivered within a couple of seconds are relayed to the fallback server, keeping your outqueue clean and tidy.

For incoming mail, use a different server, or if you can, use several. You could just put them all in the MX list of your domain, with the same priority. This does wonders.
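
Listing several hosts at the same MX priority spreads the load because sending MTAs try the lowest preference value first and, per RFC 5321, must randomize among hosts that share a preference. A minimal sketch of that selection, with hypothetical host names (this illustrates the rule; it is not Sendmail's code):

```python
import random
from collections import Counter
from itertools import groupby


def delivery_order(mx_records, rng=random):
    """Order (preference, host) pairs the way a sender tries them:
    lowest preference first, shuffled within each preference level."""
    ordered = []
    for _, group in groupby(sorted(mx_records), key=lambda rec: rec[0]):
        hosts = [host for _, host in group]
        rng.shuffle(hosts)
        ordered.extend(hosts)
    return ordered


MX = [  # hypothetical zone: three equal inbound servers plus a backup
    (10, "mx1.example.com"),
    (10, "mx2.example.com"),
    (10, "mx3.example.com"),
    (20, "backup.example.com"),
]

rng = random.Random(1999)
first_choice = Counter(delivery_order(MX, rng)[0] for _ in range(9000))
print(first_choice)  # roughly 3000 each; the backup is never tried first
```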

It might be smart to look at the mailbox format. Some mailbox formats (MBX) have much better performance than others. And you could put POP3 and IMAP on a third server.

All this is much preferable to simply installing a RAID array, IMO, based on the information you presented.

Re:RAID is not the answer to your problem by Anonymous Coward · 1999-11-18 18:12 · Score: 0

There used to be problems with NFS mounted disks on IMAP servers that could cause corruption. Sounds like the server didn't like the timeouts. I haven't seen if the problems have been fixed so assume the worst and avoid NFS-mounting.

Yup, I love Clariion stuff too. by Colin+Smith · 1999-11-18 18:00 · Score: 1

Used to work for Data General, the parent company. Fantastic hardware. They've just been bought by EMC though.

I would definitely try to tune the system before throwing hardware at it though. Find out exactly where the bottleneck is.

--
Deleted

Dell Tech Support by Anonymous Coward · 1999-11-18 18:08 · Score: 0

I have heard that much of Dell's support is outsourced to one of the world's worst phone support companies, Stream. Also, while Dell's own phone support teams might be slightly better, I have heard that they have a huge turnover, and most of the phone reps that stick around are morons.

Re:Dell Tech Support by Anonymous Coward · 1999-11-18 19:56 · Score: 0

You got everything right but the company name they outsource to. I'd tell you, but I work there in a different dept. and our MIS often has nothing better to do than read what we

RAID Solution by _Oblivion · 1999-11-18 18:09 · Score: 1

I ran into the same problem not long ago. Our local ISP needed a backup solution; the old tape drives were not doing their job anymore. But we built our own RAID cabinet. We bought an 8-disk RAID enclosure with dual redundant power supplies from Siliconrax. The controller is a Mylex external RAID controller. The card is nice; it allows expandability down the line. The card comes in a full-height enclosure (keep it in mind, it's big). We used 18.6 gig Seagate drives in the system, each mounted in a CRU Data Port removable enclosure for hot swap. The RAID controller has an LCD front panel, making setup a snap. The array was configured with RAID 5, which is redundant and provides fast read access, but write access is slower. All in all, the array is about 100 gig online. The cabinet is connected to an SGI O2. The only thing to watch is the cable length!! We've been doing nightly backups over NFS since the array was turned up. The system is nice. Go SCSI, and do the research on the proper controller. If the money is there, go fiber.

More on SE - Orca by moscow · 1999-11-18 18:26 · Score: 1

For long-term monitoring on Solaris, I would recommend Orca. This is a Perl-based tool which uses the SE toolkit to collect data. It then stores it very tidily and produces HTML with PNG graphs that let you see many performance statistics on daily, weekly ... up to yearly cycles. The home page is here.

--
Who would believe in penguins,unless he had seen them? Conor O Brien - Across Three Oceans

raid and sun solaris by jason+andrade · 1999-11-18 18:34 · Score: 1

having read through most of the thread, my $0.02 is:

definitely install virtual adrian to get a better idea of the system tuning you can do and where your real problems might lie. have you tuned all the system parameters possible? ncsize? turned off all non-essential daemons/apps on the machine?

mylex controllers seem reliable but were definitely a pain to configure - we're using them on a dec fileserver solution. one downside that appeared was they took 6-8 hours to initialize the array - compared to 1.5 hours for a non mylex controller :-/

we're now switching from DEC+Mylex to Sun+Infortrend who make a very nice scsi-scsi controller. www.infortrend.com - we're using the 3201U2G - 4 Ultra2Wide scsi buses.

don't go to raid unless you know what you're getting yourself into - it's far more complex and expensive in the long term apart from your initial investment in the hardware. you'll have larger spares provisioning, your documentation (you do have some right ;-) will be more complex and your backup system might need some work too.

my rule of thumb at present is JBOD to 50G, RAID as a NAS for 50G-500G, and SAN (RAID/fibre) for above 500G. you really don't need raid below 50G except for specific performance reasons.

it's been an interesting thread to read, since i'm right in the middle of working on a raid5 server implementation.

-jason

Don't focus 100% on the hardware by Oestergaard · 1999-11-18 18:35 · Score: 1

First of all you might want to check out other MTAs, as well as other methods for storing the user's mails. If all mailboxes reside in the same directory, you're spending all your time in the kernel doing _linear_ searches thru the mailbox directory. You could spend millions on EMC hardware without seeing _any_ performance increase.

I'd recommend using the Postfix MTA, as it has almost all the features of Sendmail, it's secure, and (hold on) it's even faster than QMail. Optionally, you could use it with the Cyrus IMAP/POP services. You definitely want to make sure that you don't have all mailboxes in the same directory. Build a hierarchical structure where you never have more than, say, 30-50 subdirectories/files in one directory.
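
A minimal sketch of such a hierarchy, assuming two levels of 32 buckets chosen from a hash of the user name. The `/var/spool/mail` root and the `mailbox_path` helper are hypothetical, and MD5 is used only to spread names evenly, not for security. With 10,000 users this gives 1,024 leaf directories holding about ten mailboxes each, well under the 30-50 limit.

```python
import hashlib
from collections import Counter
from pathlib import PurePosixPath


def mailbox_path(user: str, root: str = "/var/spool/mail",
                 levels: int = 2, fanout: int = 32) -> PurePosixPath:
    """Map a user to root/NN/NN/user, where each NN is one of `fanout`
    buckets picked from a hash of the name (hypothetical layout)."""
    digest = hashlib.md5(user.encode("utf-8")).digest()  # for spreading only, not security
    buckets = [f"{digest[i] % fanout:02d}" for i in range(levels)]
    return PurePosixPath(root, *buckets, user)


print(mailbox_path("alice"))

# How evenly do 10,000 hypothetical users spread over the 32 * 32 leaf directories?
users = [f"user{i}" for i in range(10_000)]
per_dir = Counter(mailbox_path(u).parent for u in users)
print(len(per_dir), "leaf directories, fullest holds", max(per_dir.values()))
```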

Ok, if disks are still your problem, consider:
1) Software RAID is usually a lot faster than hardware RAID. And for the money you save on the HW controller you could buy faster/more disks.
2) An IDE disk is identical to an SCSI one, except of course for the interface and the warranty. The price difference is mainly due to the warranty.
3) UDMA/ATA-{33,66} IDE interfaces are as fast as any SCSI solution if you keep _one_ disk per channel. The main problems with IDE solutions are the short cable length allowed (a problem for 10+ disks) and the number of controllers you must have (one controller for every two disks).

You can easily spend $50K on a SCSI/HW-RAID solution. And you won't know if you'll even get the speed of one single UDMA drive from it (yes, people actually get 15MB/s both from their single UDMA drives and from their expensive DPT RAID solutions). At least consider a software-RAID, and possibly IDE, solution before rushing out to spend the next 10 years' budget on the shiny HW-RAID solution.

Your setup is fairly small; you would probably do just fine with a four-disk RAID-5/10 for spool and mailboxes. This is where SW RAID is worth considering. Granted, for 20+ disk systems, HW RAID may well be a better way to go, possibly combined with SW RAID.

My 0.02 Euro.

RAID 5 vs. Striping by videoranger · 1999-11-18 18:48 · Score: 1

Many have posted followups here mentioning that RAID 5 may not be your best avenue. To recap, this is because of the performance overhead associated with the calculation of parity data. Unless you have a reliability issue, RAID 5 is probably something to stay away from. An exception might be hardware RAID, but such solutions are expensive and will still involve a slight performance hit.

The multi-controller solution is probably best; someone mentioned the Sun StorEdge product with the Cheetah drives. This is a great piece of gear, and coupled with some really good storage management software (might I suggest Veritas Software's File System/Volume Manager) you'll get a very flexible solution providing the most bang for the least buck. With the Veritas product you can manage the data across several drives and monitor & tweak the configuration on the fly while in production; additionally, the Veritas product provides a journalled filesystem, which allows rapid restarts in the event of a crash and, if you have the drives, can be configured to fail over to available spares.

Yes I am a Veritas Consultant =^) but that does not change the fact that this is an excellent product that would probably go a long way towards addressing your issues (which seem more performance oriented than reliability related) on your existing drives. Check out this link for more info: http://www.veritas.com/library/su/fsconceptwp.pdf

Good Luck!

-Videoranger

--
Heaven offers little comfort like winamp and a big disk full of Dave Matthews MP3s

SCSI vs. IDE by Anonymous Coward · 1999-11-18 18:52 · Score: 0

It's unbelievable how many people are confused over this. It's very simple... there is no place for IDE in a server. This is because SCSI devices are, for lack of a better word, multi-threaded, whereas IDE devices operate serially. For example, let's say your system is trying to read data and do a write at the same time. With IDE your OS has to issue one command to the controller which passes it to the device and then waits... and waits... and waits for the data (or the acknowledgement) to be returned from the device. With SCSI, the OS tells the controller all the operations it wants to do and the controller looks at it and decides if there is an optimal way of doing the commands. Then it sends all of the requests out and allows each device to complete its task in any order. In other words, SCSI operates in parallel while IDE is sequential (or serial). Major performance difference here (unless you are operating under very, very light loads such as a desktop system).

Re:SCSI vs. IDE by Holger · 1999-11-18 19:22 · Score: 1

> It's unbelievable how many people are confused over this.

Yes, it is. There are still people who recommend SCSI without further investigation.

> For example, let's say your system is trying to read data and do a write at the same time.

No decent OS would do that. It would concentrate on reads and save the writes for later, unless the write cache is full.

> With IDE your OS has to issue one command to the controller which passes it to the device and then waits...

With IDE maybe. With ATA not. ATA does have everything that SCSI has, and more. Read the specs at www.t13.org.

> With SCSI, the OS tells the controller all the operations it wants to do and the controller looks at it and decides if there is an optimal way of doing the commands.

Of course, only if you have a host adapter / driver which supports command queueing, and an application that _does_ do multiple accesses at the same time. Most don't. And a decent OS reorders the commands anyway before they are sent to disk, partly eliminating the need for reordering by the drive.
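
The reordering a decent OS does before commands reach the disk is usually some form of elevator algorithm. This minimal sketch uses block numbers as a stand-in for head position and made-up request positions. It compares serving requests in arrival order with a one-way sweep (strictly the LOOK variant of SCAN, since it turns around at the last request rather than at the edge of the disk).

```python
def seek_distance(head: int, order: list[int]) -> int:
    """Total head movement (in blocks) to serve requests in the given order."""
    total, pos = 0, head
    for block in order:
        total += abs(block - pos)
        pos = block
    return total


def elevator_order(head: int, requests: list[int]) -> list[int]:
    """LOOK-style elevator: sweep upward through requests at or above the head,
    then reverse and sweep down through the rest."""
    up = sorted(b for b in requests if b >= head)
    down = sorted((b for b in requests if b < head), reverse=True)
    return up + down


head = 50
requests = [95, 10, 60, 20, 80, 55]  # made-up pending requests, in arrival order
print(seek_distance(head, requests),
      seek_distance(head, elevator_order(head, requests)))  # 305 130
```
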
Re:SCSI vs. IDE by Anonymous Coward · 1999-11-18 19:54 · Score: 0

Except you seem to have no appreciation of what an IDE-to-SCSI RAID controller does, or of the capabilities of modern ATA drives either. However, as UDMA66 drives only go up to 7200rpm, if he is really I/O bound then SCSI to fibre is probably the best bet. The great thing about a hardware IDE-to-SCSI RAID controller is that you can use dirt-cheap IDE drives to build large arrays at bargain prices. They work, and work well. Try the following URL to see about such devices. http://www.zero-d.com/zd400.html
Re:SCSI vs. IDE by Anonymous Coward · 1999-11-18 21:42 · Score: 0

If you are going to buy a SCSI controller anyway, then where is all that additional cost? SCSI drives aren't that expensive. Isn't this a proprietary solution? Can I load NT, Solaris, BeOS and Linux on the system that has this solution implemented? My point is this... A great solution already exists. It's open. It's reliable. It's fast. It's flexible. And yes, it does cost about 20% more than an IDE solution.
It's called SCSI.
Re:SCSI vs. IDE by mckyj57 · 1999-11-18 23:38 · Score: 1

> > It's unbelievable how many people are confused over this.
>
> Yes, it is. There are still people who recommend SCSI without further
> investigation.
>
> > For example, let's say your system is trying to read data and do a write at
> > the same time.
>
> No decent OS would do that. It would concentrate on reads and save the writes
> for later, unless the write cache is full.

Unless you are doing a syslog operation, as most MTAs do, which syncs the disk.
You can disable this in syslog.conf, of course, the biggest performance win
I have seen for most mail systems.

>
> > With IDE your OS has to issue one command to the controller which passes it
> > to the device and then waits...
>
> With IDE maybe. With ATA not. ATA does have everything that SCSI has, and
> more. Read the specs at www.t13.org.

Specs are specs. Real-world implementations? Widely available controllers?
Systems with this in standard? Drivers that match? For multiple OSes?
Re:SCSI vs. IDE by Anonymous Coward · 1999-11-19 02:08 · Score: 0

SCSI drives are lot more than 20% above IDE, quit fibbn' to promote an agenda.

attractive but attractive by svetz · 1999-11-18 18:54 · Score: 1

I do sys admin for a software company with a mixed Unix-NT environment. We had some terrible experiences with Samba on Unix, and NFS on NT. About a year ago, we purchased an F720 with 100GB, for around $50,000. Now we have another F720 with a 300GB fibre-channel RAID. We talked with other NetApp customers, and they were ecstatic about the reliability of these machines. Although I can't say that the filer was 100% reliable, as we heard from smaller sites, we're VERY satisfied with its performance. In the last year, we've only had 2 occasions with significant (> 10 minutes) downtime. As far as speed is concerned: it's usually faster than our local disks...
One of the best things about it is its simplicity. GUI people use the nice Java applet to control it (it gets better with every release of the OS), and we Unix people have a great command line interface.
If you plan to use the NetApp with lots of clients (about 500 in our case) in a mixed environment, the Network Appliance is probably the most reliable and simple-to-maintain solution. If you want the fastest RAID array to connect to your mail server, it will simply amaze you :-)
If your budget allows, go for it!

Re:Yes it can. by Anonymous Coward · 1999-11-18 18:56 · Score: 0

I think you are a little off base. On a Solaris box, the equation that makes up load can also include blocked processes, i.e. processes waiting on I/O. Yes, it can be network and/or memory, but a single 9GB disk for mail accounts is most likely the problem. The slowest part of any system is the disk subsystem, unless you are using solid state disks.

NetApp the good the bad and the ugly by Corvar · 1999-11-18 19:06 · Score: 2

Now, from all of my research it seemed like NetApp was the way to go. So I pushed and pushed and pushed, and finally we got an F760. (Nothing like going from nothing to the top of the ladder.) It is now 2.5 months into being a NetApp user. Both the 1- and 2-month anniversaries were marked by a motherboard dying. I must say it is fast, real fast, but right now the analogy is fast like a race car heading toward a wall.

Ease of use, maintenance, etc. on the UNIX side has been pretty carefree for me. The NetApp has been very easy to use, easy to monitor, and easy to set up. But the NT department, which paid for half of it, is hating life. The NetApp's quota system is straight out of Unix, which is not good for NT: you put quotas on users, groups, or qtrees (think root-level directories created in a special way). According to the NT gurus, file ownership by individuals in NT is a bad idea, so all files are owned by an administrator equivalent. This means you lose user quotas. NT has a different group philosophy than Unix (multiple groups can have access to a single file), so I am guessing the group quotas are out as well. That leaves qtrees, which are sort of ugly. Right now our NT people are looking at taking the loss on the NetApp, giving it to UNIX (fine by me ;) and replacing it with a conventional NT file server.

Another downside for the NT side of things is that the NetApp is configured much like a UNIX box. It uses init and rc files, etc. From NT land there is a line-ending issue: all of those files use Unix-style line endings (LF only) rather than DOS-style CR/LF. I am not sure whether they break if you start using DOS style, but I am leery of finding out. Which means the Unix side is responsible for all configuration of the NetApp. This is both good and bad. They aren't going to break my stuff, but I have to take on additional labour. Note: the hardware failures were quickly resolved by NetApp, but it still sucked hard.
The NT quota issues are supposed to be resolved in the next major version of the NetApp OS, codenamed Guiness or some such. The NT people IMO haven't fully explored the quota possibilities, instead taking the party line that it's too much work. And it is entirely possible that I have not uncovered all of the problems, and the solutions for those problems, in the time we have had it.
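
If you do end up editing the filer's rc and init files from the NT side, one way to sidestep the CR/LF worry is to normalize line endings before the file is saved back. A minimal Python sketch, assuming the files are reachable as ordinary paths on a mounted share; the function names are hypothetical and nothing here is a NetApp tool:

```python
def has_dos_line_endings(data: bytes) -> bool:
    """Return True if the content contains any DOS-style CR LF pairs."""
    return b"\r\n" in data


def to_unix_line_endings(data: bytes) -> bytes:
    """Turn every CR LF pair into a bare LF; all other bytes are left alone."""
    return data.replace(b"\r\n", b"\n")


def normalize_file(path: str) -> bool:
    """Rewrite a file in place with Unix line endings.

    Returns True if the file was changed, False if it was already clean.
    """
    with open(path, "rb") as f:
        data = f.read()
    if not has_dos_line_endings(data):
        return False
    with open(path, "wb") as f:
        f.write(to_unix_line_endings(data))
    return True
```

Running normalize_file over each edited file is idempotent: a second run finds nothing to change and returns False.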

All we seeem to hear is RAID5 this RAID5 that by dapprman · 1999-11-18 19:08 · Score: 0

Why is everyone so obsessed with RAID5? It is not the holy grail of disk storage, as one or two others have tried to point out but been flamed for. RAID5 offers great resilience, BUT is not good if performance is also required. Just because your data is striped across multiple volumes to aid recovery, it still only reads from the one volume, and the need to perform the striping on writing makes the system slower. If performance is an issue, and money is not, then RAID1 (mirroring) is the solution (unless your system will allow both, i.e. RAID0+1; IBM RS6000's, my domain, do not).

Re:All we seeem to hear is RAID5 this RAID5 that by Speed+Racer · 1999-11-18 20:04 · Score: 1

Just because your data is striped across multiple volumes to aid recovery, it still only reads from the one volume
That is simply not true. Reads in RAID 5 occur from all volumes where a stripe resides. A file never exists on a single volume in RAID 5 unless it is smaller than the stripe size.
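
To see why, here is a small Python sketch of where logical chunks land in one common RAID 5 arrangement, the left-symmetric layout. Other controllers rotate parity differently, so treat the exact placement as an assumption. A chunk here is the per-disk stripe unit; each stripe holds n - 1 data chunks plus one parity chunk, and the parity position moves from stripe to stripe.

```python
from typing import List, Tuple


def parity_disk_for(stripe: int, n_disks: int) -> int:
    """Parity starts on the last disk and moves one disk left per stripe."""
    return (n_disks - 1) - (stripe % n_disks)


def raid5_location(chunk: int, n_disks: int) -> Tuple[int, int]:
    """Map a logical chunk number to (disk, stripe), left-symmetric layout.

    Data chunks of a stripe start on the disk just after its parity disk
    and wrap around, so they never land on the parity disk.
    """
    if n_disks < 3:
        raise ValueError("RAID 5 needs at least 3 disks")
    data_per_stripe = n_disks - 1
    stripe = chunk // data_per_stripe
    parity = parity_disk_for(stripe, n_disks)
    disk = (parity + 1 + chunk % data_per_stripe) % n_disks
    return disk, stripe


def disks_touched(first_chunk: int, n_chunks: int, n_disks: int) -> List[int]:
    """Disks that must be read to fetch n_chunks consecutive chunks."""
    return sorted({raid5_location(c, n_disks)[0]
                   for c in range(first_chunk, first_chunk + n_chunks)})
```

On a 5-disk array, a file covering 8 consecutive chunks is read from all five disks, while a file that fits in one chunk comes from a single disk.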

--
Free Mac Mini. Yes, I'm
Re:All we seeem to hear is RAID5 this RAID5 that by Anonymous Coward · 1999-11-18 20:32 · Score: 0
As Usenet news traffic increases, new newsgroups are added, and the time before expiry increases, you need truly awesome storage capacity. For that you need a great many disks, which for cost reasons will have to be many medium-sized ones rather than a few huge ones. (So far IBM holds the record with 73 GB for a SCSI disk.)
Even with MTBF in the 500000-hour range, the probability of a failure per year increases to the point where you will save a lot of work and frustration on support by going for RAID 5. Customer satisfaction is also a point.
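
The arithmetic behind that point: a minimal sketch, assuming independent disks with exponentially distributed lifetimes (a constant failure rate of 1/MTBF, which is optimistic for old or same-batch drives, as the list below warns). The chance that at least one of N disks fails within t hours is then 1 - exp(-N t / MTBF).

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8760


def expected_failures(n_disks: int, mtbf_hours: float,
                      hours: float = HOURS_PER_YEAR) -> float:
    """Mean number of disk failures over the period (constant failure rate)."""
    return n_disks * hours / mtbf_hours


def p_any_failure(n_disks: int, mtbf_hours: float,
                  hours: float = HOURS_PER_YEAR) -> float:
    """Probability that at least one of n_disks fails within `hours`."""
    return 1.0 - math.exp(-expected_failures(n_disks, mtbf_hours, hours))
```

With a 500000-hour MTBF this gives roughly 1.7% per year for one disk, about 22% for 14 disks, and about 50% for 40 disks.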
For reasons of reliability you will wish to
- use proven drives (remember the Micropolis problems)
- use disks from different batches (to avoid simultaneous failures due to PROM program bugs)
- use disks of different age (so that not all disks wear out at the same time, defeating the RAID protection)
The experienced news admin will of course recall that the news server (usually; check to be sure) uses overview databases. These databases are 3 - 10 percent of the size of the entire news spool. They do not have to be in the same directories as the articles. Do take advantage of this and
- keep article news spool on a RAID 5 protected area
- keep your overview database on an insanely fast RAID 0 area
- use different drives with different sizes of
  - fs block
  - RAID stripe
  - prefetch
- best of all use these two arrays on different SCSI host adapters
Summary suggestion:
- RAID 5 article spool of 6 (+1) - 13 (2) disks (hot spares in brackets), block size at 1 - 16 KB
- RAID 0 overview area of 3 - 5 disks, block size at 16 - 64 KB
Experiment with the most effective block size, as this relates to the on-drive buffer cache. Note that this is for normal newsgroups. Some, such as the binaries or those with a lot of HTML, are much bigger and will need different tuning; perhaps a dedicated array for these is an idea if your customers enjoy alt.binaries.furniture etc.
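
For sizing the two arrays, usable capacity per RAID level is simple arithmetic. A small Python sketch; the function name is hypothetical, and hot spares are deliberately left out of the member count because they hold no data until a rebuild:

```python
def usable_capacity(level: str, members: int, disk_gb: float) -> float:
    """Usable space of an array built from `members` equal-sized disks."""
    if level == "0":
        if members < 1:
            raise ValueError("RAID 0 needs at least 1 disk")
        return members * disk_gb          # pure striping, no redundancy
    if level == "1":
        if members < 2:
            raise ValueError("RAID 1 needs at least 2 disks")
        return disk_gb                    # every member holds the same data
    if level == "5":
        if members < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        return (members - 1) * disk_gb    # one disk's worth goes to parity
    if level == "0+1":
        if members < 4 or members % 2:
            raise ValueError("RAID 0+1 needs an even number of disks, at least 4")
        return (members // 2) * disk_gb   # a stripe set, mirrored once
    raise ValueError(f"unsupported RAID level {level!r}")
```

For example, a 6-disk RAID 5 spool of 9 GB drives gives 45 GB, however many hot spares sit beside it, and four 9 GB drives in RAID 0+1 give 18 GB.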
Other tips, in no particular order:
- profile your system with Bonnie and Iozone
- get extra drives for cold storage, especially since many HW RAID systems don't like to mix different sizes, geometries or even makes
- keep a steady turnover of disks, change them before they get too old
- ventilate well; it is amazing how many forget this
- remember that disks are cheap, downtime and engineering hours are expensive and data on disks may cost even more
- use dedicated drives for /tmp, swap and /var/tmp
- check mechanical decoupling of the disk farms; these days people have seen negative effects when traversing arrays and drive arms seek in lockstep patterns.
And most importantly: report back when you get your results. Best of luck.
Re:All we seeem to hear is RAID5 this RAID5 that by WSSA · 1999-11-18 21:41 · Score: 1

Hear the voice of reason! RAID 5 is good for reads but you suffer a big performance hit for writes. _If_ you can guarantee that all writes are the same size (as with some database servers) then you can tune the stripe size, but I don't think you can guarantee this with a mail server.

And when a disk fails you take a performance hit on reads too - better for your stress level to pick another mirrored RAID level that means you don't have to panic quite so much when this happens.

I would recommend RAID 0+1. I have seen decent performance with Online Disksuite (software RAID) - even if you don't want to run this in production it would give you a chance to try things out without spending too much cash.

The news server application mentioned may suit RAID 5 because a) you have to store a massive amount of data, b) most accesses are READs (people browsing news rather than posting).

For email, you are going to get a good mix of reads and writes rather than just reads. In fact I think you'll find the application is more attribute-intensive than anything else.

There's one thing that you can do for free: increase the DNLC (directory name lookup cache). You do this via /etc/system and it helps a lot on systems that are accessing a lot of files (NFS is the classic application for this but I think a mail server will benefit too). Check docs.sun.com for how to do this.
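
To illustrate why a bigger name cache helps when a server touches many files, here is a toy LRU model in Python. It is only a model: the class and its methods are hypothetical and do not reflect how the Solaris DNLC is actually implemented (on Solaris its size is governed by the ncsize tunable in /etc/system).

```python
from collections import OrderedDict
from typing import Callable, Hashable


class NameLookupCache:
    """Toy LRU cache mapping (directory, name) to an inode number."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def lookup(self, directory: str, name: str,
               resolve: Callable[[str, str], Hashable]) -> Hashable:
        key = (directory, name)
        if key in self.entries:
            self.entries.move_to_end(key)      # mark as most recently used
            self.hits += 1
            return self.entries[key]
        self.misses += 1
        inode = resolve(directory, name)       # slow path: scan the directory
        self.entries[key] = inode
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # evict least recently used
        return inode
```

Cycling through three names with room for only two misses every single time, while room for three turns the second pass into hits. Raising the cache size past the working set is the same effect at scale.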

reads vs writes by flatrock · 1999-11-18 19:11 · Score: 1

Writes shouldn't take significantly longer than reads. I work with Fibre Channel, and the throughput numbers I get for raw reads and writes (no file systems) aren't significantly different. If you have a good raid controller, it should be able to keep the drives busy on both reads and writes as long as the file system is writing data in large enough blocks.

Re:reads vs writes by GooberToo · 1999-11-18 21:58 · Score: 1

It should be well understood that writes do and always will have a higher overhead than reads. This cannot be stressed enough. Writes must have parity information calculated and written. This has a higher CPU load (local or off board) and results in more physical data being written to disk. If you cannot measure a difference, it is possible that you are not saturating your disk and/or controller. Likewise, it's also important to stress the importance of RAID controller selection. The actual delta between reads and writes can vary greatly depending on the vendor and implementation details. Also, something that people seem to forget is that if you choose a RAID solution that has onboard cache, make sure it has an onboard battery too (diagnostics of the onboard battery state are highly recommended). It is also highly recommended to have a UPS. Without these, the state of the write cache is questionable in the event of an untimely shutdown. Of course, let's not forget that while multi-channel controllers are good, having multiple multi-channel controllers is even better from a performance and reliability perspective.
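
The parity overhead can be made concrete. In the textbook RAID 5 read-modify-write path, updating one data chunk means reading the old data and old parity, computing new parity = old parity XOR old data XOR new data, and writing both back: four disk I/Os for one logical write. A minimal Python sketch, assuming byte-wise XOR parity over equal-sized chunks (real controllers may instead do full-stripe writes when the whole stripe is in cache):

```python
from functools import reduce
from typing import List


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length chunks byte by byte."""
    if len(a) != len(b):
        raise ValueError("chunks must be the same size")
    return bytes(x ^ y for x, y in zip(a, b))


def full_stripe_parity(chunks: List[bytes]) -> bytes:
    """Parity of a whole stripe: XOR of every data chunk."""
    return reduce(xor_bytes, chunks)


def small_write_parity(old_parity: bytes, old_data: bytes, new_data: bytes) -> bytes:
    """Read-modify-write update: 2 reads (old data, old parity), 2 writes."""
    return xor_bytes(xor_bytes(old_parity, old_data), new_data)


def reconstruct(surviving: List[bytes], parity: bytes) -> bytes:
    """Rebuild a lost chunk from the surviving chunks plus parity."""
    return full_stripe_parity(surviving + [parity])
```

The small-write update gives the same parity as recomputing the whole stripe, without reading the untouched chunks; the same XOR is what lets a degraded array rebuild a missing chunk.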

I have used Mylex controllers with 32MB and a battery. We found that they worked very well. I would like to point out that there are other good solutions too. Be sure to do your vendor and product homework!

Dell support really sucks by Anonymous Coward · 1999-11-18 19:18 · Score: 0

I've been trying for three days to find out what kind of memory SIMMs are in a Dell Optiplex 466/L. No conclusive results yet. If you don't believe me, give it a try, and then tell us how good Dell support is. Buying hardware "customized" for Dell is wasted money when you get to upgrades. I have solid experience with this.

Re:Dell support really sucks by Anonymous Coward · 1999-11-18 21:09 · Score: 0

I'm trying to find out for three days what kind of memory SIMMS are in a Dell Optiplex 466/L. No conclusive results yet.

That seems very odd to me. When I was using a Dell Optiplex 486 to demonstrate to my boss how good Linux was, I made a mail server out of it to take the load off an existing NT server. About a month later, the motherboard just up and died, but we had Dell support on it and they got us a new motherboard the next day. I couldn't believe it. One would think that they'd be able to tell you what kind of SIMMs it uses with no problem; they knew EVERYTHING about this machine when we called them. Of course, you might be calling someone different.

By the way, we use a Powervault at work (and are about to buy three more) for SQL stuff. That unit is so bad ass that I want to sleep in the server room with it. Getting it set up though was a pain.. Dell hired outside help for the configuration of it and these guys didn't know anything. They basically came in and sat on the line with tech support all day for a simple job.
Re:Dell support really sucks by Dekaner · 1999-11-18 21:28 · Score: 1

Are you kidding me? Not only does Dell's phone support rock, their online support is even better!!

Here, I found the information for you, and it took me all of about 1 minute.

http://support.dell.com/docs/dta/4XXLV/00000001.htm
Re:Dell support really sucks by Anonymous Coward · 1999-11-18 23:26 · Score: 0

Their web site works today. But it didn't yesterday. It didn't two days ago. Just stupid Microsoft SQL and HTTPD errors all day long. Who knows if it will work tomorrow? Somehow far from the "best support in the world".

We did this recently by Anonymous Coward · 1999-11-18 19:21 · Score: 0

We started with 2 email servers on 100GB fibre channel RAID 5 disks, and performance was really bad. We traced some of the problem to disk latency, and some of it to the fact that both servers were handling SMTP incoming, SMTP outgoing and IMAP, involving a lot of LDAP and DNS activity. We split the server roles up, assigning one new machine the "corporate" address, for mail coming from the outside world, one new machine the "outgoing" address, to be responsible for deliveries, and the original two machines became IMAP farms, receiving only the email bound for their users, and not dealing with any outgoing email.

That did help delivery time, but responsiveness was still bad. The next step was to break our RAID5 volume into small mirror sets (RAID0+1). With Email you're constantly writing and reading small files, too many to cache effectively. The head contention involved in random writes and reads was killing our performance, so to minimize it we built the smallest drive arrays possible (one disk big, two disks deep for reliability). This has worked pretty well for us.

In general I'd stay away from any IDE or EIDE disks on a disk-bound server. In addition to the points others have raised about SCSI reliability, the SCSI overlapped IO model and the ability to run more channels allows you to attach and access a larger number of disks. Fibre Channel will get your transfer rates up through the roof, making the SCSI bus speed not-a-problem.

Now if Mozilla will just come out with a faster client....

Software RAID is NOT faster by dreamchaser · 1999-11-18 19:21 · Score: 1

> 1) Software RAID is usually a lot faster than hardware RAID. And for the money you save on the HW controller you could buy faster/more disks.

Since when? I've been working on servers with and without RAID for ten years now, and this is the first time I've EVER seen this claim. Was that a typo? Hardware RAID is usually much faster, as well as more reliable. Yes, it can be harder to set up, but in the end it is well worth it. Remember, you get what you pay for. Any time you use software to do a job that hardware can handle, you are devoting CPU cycles to it. Properly designed RAID controllers offload a ton of processing that would otherwise be done by the host CPU. They don't put RISC processors on RAID controllers just for show :-)

As for SCSI controllers, I'll echo what others here have said. Mylex is one of the best. Not the easiest to config, but by far one of the fastest and most reliable controllers out there.

Re:Software RAID is NOT faster by Oestergaard · 1999-11-18 20:00 · Score: 1

Remember ``Hardware RAID'' is just a smaller processor running software as well. The PII+ in most modern systems is way faster than the i960 or m68X in a hardware RAID controller.

I've seen quite a few people discover, in disbelief, that they didn't get what they thought they paid for when buying HW RAID solutions.

Back in the old days I'm sure letting an i960 do parity calculations was a boost. Well, times change.

The _only_ thing I've seen HW RAID controllers be better at is large setups (10+ disks), where a pure SW solution will load the memory and PCI buses of the system heavily. Especially with RAID-1, where a SW solution has to send a copy of the data to every disk, the HW solution has an edge by moving this duplication off the main memory / PCI bus.
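
The bus argument is easy to put numbers on. With software RAID-1 the host pushes one copy of every write per mirror across memory and the PCI bus; a hardware controller receives it once and duplicates it itself. A rough Python sketch, assuming the bus is the only bottleneck and using roughly 133 MB/s, the theoretical peak of 32-bit/33 MHz PCI, as an example figure (sustained throughput is lower in practice):

```python
PCI_32BIT_33MHZ_MB_S = 133.0  # approximate theoretical peak; real rates are lower


def bus_mb_per_write(write_mb: float, mirrors: int, hardware_raid: bool) -> float:
    """MB crossing the host bus for one RAID-1 write of write_mb."""
    return write_mb if hardware_raid else write_mb * mirrors


def max_write_rate(bus_mb_s: float, mirrors: int, hardware_raid: bool) -> float:
    """Upper bound on mirrored write throughput if the bus is the only limit."""
    return bus_mb_s / bus_mb_per_write(1.0, mirrors, hardware_raid)
```

For a two-way mirror this halves the bus-limited write ceiling in software (about 66 MB/s against about 133 MB/s), which matters on large arrays but rarely on a handful of disks.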

For smaller setups, like the one in question here, software RAID is absolutely both a viable solution, and probably offers by far the best price/performance.
Re:Software RAID is NOT faster by jdz · 1999-11-19 02:37 · Score: 1

"Back in the old days" many array controllers used i960s, m68ks, etc. These days, most array controllers are using GP processors like Pentiums.
The big advantage of the hardware solution is encapsulation and management. Companies like LSI, Clariion, etc, sell the idea that they give the customer a big black box, and they "make it go." EMC takes this to the logical extreme by providing tremendous support at incredible premiums.
Pretty much, it boils down to a time/money tradeoff. Most of the software RAID solutions require more hands-on administration, but lower costs.
I should, in fairness, note the other hidden advantage of hardware solutions- extra redundancy. Not of data components (disks), but of infrastructure components. Many "highly reliable" RAID cabinets provide features like redundant fans, power supplies, even CPUs.

HP all the way by Anonymous Coward · 1999-11-18 19:42 · Score: 0

I recently had to implement a new OpenView server. Our standard is HP-UX. After a lot of looking around, I got an HP AutoRaid 12H (2 96MB RAID controllers, 12 half-height bays, 3 hot-swap fans, 3 hot-swap power supplies) connected to an HP R390 (capable of dual 360 MHz PA-RISC, max of 3 GB RAM, up to 2 9GB internal drives, Gigabit Ethernet, free DVD-ROM and web remote console). The combination retails for between $40 and $80 grand, depending on how far you go with it.

Your machine should hand several times that volume by Anonymous Coward · 1999-11-18 19:52 · Score: 0

I run a mail server with several hundred thousand users. There are 2 things you need to do to reduce disk I/O.

Spool: this is a killer, but here is how you fix it. Use the re-mqueue script that comes with sendmail and configure 3 or 4 cron jobs. The first moves all mail that is older than 1 hour out of your mqueue to a subdirectory like */mqueue/.1/. The second cron job moves anything older than 4 hours out of .1 to .2. The third moves jobs older than 1 day out of .2 to .3, etc. Then set up cron jobs to run sendmail in queue-processing mode, using -O QueueDir to point each run at one of the */.1/ through */.3/ directories. Run .1 once every 10 minutes, .2 once every hour, and .3 once every few hours. Then, to keep your master mqueue dir small, run your normal sendmail with -bd -q1m so it processes the queue once a minute.

Mailboxes: for your mail boxes themselves, especially if you are using BSD mboxes, use RAID 0+1. If you can afford a good fiber disk subsystem, get one, but an A5200 with a few drives is like 60 grand! On an Ultra Wide SCSI bus, try using 4 9GB Cheetahs as RAID 0+1; that gives you about 18GB usable and will fix your I/O problem. Much more traffic than that on one Ultra SCSI bus will max it out and will require more controllers. Veritas will help A LOT as well. Much better than the stock Solaris crap. Best,
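
The aging scheme above can be sketched in a few lines of Python. This is not the re-mqueue script that ships with sendmail, just a hypothetical illustration of the same idea, assuming sendmail's usual qf (control) and df (data) file naming: once an entry's qf file is older than a threshold, its qf and df files are moved together into the next subdirectory. Unlike a production script it does no locking, so it should not run against entries sendmail may be delivering at that moment.

```python
import os
import shutil
import time
from typing import List, Optional


def age_out(queue_root: str, src_sub: str, dst_sub: str,
            min_age: float, now: Optional[float] = None) -> List[str]:
    """Move queue entries older than min_age seconds from src_sub to dst_sub.

    An entry is identified by its qf<ID> control file; the matching df<ID>
    data file, if present, is moved along with it. Returns the moved IDs.
    No locking is done.
    """
    now = time.time() if now is None else now
    src = os.path.join(queue_root, src_sub)
    dst = os.path.join(queue_root, dst_sub)
    os.makedirs(dst, exist_ok=True)
    moved = []
    for name in sorted(os.listdir(src)):
        if not name.startswith("qf"):
            continue  # skips df files and the .1/.2/.3 subdirectories
        qid = name[2:]
        qf_path = os.path.join(src, name)
        if now - os.path.getmtime(qf_path) < min_age:
            continue  # still young enough for the faster queue
        df_path = os.path.join(src, "df" + qid)
        if os.path.exists(df_path):
            shutil.move(df_path, os.path.join(dst, "df" + qid))
        shutil.move(qf_path, os.path.join(dst, name))
        moved.append(qid)
    return moved
```

Called from cron as age_out(root, "", ".1", 3600), age_out(root, ".1", ".2", 4 * 3600) and age_out(root, ".2", ".3", 24 * 3600), it mirrors the three tiers described in the post.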

Load Ave 10 does not mean an IO Bottleneck. by Anonymous Coward · 1999-11-18 19:54 · Score: 0

Quite right here. The load average IS the number of processes in the run queue (runnable means NOT waiting for I/O), so 10 processes is quite a lot (unless you have 10 CPUs, like me). Processes waiting on network or other I/O don't add to the load average.
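
For reference, the classic Unix load average is not an instantaneous count but an exponentially damped moving average of a count sampled every few seconds; which processes get counted (runnable only, or also those blocked on disk) is exactly what differs between systems. A minimal Python sketch of a 1-minute average, assuming a 5-second sampling interval:

```python
import math
from typing import Iterable


def update_load(load: float, n_counted: int,
                interval: float = 5.0, period: float = 60.0) -> float:
    """One sampling step of an exponentially damped moving average.

    n_counted is however many processes this OS counts at the sample
    (run queue only, or run queue plus processes blocked on disk).
    """
    decay = math.exp(-interval / period)
    return load * decay + n_counted * (1.0 - decay)


def simulate(samples: Iterable[int], interval: float = 5.0,
             period: float = 60.0, start: float = 0.0) -> float:
    """Feed a sequence of per-sample counts through update_load."""
    load = start
    for n in samples:
        load = update_load(load, n, interval, period)
    return load
```

A sudden, steady jump from 0 to 10 counted processes only reads as about 6.3 after one minute, 10 x (1 - 1/e), so the figure always lags behind the real state of the machine.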

I haven't used exim much, but what are the processes doing? Are they all of one type e.g. smtp delivery, local delivery, smtp reception, pop handling? Can you streamline the process table a bit (e.g. limit simultaneous deliveries?)

What version of Solaris do you have? 2.5.1 wasn't tuned for as much as 768M of memory by default, so you need to raise 'minfree' in the kernel to prevent a burst of process forks from eating all the free memory and causing a lot of demand paging (to free up pages for the impending 'cow' (copy-on-write) mappings for new data).

2.6+ is better with large memory systems - (virtual adrian will adjust these tuning settings for you...)

Split your swap between 2 or more (physical) disks if you are paging heavily. (Remember, vmstat's 'paging' figures are misleading: 'page outs' INCLUDES counts for ALL data written out to disk, i.e. file I/O, so if you are writing a lot of data to disk, you should expect/want to see a HIGH value for page outs.) swap -l will give you a good idea of whether you are really using your swap...

RAID5 (at least in software) will drastically slow your write activity and generally be a big win for read activity. If you don't need fault tolerance, just use disk striping (RAID 0) for a big win on reads AND writes.

Don't put 2 file systems on the same physical set of disks unless one of them has very very low activity.

Check if exim is spooling big mailboxes into temporary files in '/tmp'. This is a RAM disk by default on Solaris 2.6+, so if a user is manipulating a 500MB POP3 box via /tmp, they have just swallowed 2/3rds of your memory space and pushed all your running processes out to swap land. Kernel time spent paging means more processes waiting on the run queue, which pushes the load average up...

Fibre Channel disk arrays (à la SPARCstorage Array) with a write accelerator cache, volume management software for striping and mirroring (e.g. Veritas) and a journaled file system (e.g. vxfs, but NOT journaled ufs) provide a really industrial-strength and fast storage system (but you'll pay industrial-strength prices!)

Have Fun

You've bought the hype by spacey · 1999-11-18 19:56 · Score: 1

SAN is an ill-defined acronym that every vendor defines differently. The idea selling SAN is that you have a large centralized storage center that offers its disks/volumes to all connected clients without the hassle of administering a disk subsystem on each server.

The problem is that each vendor implements this differently, and has a different definition of what a SAN should be. None have really addressed the complex issues, instead implementing the kind of hack you describe: NFS with a data channel over FCAL. You still have the problems of NFS to contend with (no reliable locking, no consistent transactional guarantees across client and server implementations, etc.). Heck, most vendors are selling FCAL HUBS instead of SWITCHES to accomplish this storage sharing because the switches aren't prepared to do TCP/IP over fiber!

Ideally a SAN would be a well fleshed-out spec that allows massive amounts of storage to be conveniently accessed across a network with all of the guarantees of a local disk. That's how it's being sold. However, right now it's looking like little more than a way to get NFS to run faster.

-Peter

--
== Just my opinion(s)

Re:You've bought the hype by Matthew+O'Keefe · 1999-11-18 20:39 · Score: 1

Hi, Peter is right about SANs being overhyped in the past. However, the hacks for shared disk access he describes are now being replaced by real shared disk file systems like GFS (Linux -- http://www.globalfilesystem.org ), CXFS (SGI), and CFS (Compaq's Cluster File System for Digital UNIX). These file systems allow a SAN network with a homogeneous OS to share disks as if they were local. The other recent development with SANs is that they are starting to get cheaper (FC adapter cards less than $300, FC hubs less than $100 per port) and the drivers have become reliable and reasonably robust. SANs can be constructed from RAID disks, of course, and as your disk access or capacity requirements increase, the idea is that you can just keep adding new disks and/or machines to your network. My group at the University of Minnesota has written GFS with the help of several others in the Linux community, and with a lot of feedback from Linux kernel hacking luminaries like Stephen Tweedie. Also, GFS is currently the only GPL'ed shared disk file system available for Linux. Here are a few details about it: GFS runs on Linux 2.2 kernels and allows Linux machines to share storage devices on a network. The network medium can be parallel SCSI, Fibre Channel, or whatever. GFS machines maintain locks around filesystem metadata operations to ensure that only one machine at a time is modifying metadata: the metadata itself is distributed and the locking is designed to reduce overhead so that high scalability (adding more clients) is possible -- we've achieved good performance with 8 machines connected across a Fibre Channel fabric to 8 JBODs with 4 Seagate FC drives each.
We think our scalability is even higher, but don't have the hardware just yet to test out that theory :-) Distributed journaling (i.e., if one machine in the GFS SAN cluster fails, other machines recover its journal to keep the file system metadata consistent) is now being implemented in GFS, and we expect to have that ready by early 2000. So what do shared disks give you? Better availability, since if one machine fails the others can pick up the load. You can add disks, machines, or additional Fibre Channel switches to scale up your system. GFS includes a volume manager called pool that helps organize the network disks, and we are working on a mirroring implementation for this volume driver. We are also working on online resizing so that as you add more hardware to your SAN, the Linux machines can simply slurp it into their existing file system organizations transparently. Finally, it gives you local disk performance with all the advantages of sharing and accessibility that network file systems provide. If this sounds interesting, check out our web site at http://www.globalfilesystem.org Matthew O'Keefe The GFS Group
Re:You've bought the hype by egghat · 1999-11-21 18:28 · Score: 1

Can you provide us with links to hardware vendors, which sell FC cards for 300$? I checked the lists on your webserver, but didn't find anything in that price region with Linux support.

TIA.

egghat.

--
-- "As a human being I claim the right to be widely inconsistent", John Peel

Look at the FS first! by spacey · 1999-11-18 20:01 · Score: 1

Solaris' filesystem prior to the logging filesystem in 2.7 is a dog. I'd highly recommend that you benchmark your performance w/ Veritas' vxfs, or w/ solaris 7 before you buy a raid system.

Also, if you do get a RAID, I'd highly recommend a box that does not get controlled in software, i.e. Solstice DiskSuite or Veritas Volume Manager (I love Veritas' VM, but as a RAID controller it lacks intelligence).

A good external box with hot-swappable drives and a sizeable write-back cache (w/ a battery!) is my favorite way to do this stuff.

--
== Just my opinion(s)

Clariion Rulz by Anonymous Coward · 1999-11-18 20:04 · Score: 0

We have nearly a terabyte of Clariion RAIDs spread amongst 4 different SGI Origin 2000 servers.

For some reason Clariion is the only RAID SGI will slap their name on.

We beat the hell out of those things. In the course of two years we have had one disk failure. The sysadmin took the disk out and simply put the same disk back in! (gave him heck for that as he had spares on-hand). OTOH it has worked fine for the last year.......

Re:Clariion Rulz by grumpy_geek · 1999-11-18 21:47 · Score: 1

From some research I did long ago on that SGI bit: SGI does a bit of a tweak on the Clariion drives. I got this info from Alexis Cousin (sp?) from SGI's Europe office, when I was trying to find an HA solution for my SGIs a year or two ago. Supposedly the box won't detect a failure in a drive path and fail over to the second controller unless you are using their OEM'd Clariion.

SGI loves to mess with stuff and OEM it; they dinked with the Netscape servers, dinked with Clariion, they try to do it whenever possible. I talked with a Veritas guy at a LISA conference and he said SGI talked with them when Veritas was first starting, but would only accept an OEM version of it. Supposedly it pissed off the Veritas head guy so much that the sales guy said we would probably never have a solution from them for SGI... Of course, I have now heard rumblings of SGI & Veritas doing some collaborating these days on some Linux development, so I guess all bets are off.

The trouble with NetApp by Electra · 1999-11-18 20:04 · Score: 2

I work for a Systems Integrator (nice word for RESELLER!). We are a Sun reseller first and foremost, but we are very strong in the NetApp arena. Since I am a geek trapped in the hell of being a sales(wo)man, please forgive me if I sound salesy at all....
Anyway, NetApps are a great solution for multiprotocol storage. One of the drawbacks is that they are network attached and therefore only as fast as your network...which has been a problem for many of our customers. Another HUGE problem is backup. There is only one product that does it well, a product called BudTool. BudTool is a little guy that some geeks in my company thought up and brought to market; then along came NetApp, who asked us to figure out a way to back up their filers. Out of that venture NDMP was born. BudTool is the only product that makes use of NDMP correctly. That division of my company was recently sold to Legato Systems, who plans to EOL that product. NetApp is now scrambling to find another solution, since they've been recommending BudTool from Jump Street....
Pricing is also an issue. And you were right in saying that they start at around $17K, but that is WITHOUT storage. A good-sized storage solution, let's say 1 TB, is going to run you upwards of $100K. Yikes.
There is also a good resource for people who are thinking of deploying a NetApp solution: the toasters users group. You can send an e-mail to toasters@mathworks.com and ask to subscribe to the group. You'll get a lot of good feedback on what works and what doesn't. You'll also get to see the downside to using it (and BudTool). I think there is info about the group at http://teaparty.mathworks.com but I haven't been able to get there in a while.....Check it out. It's definitely worth the trip.
And if you need any quotes I'd love to help you out!!! Just Joking

--
"Most of my heros won't appear on no stamps..." Chuck D from Fight the Power

Re:The trouble with NetApp by Anonymous Coward · 1999-11-19 02:41 · Score: 0

Veritas has an NDMP tool that works as well as the Legato solution. It's just a bitch to set up.

The real problem with Network Appliance, though, is their business philosophy. Their approach has been to make a cool "widget" and convince you that you need their widget. They know what's best for you (and your customer, if you happen to have a contract with them to OEM it), and you should not ask for different features.

Managing anything on a filer is a pain. Their stupid little Java-based GUI thing is actually more confusing to most people than hacking on the command-line. In a pure NFS environment, or a CIFS workgroup-mode environment, simple maintenance like changing passwords requires a separate machine.

The OS itself, though, is the scariest part. It is a monolithic piece of software--no modules. This means that every time they change ANYTHING the entire release must be re-validated. The Chinese Butterfly Syndrome has reared its ugly head on more than one occasion.

Likewise, there is little or no systems management implemented. SNMP only has four traps, I believe. Everything is geared around automatic e-mail, which is okay, but not terribly helpful if the rest of your data center is easily managed.

Lastly in my little rant, installing the OS is a royal pain, also. This is a nominal $100K box which doesn't have its own CDROM, and has to be booted off of floppies so that you can map to it and copy its own permanent operating system to its harddrives from another machine.

I won't even get started on bugs they choose not to fix, but prefer instead to wait until you call and complain and then they'll send you an un-tested release with a patch for that particular problem.

Yes, I'm posting anonymously. Duh.
Re:The trouble with NetApp by travisd · 1999-11-19 08:39 · Score: 1

Legato actually has a NetWorker module out for the filers now.

raid 3 vs raid 5 by Anonymous Coward · 1999-11-18 20:06 · Score: 0

What's the trade-off between RAID 3 and RAID 5?

Some docs I have seen say RAID 3 is faster. Is there any trade-off in reliability?

Re:raid 3 vs raid 5 by Anonymous Coward · 1999-11-18 21:50 · Score: 0

RAID level 3 is better for large continuous writes, like video files, whereas RAID 5 is better at smaller block-sized writes. 0+1 is best for maximum speed. The fastest solution I have seen is software striping at the driver level onto a hardware RAID system at level 3 with dual controllers. LizardBoy

SCSI/IDE RAID systems by julest · 1999-11-18 20:14 · Score: 1

We've been running a SCSI/IDE RAID system here for some months now. They're actually a pretty decent idea: the array presents itself as a Wide SCSI device, but drives 6 IDE HDDs over 3 IDE buses. There's 128MB of cache in the box, too, so it feels pretty snappy (although I've got no hard figures on its performance).

The real bonus, of course, is that it's dead cheap, compared to equivalent all-SCSI solutions.

I should probably say that we've only got it running on an NT file server at the moment, so I can't vouch for its performance on a big scary mail server, but it's working well for us. Certainly, it seems to deal OK with everything we want from it (RAID-5, plus a hot spare). It deals just fine with you disconnecting a drive while it's running, and simply gets on with rebuilding onto the hot spare (hardly a scientific measure of its usefulness, I know, but certainly handy for demonstrating to PHBs why they should like it *grin*).

Re:SCSI/IDE RAID systems by Anonymous Coward · 1999-11-18 21:49 · Score: 0

What is the cost per MB of this solution? It doesn't sound so cheap when you are talking about comparing a proprietary cached solution to a basic SCSI sub-system (which would cost about $1100 for a 27GB U2W RAID setup with 6 spindles running at 7200RPM).

Completely FALSE, LIES ALL LIE by grumpy_geek · 1999-11-18 20:15 · Score: 1

Sorry if I offend, but damn it, I hate it when people get fed misinformation... and I am grumpy_geek.

Hardware is MUCH, MUCH faster than software. We've got boxes here with 2 GB of cache in the RAID controller because we can't spin the disks fast enough (of course, we've got terabytes of data). Hardware will always be faster than software for the sole reason of caching. You may never need it, and you may not do enough I/O to have to wait on disks, but just because you don't use it doesn't mean it's slower.

SCSI disks vs. IDE: you really don't know what you are talking about, do you? How many simultaneous I/O operations can you do on IDE???? With IDE each operation is done serially, meaning one I/O op holds up the rest. That's not a big deal for a workstation, but for any multi-user situation (or, better defined, multiple simultaneous I/O ops) SCSI is required. Warranty? I don't even want to think how you came up with that one, or why you would think it even applies. I'll add my own point here, and this is a biggie... I don't know of any IDE HA solutions, I guess because you can't share drives with IDE. Of course, there is hardly any difference between SCSI & IDE.

People can get 15 MB/s off one drive doing large writes (writing a single 5 GB file), but you will NEVER get that performance from one drive on any type of random-access workload. Did you really think about what you were saying about putting in a controller PER DRIVE for UDMA? So to get the same performance as 7 SCSI drives on one controller, I have to add 7 additional controllers to the picture... how many open expansion slots do you have in your box today?
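To put rough numbers on the random-access point, here is a back-of-the-envelope sketch in Python. The 8 ms average seek and 4 KB request size are illustrative assumptions, not figures from this thread, and the model ignores transfer time, caching and command queueing.

```python
def random_io_estimate(rpm: float, avg_seek_ms: float, request_kb: float):
    """Rough IOPS and MB/s for purely random, one-at-a-time requests on one drive.

    Each request is assumed to pay one average seek plus half a rotation;
    transfer time, caching and command queueing are ignored.
    """
    rotational_latency_ms = 60_000 / rpm / 2   # half a revolution, in ms
    service_ms = avg_seek_ms + rotational_latency_ms
    iops = 1000 / service_ms                   # requests per second
    mb_per_s = iops * request_kb / 1024        # KB/s -> MB/s
    return iops, mb_per_s


# Illustrative inputs: a 7200 RPM drive with an assumed 8 ms average seek.
iops, mbps = random_io_estimate(rpm=7200, avg_seek_ms=8.0, request_kb=4)
print(f"{iops:.0f} IOPS, {mbps:.2f} MB/s")
```

Under those assumptions a single 7200 RPM drive manages only about 80 random 4 KB operations per second, roughly 0.3 MB/s, which is why a streaming figure like 15 MB/s says little about a multi-user load.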

Re:Completely FALSE, LIES ALL LIE by Oestergaard · 1999-11-18 20:21 · Score: 1

This guy doesn't have terabytes of data. I'm sure you have some very nice hardware, and I'm sure your solution is the best one possible. But please, we're talking mailboxes here, and this guy needs something faster and possibly safer than the _single_ disk he's using now.

About simultaneous operations... well, why do we put a kernel underneath the applications? IDE performs well in the real world. Besides, how many places can the heads in your SCSI drives be at simultaneously? I would guess just as many places as the heads in IDE drives can be: one.

And yes, the major problem with IDE is that you need one controller for every two disks to keep the performance good. That's a problem SCSI doesn't have to the same extent (you'll need one controller for every 6 drives or so on U2W with fast drives). But considering that this is a small-scale setup, it's outright stupid to rule out IDE because you can't put 1000 disks in the system.
Re:Completely FALSE, LIES ALL LIE by grumpy_geek · 1999-11-18 21:12 · Score: 1

Yes, I do have terabytes (OK, I don't; my corp does, where I'm a systems analyst). We've got 12 SGI O2Ks on the floor using FC to EMC storage, along with some scattered Sun 4500s, 450s, Sequent, and some DEC Alphas. You can go back through my previous posts on other topics and find proof of these statements if you wish.

I didn't say much at all about your short little suggestion at the end (IDE, yuck), but you said:

1. Hardware Raid is slower than Software
2. IDE is identical to SCSI
3. UDMA is as fast as ANY SCSI SOLUTION (possible, but the solution is extremely silly: buy a drive, buy a controller)

Those sure as hell don't sound like suggestions; they sure appear to be you stating them as facts, and those facts are FALSE. Give me hard evidence to the contrary.

IDE performs well on the real-world desktop, not on servers. Sure, the head of that one drive may only be able to be at one spot at any point in time, but SCSI can access all the drives on the same bus at the same time. I've got 2 drives on the same controller, and I can access both drives at the same time, effectively doubling what I can do. With IDE I have to wait for the operation on drive one to end before I can do anything with drive two.

Usage patterns for MTA systems by A+Masquerade · 1999-11-18 20:18 · Score: 1

All MTAs that are halfway reliable are disk-bound (*not* network-bound); I believe that Wietse has some information on this in the Postfix data.
This is because each message is fully committed to disk as it comes in (for Exim this means opening (creating), writing, closing and flushing 2 files; other MTAs differ slightly), and then a reliable local delivery costs about the same.

Hence what you need to optimise is the latency of synchronous operations. So I would strongly recommend some form of RAID with an NVRAM cache, which means the commit time is set by memory speed rather than by disk seeks.
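A minimal Python sketch of the "fully committed to disk" step described above, assuming a POSIX filesystem. The function name and spool layout are hypothetical, and Exim, Postfix and friends each do this their own way; the point is that every os.fsync() call blocks until the disk (or a battery-backed write cache in front of it) acknowledges the data, which is exactly the latency an NVRAM cache shortens.

```python
import os
import tempfile


def commit_message(spool_dir: str, msg_id: str, data: bytes) -> str:
    """Durably store one message file in spool_dir (hypothetical layout).

    Write to a temporary file, fsync it, atomically rename it into place,
    then fsync the directory so the new directory entry is durable too.
    """
    final_path = os.path.join(spool_dir, msg_id)
    fd, tmp_path = tempfile.mkstemp(dir=spool_dir)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()             # push Python's buffer to the kernel
            os.fsync(f.fileno())  # block until the kernel reports it on disk
        os.replace(tmp_path, final_path)  # atomic rename on POSIX
    except BaseException:
        os.unlink(tmp_path)       # do not leave half-written temp files behind
        raise
    dir_fd = os.open(spool_dir, os.O_RDONLY)
    try:
        os.fsync(dir_fd)          # a second synchronous wait, for the rename
    finally:
        os.close(dir_fd)
    return final_path
```

Counting the waits makes the argument concrete: each accepted message costs at least two synchronous flushes here, so the per-message time is dominated by how fast the storage can acknowledge a write, not by bandwidth.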

Adaptec Ultra 2 Raid controller by Anonymous Coward · 1999-11-18 20:26 · Score: 0

If you have the money, might I suggest Adaptec's AAA-133U2? I have worked with Mylex 960s and they are pretty solid too, but I still prefer Adaptec. The AAA-133U2 has 3 channels (the AAA-131U2 has only one channel), and just think of the speed ;)

Other possibilities for performance improvement by James+Youngman · 1999-11-18 20:27 · Score: 1

There are some other possibilities to consider. For example, do your queues contain many deferred items (due to unreachable hosts or DNS problems)? If so, you are trawling through a large mail queue to process a small number of items (since some of the items are not due to be retried yet [I'm no Exim expert]).

If you're using a lot of disk in re-reading the queue, consider using an MTA which has a separate queue for deferred items, and/or a hashed directory structure. The Postfix mailer (by Wietse Venema) fits the bill here. Postfix is particularly good for large queues.
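As a rough illustration of the hashed-directory idea, here is a Python sketch that places each queue file in a subdirectory picked from a hash of its queue ID, so no single directory has to hold (and be scanned for) the whole queue. The function name and layout are hypothetical; Postfix's own scheme differs in detail.

```python
import hashlib
import os


def hashed_queue_path(queue_root: str, queue_id: str, depth: int = 2) -> str:
    """Map a queue ID to queue_root/<h0>/<h1>/.../<queue_id>.

    Each level uses one hex digit of an MD5 digest of the ID, so with
    depth=2 files are spread across 16 * 16 = 256 subdirectories.
    MD5 is used only for spreading, not for security.
    """
    digest = hashlib.md5(queue_id.encode("utf-8")).hexdigest()
    levels = list(digest[:depth])
    return os.path.join(queue_root, *levels, queue_id)
```

The design choice is simply to bound directory size: with a few thousand queued messages, each of the 256 subdirectories holds only a handful of entries, so creating, finding and unlinking a file stays cheap even on filesystems that search directories linearly.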

Postfix is also deliberately written to keep filesystem accesses for each item of mail to an absolute minimum (I think you can have as few as 3 disk accesses per item). This really reduces disk load, especially on systems with synchronous filesystems.

On the RAID side, consider alternatives to RAID5. RAID0+1, for example, is as safe as RAID5, but faster (though it needs more disk drives for the same capacity).

What is the balance between writes and reads on your mail server?

Are you logging syslog locally on the mail server? If so, consider moving syslog logging to a dedicated log box. If you can't do that, consider using the leading-dash feature in /etc/syslog.conf to make syslogd avoid calling fsync() on the mail log for every single logged message.

Raid & NFS Systems for Sun Sparc & ISP's by cybrthng · 1999-11-18 20:35 · Score: 2

I've been in the ISP business for years. Ran an ISP with 2000 customers and was the Systems Admin for an ISP with 150,000 customers.

Reliability is the issue when it comes to email and RAID systems. Of course Sun has the edge, so why not stick with Sun software and hardware? The Sun StorEdge A1000 has a caching controller and usually 30-40 GB per rack. It plugs into your SCSI bus, and you can simply add another dual-channel SCSI card to split the load or add redundancy.

Network Appliance makes an excellent solution. NFS toasters are the way to go in a distributed environment. Say you have customers on shell accounts: you can export the mail directory, mount it via NFS, and access it from the shell servers without throwing more email load on them locally. NFS toasters come in a great-looking rackmount appliance case, and how much rackspace you need depends on how much storage you need.

And of course there is StorageTek, which will run you a pretty penny, but offers Fibre Channel or multiple SCSI channel connections, full redundancy, caching, hot-swap and maintenance features.

I'd never stick an IDE solution on a production box. You need something that you can get support and service on, so I'd suggest that you stick with the Sun StorEdge A1000 drive systems for complete compatibility and put them under the same support contract as your UltraSPARC.

AND

As far as email is concerned, you should set up an MX server to cache and forward incoming email. These work really nicely, since you can run RBL or pre-process out spam without killing the actual server that holds and processes email for incoming clients. You have to look at a distributed environment, as email is precious to a lot of people, and a single server machine is not gonna cut it when you're up to 20,000 customers doing that much email.

PS. Try out Qmail too :) smaller footprint!

iostat or SE, not vmstat & references by Anonymous Coward · 1999-11-18 20:36 · Score: 0

vmstat is mostly useful for seeing CPU bottlenecks and RAM shortages. Use the SE toolkit, or "iostat -xc 10" for disk bottlenecks. Also, go out *right now* and buy "Sun Performance and Tuning" by Adrian Cockcroft, and "Configuration and Capacity Planning for Solaris Servers" by Brian Wong. My inclination on this would be to use Veritas vxfs filesystem and software RAID 0+1, striped within and mirrored across controllers, if it's really a filesystem I/O bottleneck. Heck, if it's just a read bottleneck on a few hot files you might be able to just throw RAM at it...

Network Attached Storage by Anonymous Coward · 1999-11-18 20:42 · Score: 0

Those things are NAS, and there is a whole market for them. NetApp is not the only provider of such solutions (only the best marketer). Off the top of my head, there are others:
- Auspex Systems
- EMC
- Hitachi Data Systems
- certainly others that are not so famous

We are currently in the middle of choosing between Auspex, NetApp and HDS. Can anyone provide us with first-hand experience?

Re:Network Attached Storage by Anonymous Coward · 1999-11-18 23:58 · Score: 0

From my experience, NetApps "look" sexy, that's all. Great marketing but lousy product.

It is great until you have a problem. It is a microkernel, and you cannot do much except serve files. The price you pay for that is a little too high.

WAFL performs well for data writes because it writes wherever the disk heads are currently located. This means that your data is heavily fragmented.

When the file system gets close to full, the filer "crawls", because WAFL is now busy looking for free blocks to write to. This is when trouble arises. Call NetApp support and two people will call you back after 3 days saying "I'm sure your problems are solved and I can close your call." Support SUCKS big time!!

They have an fsck equivalent called wack which you can use to get your data into contiguous blocks. This takes hours to run, and by the way, since it is a microkernel, the filer won't perform well while it runs. I was actually told to take the filer offline and run wack. I've used their F530 filer only; I've heard that not much has changed in the newer boxes. NetApp is more like Microsoft: it looks sexy and everyone is using it.... oh, by the way, it crashes often too. That is when they try to sell you their Cluster.... and other data mirroring software.

I've used the NS7K Auspex. Great product with one drawback: very expensive. I haven't looked at their new 2K series yet; I believe they are cheaper due to NetApp. Their approach of having a UNIX back end and microkernels in the front end is great, because admin functions work well without affecting the performance of the box. The other thing that I didn't like about the Auspex was that it takes a long time to boot. They have very good support. They came and did all our upgrades. You get what you pay for.

We had an Auspex 7K when the NetApp marketing campaign forced us to buy an F530. This box has been full of grief ever since. The Auspex, on the other hand, has been running fine with regular annual upgrades for the past three years.

Never used an EMC.

Go with a professional solution by hey! · 1999-11-18 20:53 · Score: 2

My guess is that in this role, performance is not the paramount issue. You're not bopping the heads around like you would in a database application, and even 20MB/sec is going to be plenty of throughput unless you have banks of ADSL lines. The important issues are reliability and maintainability.

I'm as much of a tinkerer as anybody; for my own use I don't mind spending two bucks of labor to save one buck of investment, because I'm really investing in myself. That said, if I had 13K users depending on me for e-mail, I wouldn't mess around; two days of downtime could be fatal for your business.

I'd invest $1.50-$2.00/user in a professional grade solution:

Hardware SCSI raid controller.
Drives on hot swap trays.
Same/next day on-site service contract.
External cabinet that can be swapped over to another computer.

It's been over two years since I spec'd a solution like this one (I'm doing software exclusively these days), so I can't make a specific recommendation for today's hardware. I know that some devices used to come in a separate cabinet and looked like a humungous SCSI drive; they even had their own RJ-11 to hook up to a phone line for remote diagnostics from the vendor's tech support.

If you can't swing the money for this, then I'd recommend mirroring rather than RAID 5. All these kinds of things are compromises between reliability, cost, convenience and performance. RAID 5 is an excellent overall solution from a performance standpoint, but if you cannot afford it, RAID 1 is a good choice. It offers fast reads at the cost of slower writes, and survives the failure of either disk. In this application, users won't be affected by slightly slower write times. Since drives are so incredibly cheap these days, I'd say this is a pretty good choice if you are strapped for cash. You could even use IDE drives. If you could afford a second IDE controller, then you could use software mirroring across two different controllers for improved throughput.

One thing I haven't looked into is RAID-2. RAID-2 stripes data at the bit level across the drives and adds Hamming-code error correction stored on dedicated drives. It is seldom used with SCSI because SCSI does this for you, but it might be worth looking into for IDE RAIDs.
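RAID-2's error correction is a Hamming code computed across the striped bits. A minimal Hamming(7,4) sketch in Python shows the idea: 4 data bits get 3 parity bits, and any single flipped bit can be located and corrected. Real RAID-2 designs used wider codes across more drives; this is illustrative only, and the function names are hypothetical.

```python
def hamming74_encode(d):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit codeword.

    Codeword positions 1..7 hold [p1, p2, d1, p4, d2, d3, d4]; each parity
    bit covers the positions whose (1-based) index has that bit set.
    """
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4   # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4   # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(c):
    """Correct up to one flipped bit and return the 4 data bits."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4   # 1-based position of the bad bit, 0 if none
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]
```

In a RAID-2 array each of the 7 positions would live on a different drive, so a drive that silently returns a wrong bit can be identified and corrected, not merely detected.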

Good luck.

Really what would be great is failover clustering.

--
Post may contain irony: discontinue use if experiencing mood swings, nausea or elevated blood pressure.

Some observations... by YuppieScum · 1999-11-18 20:55 · Score: 2

Much of this is probably repeated elsewhere, and much is common sense, but...

1. When was the last time you defragged the drives? Chances are this will reduce thrashing immediately.
2. Add more memory. More cache == less I/O. Double the RAM for a week and see how much better things are...
3. Hardware RAID is the only RAID. In most cases, the overhead of s/w RAID exceeds the I/O performance increase. Plus, the OS (whatever OS) need never know the boot drive is spread across 5 drives in three racks...
4. Hot Swap is a must for a production environment. Nothing beats the warm feeling of yanking a dead drive, slapping in a new one, and watching it get rebuilt on the fly - and the users never know...
5. Any amount of RAID will still fail badly if the PSU dies - always get redundant, hot swap power supplies.
6. The same goes for cabling.

--
This sig left unintentionally blank.

Re:Some observations... by Anonymous Coward · 1999-11-20 20:58 · Score: 0

1. When was the last time you defragged the drives? Chances are this will reduce thrashing immediately.
This is the UNIX world, where the filesystem automatically takes steps to avoid fragmentation. So-called defragmentation is not necessary unless you are running NT (I hope you aren't).

Check the software! by sommerfeld · 1999-11-18 21:02 · Score: 1

You didn't mention what software you're using for this mail server.

At least some POP servers are reputed to do stupid stuff like copying a user's whole mailbox to a new file every time a user connects up, looks at headers, or deletes a message. While I don't have specific recommendations, I'd advise auditioning a few different packages to see what kind of I/O load they place on a disk farm. Also, you may be better off spreading your load across multiple (cheap) servers rather than putting all your eggs in one expensive basket.

Also, improperly tuned RAID-5-based systems can be slower than the disks they're built out of, because of the need to do read-modify-write cycles to update the parity blocks.
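The read-modify-write cycle comes from how RAID-5 updates parity on a small write: the controller reads the old data and old parity, then writes the new data and new parity, so one logical write costs four disk I/Os. A minimal Python sketch with byte strings standing in for disk blocks (helper names are hypothetical):

```python
def xor_blocks(*blocks: bytes) -> bytes:
    """Bytewise XOR of equal-length blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)


def small_write(stripe: list, parity: bytes, index: int, new_data: bytes):
    """Replace one data block in a RAID-5 stripe via read-modify-write.

    new_parity = old_parity XOR old_data XOR new_data, so the other data
    blocks in the stripe never need to be read.
    """
    old_data = stripe[index]        # I/O 1: read the old data block
    old_parity = parity             # I/O 2: read the old parity block
    new_parity = xor_blocks(old_parity, old_data, new_data)
    new_stripe = list(stripe)
    new_stripe[index] = new_data    # I/O 3: write the new data block
    return new_stripe, new_parity   # I/O 4: write the new parity block
```

The same XOR property is what lets the array rebuild a lost block: XOR the surviving data blocks with the parity and the missing block falls out.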

FreeBSD HW RAID? by WhatPong · 1999-11-18 21:06 · Score: 1

Does anyone know of a hardware RAID controller that FreeBSD supports? I can't find clear statements about which cards are supported. I guess I may use Vinum. Can someone help?

Re:FreeBSD HW RAID? by Anonymous Coward · 1999-11-19 11:10 · Score: 0

I am the system/network admin for a large ISP and we use only FreeBSD, so the only RAID solution I use is DPT (http://www.dpt.com). I have seen posts here saying that FreeBSD works with RAID 0+1; it does not. I spent many hours on the phone with DPT and they blame it on FreeBSD. I hope someone here can tell me otherwise, and show me how.

RAID Solution by Anonymous Coward · 1999-11-18 21:10 · Score: 0

www.vikesfan.com's e-mail server is a home-built RAID setup. We used SVEC rack-mount cases, which are really great. The controller is a DPT SmartRAID V, which uses the i960 chip and can take up to 128MB (?) of cache. We put six 18GB LVD 10k RPM Seagate drives into the chassis. We added a second channel to the RAID controller and ran the first 3 drives off the first channel and the second 3 drives off the second channel to ensure ultimate performance. Then we set up the first 3 drives as a RAID 0 (stripe, no parity) and the second 3 drives as a RAID 0, and then set up a mirror between the two RAID 0 sets. If you need speed, RAID 5 is a bad idea. Doing straight RAID 0 offers great speed, but no reliability. Doing a RAID 10 solution (striping and mirroring) offers the best read throughput because the controller can read from either stripe set. Our setup gives us 50GB of ultra-fast LVD RAID, and we can lose up to 3 drives (as long as they are all in the same stripe set) without incurring downtime or having to restore a backup. Vikesfan.com staff
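Whether "up to 3 drives" holds depends on which drives fail. In a mirror of two 3-drive stripes (the layout described above), the array stays up only while at least one stripe set is completely intact. A small Python sketch that enumerates failure combinations under that model; the drive numbering is hypothetical, and it assumes the controller cannot combine pieces of two partially failed stripe sets.

```python
from itertools import combinations

SET_A = {0, 1, 2}   # first RAID 0 stripe set (drives on channel 1)
SET_B = {3, 4, 5}   # second RAID 0 stripe set (drives on channel 2)


def survives(failed: set) -> bool:
    """A mirror of two stripes stays up while either stripe set is untouched."""
    return not (failed & SET_A) or not (failed & SET_B)


def survivable_fraction(k: int) -> float:
    """Fraction of all k-drive failure combinations the array survives."""
    combos = list(combinations(sorted(SET_A | SET_B), k))
    ok = sum(survives(set(c)) for c in combos)
    return ok / len(combos)


for k in range(1, 4):
    print(k, survivable_fraction(k))
```

For this layout every single-drive failure is survivable, but only 6 of the 15 possible two-drive failures and 2 of the 20 three-drive failures are, so "up to 3" is the best case rather than the typical one.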

External SCSI-SCSI Raid is best by ZESTA · 1999-11-18 21:21 · Score: 1

My choice for a large mail server (30,000+ users) is the CMD 5640: dual hot-swap RAID controllers with 256MB of cache each, in an external cabinet, with many drives. The controller has 2 host channels and 2 drive channels. I also use Kingston DE300 hot-swap trays, which let you put three 1-inch drives in the space of two devices. I would go with 6 drives and put 3 on each drive channel. Depending on how much space you need, you can use 9 or 18GB drives. One nice thing about the controller is that it is separate from the system, so if the system crashes, you still have access to the RAID controller to troubleshoot problems. The controllers have serial console ports, so you can access them the same way you would a headless SPARC server. The controllers aren't cheap (~$6600), but they're well worth it. The Ultra2 I have, with dual 400s, 2GB of RAM, and this setup with 50GB of drives, should easily handle 100,000 users. If the price is too high, you can get non-redundant controllers (CMD 54xx series) for a lot less (~$2200 with 32MB cache).

Another thing to think about is what software you are running. I was running qpopper on a server that continually had a load of 15+. After switching to cucipop, the load went to 0.15.

feel free to mail me if you have questions or want more info... I can hook you up with the vendor I order from.

-Randy

IDE=BAD by Null_Packet · 1999-11-18 21:32 · Score: 1

(1) SCSI - EIDE - _BAD_IDEA_. I'm not quite sure whether you're familiar with the physical performance attributes of SCSI and IDE, but if you are experiencing any bottleneck issues whatsoever with SCSI, moving to IDE, even EIDE, is possibly the worst thing to do in this situation.

(2) You ought to make the point that you're looking for Sun-box stuff, which is *way* more confined than PC RAID. We are running Exchange (no flames) and we use mirrored RAID 5 on two separate controllers; we can have any two drives fail simultaneously with no repercussions. My point is that the PC RAID market seems to offer far more choices.

(3) RAID on IDE works fine, even great, for a desktop user who pushes the performance envelope, but (E)IDE cannot, and likely never will, compare to UW-SCSI or U2W-SCSI.

NP

Poor Man's Raid by IQ · 1999-11-18 21:55 · Score: 1

Not raid at all but just an online backup:

dd if=/dev/sda of=/dev/sdb bs=1048576 2>> "$LOGFILE"
(dd reports its record counts on stderr, which is why the log redirect is 2>>.)

This assumes identical geometries. So buy 2 drives instead of 1. Use it once a week or every night. This has saved my ass countless times. Every box I build gets a dupdrive script containing the dd command above and a spare drive.
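For comparison, a minimal Python sketch of what that dd line does: copy the source to the target in 1 MiB chunks (the same block size as bs=1048576), fsync, and return a SHA-1 of the data copied so a later run could compare it. The function name is hypothetical; pointed at /dev/sda and /dev/sdb it needs root and overwrites the second drive, exactly like dd.

```python
import hashlib
import os

CHUNK = 1024 * 1024  # 1 MiB, same block size as bs=1048576


def copy_device(src: str, dst: str) -> str:
    """Copy src onto dst in 1 MiB chunks; return the SHA-1 of the copied data.

    dst must already exist (true for a device node); it is overwritten in
    place from offset 0 and, unlike dd on a regular file, not truncated.
    """
    h = hashlib.sha1()
    with open(src, "rb") as fin, open(dst, "r+b") as fout:
        while True:
            chunk = fin.read(CHUNK)
            if not chunk:
                break
            fout.write(chunk)
            h.update(chunk)
        fout.flush()
        os.fsync(fout.fileno())  # make sure the copy is really on the spare drive
    return h.hexdigest()
```

Logging the returned digest alongside the date gives a cheap way to notice when a nightly copy did not complete, something the plain dd line only reveals through its record counts.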

--
Adults are obsolete children. - Dr. Seuss

Don't underestimate the filesystem by DanIncognito · 1999-11-18 22:10 · Score: 1

As lots of people have said, the disks and RAID setup can be a problem. Spend some time with vmstat and iostat and determine where the bottleneck is. If you have a throughput problem, you want more controllers in the mix. If you're spindle-bound, you want more disks. However, I didn't see any mention of what type of filesystem you're using. I imagine that with a mail server you have thousands of tiny little files spread across only a few directories. In that situation it's rather critical to use a filesystem that does binary lookups of its metadata (such as Veritas). Use vsar to look up your inode hits and misses, and if the ratio is out of whack, try to break things down to fewer files.
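To make the "binary lookups of metadata" point concrete: a filesystem that scans directory entries linearly does work proportional to the number of files in the directory, while one with sorted or tree-indexed directories needs only about log2(n) comparisons. A toy Python comparison; it models lookup cost only, not any particular filesystem's on-disk format, and the function names are hypothetical.

```python
def linear_lookup(entries: list, name: str) -> int:
    """Number of entries examined by a front-to-back directory scan."""
    for steps, entry in enumerate(entries, start=1):
        if entry == name:
            return steps
    return len(entries)


def binary_lookup(sorted_entries: list, name: str) -> int:
    """Number of comparisons a binary search over sorted entries makes."""
    lo, hi, steps = 0, len(sorted_entries), 0
    while lo < hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_entries[mid] < name:
            lo = mid + 1
        else:
            hi = mid
    return steps


names = [f"msg{i:05d}" for i in range(10000)]   # one busy spool directory
print(linear_lookup(names, "msg09999"), binary_lookup(names, "msg09999"))
```

With 10,000 files in one directory the scan touches all 10,000 entries to find the last one, while the binary search needs at most 14 comparisons.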

The SW RAID tech TODAY by Anonymous Coward · 1999-11-18 22:19 · Score: 0

Really, incompetent sysadmins can kill off any system, and that is also my bitter experience.

Secondly, it seems you are somewhat behind the curve here: the latest Linux SW RAID is capable of autodetection, and I'd expect commercial alternatives to offer at least that. Today, SW RAID does handle the problem you mention and can cope with people moving disks and changing SCSI ID numbers.

And tell me: what is simpler, one sysadmin fixing it all, or a SW sysadmin and a HW sysadmin calling each other? Partitioning responsibilities on some weird principle rather than for efficiency sounds like the work of pointy-haired bosses.

Raid by trippd6 · 1999-11-18 22:23 · Score: 1

We run RAID on over 80 development servers and 20 production servers. We run NT and MS SQL 7.0, but also do things like bill generation, which involves a lot of raw file access. Currently our best setup runs like this (we use HP Intel hardware):

We have 2 RAID controllers (each has 3 channels, but you won't need that many for your setup), each running a RAID 10 array. RAID 10 is about the best performance you can get out of RAID. Basically, the idea here is that most RAID controllers are A) slow and B) can only handle so many I/Os per second, and that's always slower than what a modern system can handle.

If you don't know what RAID 10 is: basically you have 2 or more mirrored drive sets, and then you stripe across those drive sets. This means you A) need at least 4 drives and B) lose 1/2 of your usable drive space to the mirroring. But it also means you can do 2 separate reads across 2 different sets of striped disks, which is very speedy (in theory, anyway).
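A minimal Python sketch of that layout, assuming a simple RAID 10 with a stripe unit of one block: logical blocks go round-robin across mirrored pairs, every write lands on both disks of its pair, and a read can be served by whichever mirror is free. The disk numbering and function names are hypothetical; real controllers use larger stripe units and smarter read scheduling.

```python
def raid10_map(logical_block: int, num_pairs: int):
    """Map a logical block to (pair, offset, disks) in a RAID 10 array.

    Pair p consists of disks 2p and 2p + 1; the stripe unit is one block.
    """
    pair = logical_block % num_pairs       # round-robin striping across pairs
    offset = logical_block // num_pairs    # block offset on each disk of the pair
    disks = (2 * pair, 2 * pair + 1)       # a write goes to both of these
    return pair, offset, disks


def pick_read_disk(disks, busy):
    """Serve a read from a mirror that is not busy (the first one if both are free)."""
    if disks[0] in busy and disks[1] not in busy:
        return disks[1]
    return disks[0]
```

With 4 drives (2 pairs) there are 4 disks of raw space but only 2 disks' worth of logical blocks, which is the "lose 1/2" cost; in exchange, two reads that hit the same pair can still proceed in parallel on its two mirrors.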

So if you spread your spool and mail across 2 RAID controllers running RAID 10, that's probably the best performance you're going to get. 10,000 RPM drives will help a lot too. The only problem: this is also the most expensive way to do it.... oh well....

-Tripp

Re:Sun: the A1000 are Metastor OEM by Anonymous Coward · 1999-11-18 22:26 · Score: 0

Cost-saving tip: the A1000s are Metastor OEM. I buy a 10-drive unit that is very similar to the A1000 direct from Metastor. It comes with a newer version of RAID Manager and a 5-year warranty, and it's significantly cheaper (I can't remember the difference offhand).

Re:Putting the dot in dot com by Anonymous Coward · 1999-11-18 22:43 · Score: 0

I'm surprised that you got a sales rep to get back to you, considering you're an edu site. We were also looking to invest a large chunk of our budget in Sun disk storage but couldn't get a Sun sales person to return any of our calls. We even went as far as to tell them that we needed to fill out a purchase order by the end of next week, so they ended up claiming it would be no problem to get back to us by the end of the day. By the end of the week they declared we should just give them a couple more weeks to get back to us. I guess that with the edu discount, Sun sales figures getting back to us isn't worth their time. The joke around the dept. now is "Sun put the dot in dot com, but we are ee-de-you, so we'll have to go with IBM instead."

raidzone.com by drbart · 1999-11-18 22:56 · Score: 1

you mentioned IDE RAID but not by name. I've been looking at raidzone's solution. haven't bought yet, but it does all the hot-swapping stuff you want *and* is riding the IDE cost curve, which is now at 20G/$200.

the interface is neither IDE nor SCSI, but rather a board in your PCI bus.

oh, right, you have an UltraSPARC. *LOSE THE ULTRA SPARC*! they are not fast. you're better off running Linux or FreeBSD on an x86 farm or Beowulf cluster.

Not Disk, but OS Bottleneck by Anonymous Coward · 1999-11-18 23:02 · Score: 0

Run a benchmark like PostMark on your current system, and compare it with the result on a system like the HP NetServer LPr with 10k RPM drives running Linux (say 2.2.10-ac12, or 2.3.x). I think you will be amazed at the difference. You don't need RAID, you need a faster OS and filesystem. http://www.netapp.com/tech_library/3022.html

Performance Issues by Anonymous Coward · 1999-11-18 23:07 · Score: 0

You indicate that you plan to have six disks in the array. You can partition this in a couple of different ways. You suggest creating two RAID-5 arrays with three disks each. The problem with doing this is that each array will use one column of capacity for redundancy information. This means that you are effectively getting only four disks' worth of capacity for a six-disk purchase, and your configuration is still only single-fault-tolerant. This may be suboptimal from a cost-justification point of view. One alternative is to create two physical partitions on a single RAID-5 over all six disks. Most SCSI-based RAID controllers will let you do this and represent each such partition as a separate SCSI LUN (logical unit). This recovers one disk's worth of capacity at the expense of increasing seek times (possibly dramatically), so you may prefer to spend the extra capacity to avoid mandatory long seeks.
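The capacity arithmetic above is easy to check: a RAID-5 set of n disks gives n - 1 disks of usable space. A tiny Python sketch, with RAID 10 on the same drives added for comparison; the 18 GB drive size is just an example, not a figure from the original question.

```python
def raid5_usable(disks: int, disk_gb: float) -> float:
    """Usable capacity of one RAID-5 set: one disk's worth goes to parity."""
    if disks < 3:
        raise ValueError("RAID-5 needs at least 3 disks")
    return (disks - 1) * disk_gb


def raid10_usable(disks: int, disk_gb: float) -> float:
    """Usable capacity of RAID 10: half the disks hold mirror copies."""
    if disks < 4 or disks % 2:
        raise ValueError("RAID 10 needs an even number of disks, at least 4")
    return disks // 2 * disk_gb


two_sets = 2 * raid5_usable(3, 18)   # two 3-disk RAID-5 arrays
one_set = raid5_usable(6, 18)        # one 6-disk RAID-5 array, split into 2 LUNs
print(two_sets, one_set, raid10_usable(6, 18))
```

With six 18 GB drives that is 72 GB for two 3-disk sets versus 90 GB for one 6-disk set, the one disk's worth described above; RAID 10 on the same six drives would give 54 GB.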

This may all be irrelevant, however. The workload you describe is mostly mail, right? In that kind of workload, you're mostly talking about lots of very small transfers. The advantage you will get from striping is minimal to zero in that case. Furthermore, your filesystem is probably seeing a very high metadata-to-data update ratio. You don't mention what filesystem you're using. Many filesystems will force synchronous metadata updates in order to ensure consistency at critical points. This translates to a large number of mandatory seek and read-modify-write cycles. In a busy workload, the head is often moved to service another request during the modify portion of that cycle, forcing another seek. In modern disks, seeks are performance-killers.

If you're looking for manageability and performance improvements, you may wish to investigate filesystem and soft storage system alternatives, such as Veritas' filesystem product.

You should also consider putting such a filesystem solution on top of any hardware solution you might purchase.

Here's the deal by Anonymous Coward · 1999-11-18 23:16 · Score: 0

If you want spitfire performance, then you have to get rid of the 7200 RPM drive and replace it with a 10,000 RPM one, unless you have two U2W controller cards.

Then either do RAID 0+1, which is best for redundancy but burns a lot of drives
or
Get a NetApp box or an EMC Celerra filer. These things scream and are the best solution if you want to connect another server on the same or a different platform.

You can go RAID 5, which is good, but don't go software RAID or anything with IDE. Go SCSI or Fibre Channel.

Good luck.

Try a more scalable solution by dijit · 1999-11-18 23:22 · Score: 1

If your machine is reaching such a load, grab another machine, split SMTP and POP/IMAP services between the machines, and have the SMTP server NFS-mount the drive the POP/IMAP server stores the mail folders on. Slowness in getting the mail from SMTP to the user's mailbox is a minimal problem; it's bursty by nature. You'll have to rsync the passwd file or use NIS (eew), and make sure you filter all the ports you just opened up at the router. We have successfully used this to drop the load to a dull roar on a SPARC 20 used for POP, with an E150 handling SMTP (which idles at a load of 0.02 or so). You'll want to understand DNS and MX records first, but it's not that hard to wrap your head around. Many of these Linux bigots think just getting faster hardware is always the solution. Distribution of load and quality of hardware and service make up for a lot when you have a lot of people depending on it.

You'll probably want to investigate whether or not it's your disk I/O that's actually causing your problem. If it is (and I know I'm going to look like the antichrist of /. because I recommend this), you may want to look into the Sun storage solutions, since you made the right decision to get a Sun in the first place. http://www.sun.com/storage/disk.html The MultiPack (http://www.sun.com/storage/multipack/) works very well. The disk I/O speed is plenty for a fairly heavily used Oracle server we have.

// dijit tobkin-at-metnet.edu tobkin-at-umn.edu tobkin-at-tobkin.com

RAID-5 is faster for large writes by Anonymous Coward · 1999-11-18 23:26 · Score: 0

Besides being obviously more expensive, RAID-1/0 is not always faster than RAID-5. When a request from the host writes more than half a stripe's worth of data, RAID-1/0 needs to perform more disk accesses (because every write has to go to both disks in each mirrored pair, while RAID-5 has a single parity unit per stripe). The math is trivial. If you don't believe this, consider the sample case of writing 4 consecutive stripe units on a 5-disk RAID-5: RAID-5 writes the 4 units and the parity computed in-core for them (i.e. 5 disk writes), while RAID-1/0 needs 8 disk writes (writes to 4 mirrored pairs). So, as usual, the answer depends on the workload.
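The counting in that example can be written out. A minimal Python sketch, assuming a write aligned to a single RAID-5 stripe, with the controller picking the cheaper of two parity strategies (read old data plus old parity, or read the untouched units and recompute), versus RAID 1/0 where every unit is written twice. It counts disk operations only and ignores caching; the function names are hypothetical.

```python
def raid5_ios(units: int, disks: int):
    """(reads, writes) to write `units` stripe units into one RAID-5 stripe.

    A stripe holds disks - 1 data units plus one parity unit.
    """
    data_per_stripe = disks - 1
    if not 1 <= units <= data_per_stripe:
        raise ValueError("units must fit in one stripe")
    writes = units + 1                              # new data plus new parity
    rmw_reads = units + 1                           # old data + old parity
    reconstruct_reads = data_per_stripe - units     # untouched data units
    return min(rmw_reads, reconstruct_reads), writes  # full stripe: 0 reads


def raid10_ios(units: int):
    """(reads, writes) for RAID 1/0: every unit goes to both mirrors."""
    return 0, 2 * units
```

This reproduces the 5 versus 8 writes for the full-stripe case (raid5_ios(4, 5) is (0, 5), raid10_ios(4) is (0, 8)) and also shows the flip side: a single-unit write costs RAID-5 two reads and two writes, against two writes for RAID 1/0.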

Some people don't ditch old hardware by Anonymous Coward · 1999-11-18 23:40 · Score: 0

I have a 5-year-old 2 GB Barracuda in my normal workstation, plus a 9 GB drive, and I still have room for 8 more devices on that SCSI bus. With SCSI you don't pitch the old drives; you just add more.

I have an even older drive in my firewall at home: an old CDC 300 MB. Plenty of disk for a Linux masquerading firewall with a connection. The firewall PC is an old 386-20 with 16 MB of RAM, also plenty for the task.

I don't understand why people think old hardware is completely useless. I like it, though: I got both the drive and the PC for my firewall free because people thought they were too old to be useful.

Re:Some people don't ditch old hardware by Anonymous Coward · 1999-11-21 04:21 · Score: 0

True, old computer hardware can be useful for a long time. My dad uses my old Mac Plus just fine and I am making my old PC from work (166 Pentium) into a firewall.
The issue that comes up is this: if an old piece of hardware fails, would you bother doing anything other than replacing it with something new? OTOH, a valid warranty on a 1 GB drive might get you whatever the current smallest drive is. Hmm... free hardware. :)

Storage = Netapp by Lev · 1999-11-18 23:48 · Score: 1

If you have the money to spend, I recommend talking with Network Appliance (www.netapp.com). They have some VERY nice storage hardware, and it is everything you wanted: fast and scalable (up to 1.4 terabytes currently). We have a small setup with seven 9 GB drives, and it's a dream come true. Fast, reliable, you name it.

IDE still isn't SCSI by DragonHawk · 1999-11-19 00:09 · Score: 2

> There is really not so much that differentiates ATA from SCSI anymore.

I wouldn't go that far.

Yes, IDE has finally caught on to such things as DMA and busmastering, and throughput on IDE devices is in the same arena as SCSI now. But.

IDE is limited to two devices per bus, and generally requires one IRQ per bus. IDE also has very strict, short cable-length limits and lacks an "external" connector; you generally can't have an external IDE device (I know it is possible, but the cable restrictions make it very difficult).

There are more kinds of devices (scanners, printers, etc.) available for SCSI than for IDE. SCSI is generally more capable in terms of what you can do with it.

IDE controllers tend to be very primitive compared to their SCSI counterparts. Things like bus disconnect, command queuing, scatter-gather, and even busmastering are often unavailable or iffy on IDE controllers. This applies especially to the onboard controllers on many motherboards; the number of shortcuts taken there is incredible.

Likewise, the drive electronics and HDA components in IDE drives are often cheaper than those in SCSI drives. These are all design and engineering issues, not issues with the specification itself, but they exist. The problems stem from the fact that IDE is marketed to be cheap, cheap, cheap, and thus gets a higher incidence of cheap components. It isn't limited to IDE, either: you can also find cheap SCSI hardware; there is just less of it.

IDE often appears faster in benchmarks, because benchmarks typically do operations in bulk on a single device. IDE has lower command overhead than SCSI, so for such things IDE will be faster. But in the real world, with multiple processes trying to access multiple devices at once, IDE stalls while SCSI keeps on going.
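A crude model shows why bus disconnect matters once several devices share a bus. This is an idealized sketch, not a measurement of any real controller: it assumes equal seek times, ignores command overhead, and sends each request to a different drive on the same bus.

```python
def time_without_disconnect(n_requests: int, seek_ms: float, transfer_ms: float) -> float:
    """Each request holds the bus through its seek and its transfer,
    so requests to different drives cannot overlap at all."""
    return n_requests * (seek_ms + transfer_ms)


def time_with_disconnect(n_requests: int, seek_ms: float, transfer_ms: float) -> float:
    """Each request goes to a different drive. The drive releases the bus
    while seeking, so all seeks overlap; only the data transfers share the bus."""
    if n_requests == 0:
        return 0.0
    return seek_ms + n_requests * transfer_ms
```

With 4 requests, a 10 ms seek, and a 1 ms transfer (made-up numbers), the first model takes 44 ms and the second 14 ms. A single-device bulk benchmark (n_requests = 1) shows no difference at all, which is the point made above.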

I realize this started off as a discussion about RAID, and that IDE RAID devices are not your typical RAID devices. They usually have one drive per bus, connected to a custom controller that multiplexes them all and presents them to the host as a SCSI interface. But the topic has drifted to more general applications.

Just my 1/4 of a byte. ;-)

--

dragonhawk@iname.microsoft.com
I do not like Microsoft. Remove them from my email address.

Fasttrack/66 and *BSD/Do they work? by Anonymous Coward · 1999-11-19 00:24 · Score: 0

While I respect SCSI, I love SCSI, and I admire the DPT SmartArray 4 and 5 cards, I was wondering if anyone has experience with the Promise FastTrak66 RAID controller. It's a low-cost IDE RAID controller that supposedly handles the mirroring in hardware and contains its own BIOS. My question is: will it work with FreeBSD and OpenBSD? I think it would be perfect for lamer systems that are just used for backups or logs: dual 20 GB IDE drives, fully mirrored. However, I can't get a solid answer from anyone on it. Will it work outside of Windows? If it looks like a single drive to the system, I don't see why it wouldn't. But no one can give me a solid YES, it worked, or NO, it didn't. So here is my chance to ask.

Re:Fasttrack/66 and *BSD/Do they work? by nikolas · 1999-11-19 00:32 · Score: 1

I asked their tech support about two weeks ago whether it would work with Linux, and they replied that they would have a driver out in about a month. Maybe there's a *BSD driver coming, too?

Re:The trouble with NetApp [ NetApp = Microsoft ] by Anonymous Coward · 1999-11-19 00:29 · Score: 0

I'm a sysadmin for a mid-size company. Read the bit in the toasters archive about how Yahoo got burnt by the NetApp toasters. I've been visited by a couple of NetApp slimeballs. NetApps are pricey, and you don't really get what you pay for. I'd rather go with server-attached storage and have a lot more choices and flexibility than tie myself to a single vendor who can screw you at the most critical moment.

Re: MX records by Omniscient+Ferret · 1999-11-19 00:31 · Score: 1

I just read more about this; it sounds like it provides an escape hatch when the load goes sky high.
This is a quote from an Eric Allman interview on sendmail.net:

Are there features in sendmail that people should be aware of but aren't?

Oh, there are probably dozens of them. One that comes to mind, a very simple one, is the fallback MX option, which lets you redirect mail that has failed the first time to another location. It essentially acts as a lowest possible priority MX record for all hosts. For example, if you've got a mail system that's got a lot of traffic going through it, you have another machine that you dedicate to the slow mail, the stuff that didn't go through the first time, where presumably you're less concerned about how quickly it goes because the other end's being slow. So you set your initial connection timeout to something low - five seconds, ten seconds, whatever's right for your site - and you set the fallback MX on your main site to this fallback host. That way the mail that's going to go through quickly just goes fsssssssst right through your main server, while the stuff that's going to be slow (because the other end is either slow to connect or down) goes off to this other machine and doesn't clog up the main machine. It turns out to be just an amazing win. And these days the price of a PC box running FreeBSD or Linux is close enough to zero that it might as well be zero, so it's not really a problem to do it.
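The routing idea can be sketched in a few lines. This is a hypothetical illustration of the logic only, not sendmail's code: `try_connect` is a placeholder for whatever opens an SMTP connection with a given timeout, and all host names are made up. (In sendmail itself this is set with the FallbackMXhost option and the connect timeout, Timeout.connect.)

```python
from typing import Callable, List, Optional, Tuple

MXRecord = Tuple[int, str]  # (preference, host); lower preference is tried first


def delivery_order(mx_records: List[MXRecord], fallback_host: Optional[str]) -> List[str]:
    """Real MX hosts by preference, with the fallback host appended last,
    i.e. it acts like a lowest-priority MX record for every domain.
    (Equal preferences are ordered by name here; real MTAs randomize them.)"""
    hosts = [host for _, host in sorted(mx_records)]
    if fallback_host is not None and fallback_host not in hosts:
        hosts.append(fallback_host)
    return hosts


def deliver(
    mx_records: List[MXRecord],
    fallback_host: Optional[str],
    try_connect: Callable[[str, float], bool],
    connect_timeout: float = 10.0,
) -> Optional[str]:
    """Return the first host that accepts a connection within the timeout.

    A short timeout keeps slow or dead destinations from tying up the main
    server; their mail ends up queued on the fallback host instead.
    """
    for host in delivery_order(mx_records, fallback_host):
        if try_connect(host, connect_timeout):
            return host
    return None  # nothing answered: leave the message in the local queue
```

For a destination whose MX hosts never answer within, say, 5 seconds, `deliver` hands the message to the fallback host, and the main server moves on to the next message.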

BudTool == Legato now by Anonymous Coward · 1999-11-19 00:49 · Score: 0

Yep, Legato NetWorker. One of the first things they did with the BudTool code was create a NetWorker ClientPak for NetApp file servers. For $7k and change, you can get a license for this (that is, if you are already a NetWorker user).

Sure as hell beats backing up an NFS share off of a different machine (gack).

RAID for Database by jeremyphillips · 1999-11-19 00:49 · Score: 1

I've been researching this subject a lot over the last week for a database.

We're planning on deploying 2 database servers accessing data off of one external disk array. The second would be a failover server, so they shouldn't be concurrently accessing the same data/partition, but they could. I know multiple boxes can access a single disk array through one SCSI bus, but everybody always talks about them using different partitions. Can you have 2 boxes access one partition on SCSI? Fiber?

What form of RAID would be best? 5? 0+1? I almost wish I could do a 5+1: striping with parity, mirrored. I know that's a little paranoid, but so am I... :-)

We're looking into the Gateway and Dell disk arrays. Has anyone heard good/bad things about these? They have a maximum of 8 disks; what would be the best configuration?

Thanks,
Jeremy

--
Jeremy
"Opinions are like assholes; everyone's got one..."

Here's what we do... by Anonymous Coward · 1999-11-19 01:09 · Score: 0

... in situations such as these:

We have multiple address records for a mail host (let's call this machine "smtp") to round-robin through
Each of these addresses is assigned as a mail exchanger within the domain with equal weight to the others (domain.com IN MX 10 address[1234...])
Each of the machines mounts its mail spool off of a NetApp 720 (/var/spool/mail), while handling /var/spool/mqueue on a RAID 0 volume and logs etc. on yet another volume(s)
The POP/IMAP servers are separate, and mount /var/spool/mail off of the NetApp. The daemons are modified to utilize necessary file locking.
Account info and auth is handled by yet another system
This works pretty well for heavy load. Adding more users? Add another machine and stir...
We're not the only people that do it this way - I believe there was a paper submitted at one of the LISA conferences that describes just such a setup.
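Equal-weight MX records spread the load because sending MTAs are expected to randomize among hosts that share a preference value (RFC 5321, section 5.1). A minimal sketch of that ordering; the smtp1..smtp4 host names are hypothetical.

```python
import random
from collections import Counter
from itertools import groupby
from typing import List, Optional, Tuple

MXRecord = Tuple[int, str]  # (preference, host)


def order_mx(mx_records: List[MXRecord], rng: Optional[random.Random] = None) -> List[str]:
    """Sort by preference, shuffling hosts that share a preference value."""
    rng = rng or random.Random()
    ordered: List[str] = []
    records = sorted(mx_records, key=lambda rec: rec[0])
    for _, group in groupby(records, key=lambda rec: rec[0]):
        hosts = [host for _, host in group]
        rng.shuffle(hosts)  # equal preference: no reason to favor one host
        ordered.extend(hosts)
    return ordered


if __name__ == "__main__":
    # Four equal-weight exchangers, as in "domain.com IN MX 10 ..."
    mx = [(10, f"smtp{i}.domain.com") for i in range(1, 5)]
    rng = random.Random(1)
    first_choice = Counter(order_mx(mx, rng)[0] for _ in range(10_000))
    print(first_choice)  # each host should come first about a quarter of the time
```

Because the shuffle happens on the sending side, adding another machine really is "add another MX record and stir"; nothing on the receiving side has to coordinate.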

47GB UW-SCSI for $695 by otis+wildflower · 1999-11-19 01:11 · Score: 1

at CSC. I'm not an employee, just someone who bought a 4-tape DAT autoloader from them for $269 and is quite happy with it.

Your Working Boy,

A bit about using IDE/UDMA Raid Solutions by EvilNight · 1999-11-19 01:53 · Score: 1

I can give you some input on #1. The main (and only) advantage to using IDE over SCSI is price. I have a 70GB (4x17.2GB Maxtor) UDMA Raid0 running on a server at home. It cost me only about $700 to build it. It is running on a Promise Ultra66 controller. I have run raid on it under both Linux and WinNT and it works great. Disk performance is actually very impressive, much faster than a normal IDE drive, but that is to be expected when you stripe 4 UDMA drives. It is still nowhere even close to the speed of a good SCSI Raid setup.

I'm always amused at the large number of people who immediately think that because you are placing IDE/UDMA drives in a Raid configuration it will cause the drives to die quickly. That's bullshit. Granted, the SCSI drives will last you a hell of a lot longer, but IDE won't keel over and die just because it is Raided and under a high performance load. Most IDE drives will last at least for the length of their warranty period. Make sure you get the 'SMART' enabled drives and some monitoring software to give you a heads up if the drives begin to exhibit signs of failure.

If you want reliability and speed and are willing to pay for it, use SCSI. If you want large amounts of space and average speed at a decent price use UDMA. My needs run to cheap space and lots of it, and so far the UDMA solution has worked well for me.

I wouldn't recommend anything but a SCSI solution to you for any situation where you are looking for high performance fault tolerant systems. In your case I would go with option number two in your post above, and option three only if you are really, really worried about losing your data.

PS: This is running as software RAID 0; no RAID hardware is involved. I have seen a number of benchmarks (some from Ars Technica, I don't have the link) claiming that the performance of the Promise RAID controllers is exactly the same as software RAID. I'm not sure whether their competitors have this problem, or even whether they have any competitors in this area.

--
Hell is being intelligent in a world full of idiots.

Non contention backup solutions? by Anonymous Coward · 1999-11-19 02:01 · Score: 0

Since this question is dealing with performance, I'd like to get pointers to fast backup solutions that don't require down time. We have a couple hundred gig and backups are taking hours. (some of this is from raw (informix) partitions).

We are looking at an app that basically does a dd to tape but requires the system to be down.

thx

Raid-0 by krynos · 1999-11-19 02:37 · Score: 1

For precious data, I agree that RAID-0 is insane,
but for temporary data that doesn't need to be kept with 100% certainty, it can be a good solution.
For a news server spool, if a disk crashes, replace it and the normal NNTP feed will refill the spool shortly (depending on the connection speed)...
Of course, the OS disk, what users post, and other more important data should not be on RAID-0...
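Why RAID-0 is a poor home for precious data: the array is lost if any member disk fails. Assuming independent failures with the same probability p over some period (an idealization; the p = 0.05 below is made up), the risk grows with every disk you add.

```python
def raid0_loss_probability(p_disk: float, n_disks: int) -> float:
    """Probability that a RAID-0 array loses data, given that each of
    n_disks independently fails with probability p_disk over the same period."""
    if not 0.0 <= p_disk <= 1.0:
        raise ValueError("p_disk must be a probability")
    return 1.0 - (1.0 - p_disk) ** n_disks


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(n, round(raid0_loss_probability(0.05, n), 3))
```

With p = 0.05, four striped disks give roughly an 18.5% chance of losing everything, versus 5% for a single disk. That is tolerable for a news spool that refills itself, not for user data.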

Re:Go with a professional solution (RAID-1 vs 5) by ninjaz · 1999-11-19 03:27 · Score: 2

> If the money to swing this is impossible, then I'd recommend mirroring rather than RAID 5. All these kinds of things are compromises between reliability, cost, convenience and performance. RAID 5 is an excellent overall solution from a performance standpoint; but if you cannot afford this RAID 1 is a good choice. It offers fast reads at the cost of slow writes and survival from failure on either disk. In this application, users won't be affected by slightly slower write times. Since drives are so incredibly cheap these days, I'd say this is a pretty good choice if you are strapped for cash.

Actually, RAID-1 is more expensive and faster for writes than RAID-5.

The reason for this is that RAID-1 uses 1:1 mirroring of a 2-drive set while RAID-5 uses rotating parity in which parity information is distributed across all drives.

With regard to space, using RAID-1, your usable yield (what shows up in df) is half of the total disk space put into it. With RAID-5, parity info is spread throughout all the drives, costing one drive's worth of space. E.g., I have a RAID-5 using four 4GB drives, which gives me 12GB of usable space. With 0+1 on this configuration, it would be 8GB usable.
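The capacity arithmetic generalizes as follows, assuming identical disks, two-way mirroring, and one disk's worth of parity for RAID-5.

```python
def usable_capacity(level: str, n_disks: int, disk_gb: float) -> float:
    """Usable space for n identical disks at common RAID levels."""
    if level == "0":
        return n_disks * disk_gb
    if level in ("1", "0+1", "10"):
        # Two-way mirrors: half the raw space holds copies.
        if n_disks % 2:
            raise ValueError("mirrored levels need an even number of disks")
        return (n_disks // 2) * disk_gb
    if level == "5":
        # One disk's worth of space holds the rotating parity.
        if n_disks < 3:
            raise ValueError("RAID-5 needs at least 3 disks")
        return (n_disks - 1) * disk_gb
    raise ValueError(f"unsupported level {level!r}")
```

For the four 4GB drives above, `usable_capacity("5", 4, 4)` gives 12 and `usable_capacity("0+1", 4, 4)` gives 8. The gap widens as disks are added, since RAID-5 always gives up one disk while mirroring gives up half.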

As for speed, both RAID-1 and RAID-5 allow you to read from multiple disks at once (which, of course, is a win). For writes, a drive pair in a RAID-1 takes as long as a write to a single drive. On RAID-5, however, writes take longer because (afaik) the controller has to compute and write the parity as well; for a small write, that means first reading the old data and old parity (a read-modify-write), so one logical write turns into several disk operations plus some CPU time.

A decent little overview is at DPT's site (sadly, only in PDF) at http://www.dpt.com/pdf/understand_raid.pdf

fibre channel by Anonymous Coward · 1999-11-19 03:28 · Score: 0

You might want to look at some of the clustering solutions. Fibre Channel hard drives with intelligent RAID controllers would be a good solution. Unisys makes a fairly nice box that holds 10 drives with two controllers (it does failover); we've got 10 18 GB drives in it with no problems (or lack of speed).

Using EIDE/SCSI RAID on Mainframes/Unix!! by Anonymous Coward · 1999-11-19 03:55 · Score: 0

We've deployed several here. We use them on mainframes as well as Unix boxes. We have RAID-5 units of 80 GB to 180 GB in place today across our network, and the mainframes hit these guys HARD. I highly recommend 256 MB of cache and a RISC-based controller. Good performance for the buck! We use http://www.excelcdrom.com as our primary supplier.

Filesystem ... by FonkiE · 1999-11-19 06:47 · Score: 1

RAID configurations tend to imply huge virtual drives, and huge drives need a loooong time for a filesystem check (once I had one take more than 3 hours on a 72 GB RAID 5 drive). Therefore I would highly recommend a log-structured (journaling) filesystem!!!!!!

The GDT controller (http://www.icp-vortex.com/) works fine with Linux (and of course with other operating systems; Linux tools are available for i386 and Alpha).

About RAID modes: for redundancy, mirroring; for protection against a single drive failure at lower cost, RAID 5; for speed, striping. These are the common uses, but the choice depends on your needs...

CU

When a drive dies... by FIGJAM · 1999-11-19 08:37 · Score: 1

SCSI drives have a 5-year warranty.
IDE drives have 3 years (at least, I haven't seen 5 years on an IDE drive).

When a drive dies, the disk can usually still spin, so are the electronics the real problem?

Maybe someone who has an IDE and a SCSI drive of the same model, and is willing to risk them (and any warranties), could swap the circuit boards between them. I did this with two dead Maxtor drives once (slightly different models, same drive casing) and ended up with one working drive :)

--
Do your best, hope for the best, suspect the worst.

Dynamic Network Factory? by realdpk · 1999-11-19 09:04 · Score: 1

Just curious if anyone has worked with the disk arrays made by Dynamic Network Factory (or any similar products by other manufacturers?)

They say they use Ultra DMA drives, and connect to your machine via SCSI. Seems like a good way to put the I in RAID - assuming the product is as good as it looks.

Some real advice, in no particular order by RallyDriver · 1999-11-19 10:42 · Score: 1

Most important - DON'T USE RAID 5. It's not right for that application. RAID-5 assumes read-mostly, and is aimed at things like user home directories and app software; it is slower at writing large amounts of traffic than a single disk.

Take the time to understand how different RAID types ("levels") work and what is needed. RAID-1 is obvious but is space-inefficient (50% usable capacity) and doesn't solve the performance issue without adding striping (aka RAID-0) too.

RAID-3 may work well if you can get the stripe size down to a single write for the filesystem, e.g. 4+1 discs, 512 byte disc block, 2K array stripe and 2K filesystem block. Beware that many packaged arrays are software optimised for RAID-5 and / or RAID-0+1 and suck at RAID-3.
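The 4+1 example works because one filesystem block is exactly one stripe of data: 4 data discs x 512 bytes = 2K. A small check for that alignment, assuming the filesystem issues block-aligned writes (the function names are illustrative).

```python
def stripe_data_bytes(data_disks: int, disk_block: int) -> int:
    """Bytes of user data in one full stripe (the parity disc is excluded)."""
    return data_disks * disk_block


def is_full_stripe_write(write_bytes: int, offset: int, data_disks: int, disk_block: int) -> bool:
    """True if a write covers whole stripes exactly, so parity can be
    computed from the new data alone without reading anything back."""
    stripe = stripe_data_bytes(data_disks, disk_block)
    return write_bytes > 0 and offset % stripe == 0 and write_bytes % stripe == 0
```

With 2K filesystem blocks on a 4+1 array of 512-byte blocks, every block write is a full-stripe write; with 1K filesystem blocks, every write would be a partial stripe and need a read first.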

Sounds as though your price point rules out many of the midrange and high end toys that have been bandied about. Forget about EMC :-)

There are a number of cheap SCSI-to-SCSI and SCSI-to-IDE standalone RAID boxes going round, and also PCI-to-SCSI or PCI-to-IDE cards for internal mounting in server PCs. They're closer to your capacity needs (starting below 50 GB and 30 MB/s).

IDE vs SCSI for the drives is not that important up to 7,200 rpm, but will tell with 10,000 rpm units. The bandwidth from the RAID controller to the host is more important, so make it Ultra-wide or PCI.

From past experience, Sun StorageArray (or whatever they are called now) were a bit behind the technology curve; in 1996 they were still using the host OS for software RAID support, and upgrading Solaris meant hacking the array. They are all OEM anyway. Go to a storage expert instead, but one cheaper than EMC :-)

Clariion are good for plug and forget, but may not have something down in that price range. However, performance on low-end models, even FC to FC, is not stunning. The 5700 series is (was?) overall good value, but requires FibreChannel attach.

HP AutoRaid by Anonymous Coward · 1999-11-19 10:56 · Score: 0

The HP AutoRaid is a sweet, fast drive. It caches all data in RAM. When it has time, it writes the data as RAID 0+1, striped automatically across all the drives. Once the drives are half full, it converts the older data to RAID 5. It always keeps a small part of the drives at RAID 0+1, even when it's 100% full, for speed.

You can think of RAM as cache for the RAID 0+1 parts of the disks, and RAID 0+1 as cache for the RAID 5 parts.

It doesn't need a lot of tweaking. If it gets slower because it's too full, just add another drive, and it converts the newer or most-used data back to RAID 0+1. This can be done in the middle of the day, without downtime.
If you filled it with smaller disks and need more space, just pull out a 4 GB disk, stick in an 18 GB one, and let it rebuild. Pull out another 4, replace it with another 18. Again, middle of the day and no downtime.

I love this drive, man. It's fast, and you don't have to manage it. I didn't say it was cheap, I just said it is worth it.
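The two-tier behaviour can be modelled as a least-recently-written policy. This is a simplified, hypothetical sketch of the idea, not HP's firmware: recently written blocks live in a mirrored tier, and the oldest are demoted to RAID 5 when that tier is full. It promotes on writes only, and capacity is counted in blocks.

```python
from collections import OrderedDict
from typing import Dict, Hashable


class TwoTierArray:
    """Toy model: a mirrored (RAID 0+1) tier holding the most recently
    written blocks, backed by a RAID 5 tier for everything older."""

    def __init__(self, mirrored_capacity: int) -> None:
        self.mirrored_capacity = mirrored_capacity
        self.mirrored: OrderedDict = OrderedDict()  # oldest write first
        self.raid5: Dict[Hashable, bytes] = {}

    def write(self, block: Hashable, data: bytes) -> None:
        # New or rewritten data always lands in the fast mirrored tier.
        self.raid5.pop(block, None)
        self.mirrored[block] = data
        self.mirrored.move_to_end(block)
        # Demote the least recently written blocks once the tier is full.
        while len(self.mirrored) > self.mirrored_capacity:
            old_block, old_data = self.mirrored.popitem(last=False)
            self.raid5[old_block] = old_data

    def read(self, block: Hashable) -> bytes:
        if block in self.mirrored:
            return self.mirrored[block]
        return self.raid5[block]

    def tier_of(self, block: Hashable) -> str:
        return "RAID 0+1" if block in self.mirrored else "RAID 5"
```

Adding a drive corresponds to raising `mirrored_capacity`, which lets more of the hot data stay mirrored.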

Re:Choose RAID Level carefully! by MacBoy · 1999-11-19 12:14 · Score: 1

The original poster mentioned considering a RAID level 5 array to try to speed up access. However, level 5 can actually slow down access times. It increases throughput, but throughput is usually not the culprit; access time is. To get faster access times, use a mirrored array (level 1), where multiple disks all carry identical information. Read access times are dramatically improved, because each disk has to service only a fraction of the overall read requests!! In such an array, reads don't involve all disks; only writes do. Therefore, doubling the number of disks in a mirrored array theoretically doubles the number of read transactions that can be done per second. Real-world results vary, but are dramatically better than with a single disk. If the disk is getting a lot of small read transactions per second, rather than a few very large ones, then a mirrored array is the way to go, not a striped one!
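The "double the disks, double the reads" argument as a simple model. It assumes reads are perfectly balanced across the mirror copies and every write must hit every disk; the per-disk IOPS figure in the example is made up.

```python
def mirrored_iops(n_disks: int, disk_iops: float, read_fraction: float) -> float:
    """Idealized random-I/O rate of an n-way mirror.

    Reads can be served by any one disk, so read capacity is n * disk_iops;
    every write must go to all n disks, so a write costs n disk operations.
    """
    if not 0.0 <= read_fraction <= 1.0:
        raise ValueError("read_fraction must be between 0 and 1")
    # Average number of disk operations consumed per logical request
    cost = read_fraction * 1 + (1 - read_fraction) * n_disks
    return n_disks * disk_iops / cost
```

With 100 IOPS per disk, a read-only load gets 200 from a 2-way mirror and 400 from a 4-way mirror, while a write-only load stays at 100 no matter how many copies there are. That is why mirroring suits many small reads.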

A solution by Anonymous Coward · 1999-11-19 12:43 · Score: 0

When I worked at an ISP a while back (+-30k users), we did the following:

A whole bunch of POP servers, each storing different accounts (i.e. 3 machines, each with a third of the users on it). A proxy then redirected each user to the correct machine when he checked his e-mail. The machines were P133s (this was long ago), with mirrored HDs. For incoming mail, use multiple MX entries with the same priority.

This offered great scalability, but if a POP box fell over, 1/3 of the users would have lost their mail. Nowadays, I would therefore modify the setup to something like this:

A whole whack (about 4) of servers sharing the data between them using Coda (or something similar). Round-robin DNS sets all of them up as POP servers, sharing the load nicely, and multiple MX entries make them all incoming mail servers.

This setup is very scalable, and reliable too. In fact, you could have a horde of Coda servers sharing the data in the background, and then set up your POP servers as clients to distribute the load even more.
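The post doesn't say how the proxy in the first setup found the right machine for each user; a lookup table is one option. A hash-based alternative, as a hypothetical sketch with made-up server names:

```python
import zlib
from typing import List


def backend_for(user: str, servers: List[str]) -> str:
    """Pick the POP server that stores this user's mailbox.

    crc32 is used because it is stable across runs and machines (unlike
    Python's built-in hash() for strings), so every proxy agrees on the mapping.
    """
    if not servers:
        raise ValueError("no POP servers configured")
    return servers[zlib.crc32(user.encode("utf-8")) % len(servers)]
```

For example, `backend_for("spin", ["pop1", "pop2", "pop3"])` always returns the same server. The catch with "hash modulo N" is that adding a fourth server remaps most users, so their mailboxes would have to move; an explicit lookup table avoids that at the cost of maintaining it.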

Is reading mail write intensive? by Anonymous Coward · 1999-11-19 14:52 · Score: 0

Our mail server's spool bottleneck turned out to be the result of clients reading mail being a write-intensive task. The clients were NFSv2 Linux machines running mostly pine. The solution was putting memory between the system and the disk (the same idea as a RAID controller's cache). That seemed to solve the problem and made the drive much quieter too. This was for a small group of about 700 accounts.

Useful commands for studying this include iostat and nfsstat.

I've also been looking at a SCSI/IDE RAID. One I'm considering is called RAID FlyerII.

how do they compare for reads???? by Anonymous Coward · 1999-11-19 20:38 · Score: 0

EOF

That case... by Anonymous Coward · 1999-11-19 20:56 · Score: 0

Interesting. My case is that exact same brand, except it's a mid tower. I got a weird square-U-shaped bracket with it that looks like it would be perfect for holding a hard drive, but I can't for the life of me figure out where it mounts! =) It's not above the PS; I've got that one in place. Also, on mine, the ATX punch-sheet that goes in the back of the case was cut incorrectly! They punched all the port outlines, then rotated the thing 180 degrees and punched all the tabs and holes on the outside so it fits in position... It took about 3 hours and a hacksaw to fit it in place. grr. Lime

Free RAID for Linux by Anonymous Coward · 1999-11-19 21:15 · Score: 0

Just use software RAID. I've got two 9G SCSI drives hooked up using RAID0, which gives me a nice 18G partition. OK, no hot swap, but it's free. I'm using RedHat 6.1, which, to my surprise and delight, came with software RAID support already compiled into the kernel. It was embarrassingly easy to get going.
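Two 9G drives show up as one 18G partition because RAID0 deals fixed-size chunks round-robin across the drives, so large transfers keep both drives busy. A sketch of that address mapping, assuming a simple round-robin layout with a fixed chunk size (the real md driver's chunk size is a configuration choice not shown here):

```python
from typing import Tuple


def raid0_map(logical_block: int, n_disks: int, chunk_blocks: int) -> Tuple[int, int]:
    """Map a logical block on a RAID-0 array to (disk index, block on that disk).

    Data is laid out in chunks of chunk_blocks blocks, dealt round-robin
    across the disks: chunk 0 on disk 0, chunk 1 on disk 1, and so on.
    """
    chunk, offset_in_chunk = divmod(logical_block, chunk_blocks)
    disk = chunk % n_disks
    stripe = chunk // n_disks  # number of full rounds before this chunk
    return disk, stripe * chunk_blocks + offset_in_chunk
```

With 2 disks and 8-block chunks, logical blocks 0-7 land on disk 0, blocks 8-15 on disk 1, and block 16 wraps back to disk 0 at block 8.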

FOLLOWUP - Current Solution by sp1n · 1999-11-19 21:38 · Score: 2

It took quite some time for my original question to be posted, and we were on a critical schedule. We ended up buying a whole new server and internal RAID controller. Details follow:

After much shopping, questions, advice and temporary insanity, we decided to go for a new Linux box to handle the mail. Apparently, the load wasn't only coming from disk i/o wait; the kernel was using 70% cpu. We chose a Dual PIII/500 setup on an Asus P3B-DS, 512M ECC SDRAM (less than before, but prices are so high right now, and we figure processes should end sooner on this box), Intel Pro/100, Seagate Barracuda for system, six Seagate Cheetahs for spool and mail storage, and a Mylex eXtremeRAID 1100 (w/ the 233MHz i960).

It was configured with 5 spindles in RAID 5, with 1 as a hot spare, and then partitioned in half. I'm confident this badarse controller can keep up on the writes, with minimal performance hit. Preliminary results with bonnie are inconclusive, since it's working with one huge file, rather than thousands of small files. If write performance lags once it goes online (this Sunday am), we'll split it into 0+1.
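A benchmark closer to a mail spool than bonnie's single large file could look like the hypothetical sketch below: it creates, fsyncs, and deletes many small files. Real MTAs also rename files and update directories, so treat the result as a rough comparison between configurations (say, RAID 5 versus 0+1), not as a throughput figure.

```python
import os
import tempfile
import time


def small_file_rate(directory: str, n_files: int = 1000, size: int = 4096) -> float:
    """Create, fsync, and delete n_files small files; return files per second.

    This roughly mimics a mail spool (many small writes plus metadata
    updates) rather than a single large sequential file.
    """
    payload = b"x" * size
    start = time.perf_counter()
    for i in range(n_files):
        path = os.path.join(directory, f"msg{i:06d}")
        with open(path, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())  # force the data to disk, as an MTA would
    for i in range(n_files):
        os.remove(os.path.join(directory, f"msg{i:06d}"))
    elapsed = max(time.perf_counter() - start, 1e-9)
    return n_files / elapsed


if __name__ == "__main__":
    # A temporary directory is only a placeholder: to test the array,
    # pass a scratch directory that lives on it instead.
    with tempfile.TemporaryDirectory() as scratch:
        print(f"{small_file_rate(scratch, n_files=200):.0f} files/s")
```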

Exim, QPOP, and IMAPD were hax0red to use a double-hashed directory structure, i.e. "spin" would reside in /var/mail/s/.p/spin (the dot was required for those who have a single-character username). This should eliminate any overhead that ext2fs may have with large directories.
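A sketch of this layout as we read it from the example: the second-level directory is a dot followed by the second character, so a one-character username falls back to `.`, the current directory, and the path still resolves. This interpretation is an assumption; the patched Exim/QPOP/IMAPD code isn't shown.

```python
import os.path


def mail_path(user: str, root: str = "/var/mail") -> str:
    """Two-level mailbox path: first character, then "." plus the second.

    For a one-character username the second level is just ".", so the
    path degrades gracefully (e.g. "a" -> /var/mail/a/./a).
    """
    if not user:
        raise ValueError("empty username")
    return os.path.join(root, user[0], "." + user[1:2], user)
```

`mail_path("spin")` gives `/var/mail/s/.p/spin`, matching the example.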

Thanks for all your advice, keep it coming. If you're a gamer, check out http://www.xmission.com/quake

-Kevin Blackham Xmission Internet Salt Lake City, UT

Re:FOLLOWUP - Current Solution by Anonymous Coward · 1999-11-22 01:46 · Score: 0

> and a Mylex eXtremeRAID 1100 (w/ the 233MHz i960).
I believe the i960s, which are used on other Mylex-controllers, run at 33 or 66MHz, and that the eXtremeRAID uses a 233MHz- (Intel/Digital/ARM-) StrongARM. (which, btw, rules ;-)
twi

Linux Fibre Channel HOWTO by Anonymous Coward · 1999-11-20 01:17 · Score: 0

We are using the QLogic Fibre Channel cards extensively, connecting to both FC fabrics and hubs. There are several drivers available for SCSI FC for the QLA2200, which sells for about $400.

Erling Nygaard in my group has written a Fibre Channel HOWTO, which can be found at:

http://www.globalfilesystem.org/howtos/fibrechannel_howto/index.html

Matthew O'Keefe

Re:redundant by unitron · 1999-11-20 16:48 · Score: 1

Using different words in order to better explain something is not being redundant and should be moderated up, not down.

--

I see even classic Slashdot is now pretty much unusable on dial up anymore.

qmail efficiency by Anonymous Coward · 1999-11-20 20:19 · Score: 0

qmail performs very well here. Statistics:

Average successful deliveries per day: ~900,000
Average successful deliveries per second: ~10
Average total delivery attempts per day: ~1,100,000

Facts:

CPU: 2x Pentium III, 550 MHz
RAM: 512 MB
Disks: two Quantum disks (9 GB and 18 GB), Fast-20 bus
NIC: Intel EtherExpress Pro100
O/S: Linux 2.2.13
running qmail-1.03 with some patches, up to 800 parallel remote deliveries

System load is 3 at peak times. I don't think Exim would improve that noticeably. The point? qmail scales very well; if you need more speed, throw hardware at the problem. In our system, the only bottleneck is the network.
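The per-second figure follows from the daily one, and the same numbers give the share of attempts that succeed. Both are averages over a whole day; the peak rate is presumably higher.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

successful_per_day = 900_000
attempts_per_day = 1_100_000

per_second = successful_per_day / SECONDS_PER_DAY  # about 10.4
success_ratio = successful_per_day / attempts_per_day  # about 0.82

print(f"{per_second:.1f} deliveries/s, {success_ratio:.0%} of attempts succeed")
# prints: 10.4 deliveries/s, 82% of attempts succeed
```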

POP3 server/hashing of filenames by Anonymous Coward · 1999-11-20 21:09 · Score: 0

Just two points to improve your performance:

First, qpopper is slow as hell, because it copies the user's mbox before working on it. Using cucipop in a 14,000-user environment decreased the load from 25 to 6. See cucipop's freshmeat entry.

Second, you should choose a better hash function. Usernames are poorly distributed, so you will have to deal with many collisions. I'd go with MD5 here, because it is quite fast and its output is very evenly distributed.
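A sketch of the MD5 suggestion, assuming two directory levels with 256 entries each (the depth and fan-out are arbitrary choices here, and the function name is illustrative):

```python
import hashlib
import os.path


def hashed_mail_path(user: str, root: str = "/var/mail", levels: int = 2) -> str:
    """Spread mailboxes over 256**levels directories using an MD5 digest.

    Each level uses two hex digits of the digest, so usernames that share
    a first letter (very common) still land in different directories.
    """
    digest = hashlib.md5(user.encode("utf-8")).hexdigest()
    parts = [digest[2 * i : 2 * i + 2] for i in range(levels)]
    return os.path.join(root, *parts, user)
```

Every program that touches the spool (MTA, POP and IMAP daemons) has to use the same function, and changing it later means moving every mailbox, so it is worth picking the layout once.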

Special fs solution by Anonymous Coward · 1999-11-21 22:24 · Score: 0

This is also very frequently seen on large news servers, and the solution is to split over several directories, preferably using clever hashing.

Some solutions with source can be found on Erez Zadok's research web pages.

RAID 10 by Anonymous Coward · 1999-11-21 22:42 · Score: 0

Since RAID 10 combines stripes and mirrors, you have the potential to increase transfer rate while reducing seek time. This depends very much on the particulars of the system.

Nevertheless, it does look like RAID 10 has a high fun factor.

Some preliminary work and proposals are under way for a special RAID 10 that allows this, but nothing is ready for testing yet. I would, however, expect commercial solutions to have this and more; after all, they do charge a pretty penny for it, yes?

Actually... by YuppieScum · 1999-11-21 23:42 · Score: 2

My understanding was that some filesystems take steps to avoid fragmentation.

A colleague of mine recommends doing a complete backup/reformat/restore cycle every 2 months or so on partitions that see a lot of editing and extending of files. On a partition in use since '93, I'd expect this to give a radical reduction in thrashing...

It also gives you a chance to test your backup procedures :)

--
This sig left unintentionally blank.

HW RAID, new and old by Anonymous Coward · 1999-11-29 16:16 · Score: 0

Here in the "new days" I still see processors such as the i960 in use; in fact, that processor is central to the I2O specification. The only general-purpose processor I have heard of in HW RAID is the StrongARM. Pentiums I have never seen. Obviously, I am hoping to see what you are referring to.

What is more common, however, is the use of ASIC and FPGA and that has a greater potential for improving the speed than general purpose processors of yesteryear.

I expected HW RAID to cut down on main bus traffic, especially PCI bus traffic on the motherboard but now it seems those gains are rather small.

Slashdot Mirror

261 comments