What Are Typical Load Averages for Servers?
Jon Hill asks: "I'm curious to figure out how to gauge the performance of my servers and know at what level of usage I should think about hardware upgrades. 90% of our servers run Linux and various services standard to Linux such as sendmail, samba, DNS, etc. One of our main servers (router/firewall/sendmail/spop) has been running with a load average of .5 to 1.5 regularly. It supports 200 users and is an SMP Intel machine with 2GB of RAM. I'm not sure if it needs software/kernel tweaking or hardware modifications and I can't seem to find any reference information. Suggestions?"
What you are asking about is 'performance tuning'. Do a search at Google on that term and you will find plenty of information online. http://linuxperf.nl.linux.org/ might be a good place to start.
:-)?
Average load is unique to a system. To figure out what that average is you need to monitor the server for a while.
I don't know about Linux, but I've done a lot of NT Server tuning and I think some of the general principles can be shared across platforms.
* Monitor CPU, Memory, Disk, and Network load over time (these are the four primary sources of bottlenecks in computer systems). Figure out what is regular for *your* systems. I take samples a few specific times a day every few days.
* If one metric is consistently high, at or near 100% utilization, that's a good sign of a possible bottleneck. Take care of that bottleneck by increasing processor speed, adding more memory, adjusting the settings/algorithms of your software, etc.
* Make one change at a time, and then measure the results.
* Document your changes, so that if you actually slow the machine down you can go back to the original state.
* When you remove a bottleneck, it is replaced by another. That's the name of the game.
* The best way to tell if you have a bottleneck is user input. Are they complaining that database lookups take too long? That web pages aren't delivered fast enough? Or are they quietly content (right, you wish!)?
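For the "figure out what is regular for your systems" step, here is a rough Python sketch of a load sampler. It is only an illustration and covers load average alone, not memory, disk, or network. The names `parse_loadavg` and `sample_load` are made up for this example; `os.getloadavg()` is the standard library call available on Unix-like systems, and the sample text assumes the Linux `/proc/loadavg` layout.

```python
import os
import time


def parse_loadavg(text):
    """Return the 1-, 5- and 15-minute load averages from /proc/loadavg-style text."""
    fields = text.split()
    return tuple(float(field) for field in fields[:3])


def sample_load(samples=3, interval=60.0, read=os.getloadavg):
    """Collect (timestamp, 1-minute load) pairs, pausing `interval` seconds between reads.

    `read` is any callable returning (load1, load5, load15); it defaults to
    os.getloadavg but can be swapped for a stub when experimenting.
    """
    readings = []
    for i in range(samples):
        readings.append((time.time(), read()[0]))
        if i < samples - 1:
            time.sleep(interval)
    return readings
```

Calling `sample_load(samples=5, interval=60)` from a cron job or a long-running script and appending the results to a file would give a crude baseline to compare against later.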
Good Luck!
obviously no deficiencies vs. no obvious deficiencies
The Solaris server where I work has 16 processors and the load average usually sits around 10-15. I'd be worried if my single-proc linux workstation had that high of an average, though... :-)
* And remember, it's spelled N-e-t-s-c-a-p-e, but it's pronounced "Mozilla."
Load average is a measure of the number of things 'waiting' to run. Depending on your OS this may or may not include a number of interesting corner cases. In particular, this almost always includes things like disk i/o, and tty i/o. A user with a CPU bound process won't notice disk i/o issues, and vice versa.
So what is the range of acceptable? Well, for a single user workstation a load average of 1 (one thing waiting) probably means the user is waiting, and you may want more CPU or disk bandwidth. On the other hand, a highly multi-user machine (say a news server) may get optimal transfer rates out of the disk hardware by having a lot of things waiting so it can schedule reads and writes.
Look at all the resources on your machine, use tools like vmstat, iostat, netstat, etc. See why processes are waiting. Look at your user load and see if it's ok. For instance, with a 100Mbps ethernet, you could serve 10 users at 10Mbps each, or 100 at 1Mbps each. The latter will have a higher load average, but if 1Mbps per user is fine with you, then there is no problem.
To give some real-world examples: I've seen news and mail servers both run load averages well over 200 and still deliver acceptable performance. I've also seen shell servers with load averages as small as 5 that are very sluggish (often because they are swapping).
Remember that load average is not as clear an indicator of overall performance as it seems. Load average is based upon the average number of processes that are waiting for kernel execution time. So, if you are IO-bound, a very common problem even in the server market, then load average won't accurately monitor performance. If you're old school, use sar (System V) or vmstat (BSD) for performance monitoring. Of course, more modern tools exist.
If you've got an SMP machine, and your averages are .5 to 1.5, then you've either got too big of a machine for the job, or you should put more stuff on it to utilize it better.
A processor utilized 100% of the time will give you a load average of 1.0. If you've got two processors, you should aim for a load of 2.0 average.
So, good news! You don't have to do any tweaking for performance, unless you have specific issues with the speed of the server. You can probably add more to the server without affecting other processes (unless you've got a lot of I/O going on). You only gave CPU stats, so I am assuming that's what you're concerned about.
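As a quick illustration of the "1.0 per processor" rule of thumb, here is a small Python sketch that divides the load average by the CPU count. The function names are invented for this example, and the rule itself is only the approximation described above; I/O-bound systems will not fit it neatly.

```python
import os


def load_per_cpu(load, cpus):
    """Divide a load average by the number of processors."""
    if cpus < 1:
        raise ValueError("cpus must be at least 1")
    return load / cpus


def current_load_per_cpu():
    """Return the 1-minute load average per CPU on the local machine."""
    # os.cpu_count() can return None, so fall back to a single CPU.
    return load_per_cpu(os.getloadavg()[0], os.cpu_count() or 1)
```

For the machine in the question, `load_per_cpu(1.5, 2)` gives 0.75, i.e. under one runnable process per processor even at the top of the reported range.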
If tits were wings it'd be flying around.
I think the load average isn't a hugely useful measure for whether your setup is fast "enough". In your case, a load of 1.0 on an SMP machine suggests that it could handle about twice as much (YMMV) work before it started to get slower for the users. Which is a handy thing to know.
A more interesting measure is how well it copes under a heavy load, rather than an average one. For example, what are your peaks like? Do the users notice?
What's the load like when everyone arrives in the office in the morning and checks their mail? How much of an increase in load would it take to make it unusable for everyone?
I think that kind of measure is more relevant. If your number of users increased by 10%, would everything fall over? (likely to happen eventually if the average load goes above 1.0 per CPU because it can never catch up with its workload)
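To look at peaks rather than just the average, something like the following Python sketch could be run over a list of load samples you have already collected. The function name and the returned keys are hypothetical, and "saturated" here simply means a sample above 1.0 per CPU, as in the paragraph above.

```python
def summarize_load(samples, cpus):
    """Summarise load samples: mean, peak, and share of samples above 1.0 per CPU."""
    if not samples:
        raise ValueError("need at least one sample")
    # Count the samples where more processes were runnable than there are CPUs.
    saturated = sum(1 for load in samples if load > cpus)
    return {
        "mean": sum(samples) / len(samples),
        "peak": max(samples),
        "saturated_fraction": saturated / len(samples),
    }
```

A low mean with a high `saturated_fraction` during the morning mail rush would suggest the average is hiding the moments users actually notice.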
- MugginsM
When dealing with processes waking up independently (which isn't *completely* wrong in the case of a web server) the load will tend towards [number of processors] * ([cpu utilization]/(1-[cpu utilization])). Your load of 0.5-1.5 on a dual processor machine equates to a cpu utilization of 20-43%.
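The arithmetic behind those percentages can be checked with a few lines of Python. This just inverts the formula above, load = N * u / (1 - u), to get u = load / (N + load); the function names are made up for the example, and the independence assumption is the poster's, not something measured.

```python
def expected_load(cpus, utilization):
    """Load predicted by load = cpus * u / (1 - u), for 0 <= u < 1."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return cpus * utilization / (1 - utilization)


def utilization_from_load(cpus, load):
    """Invert the formula above: u = load / (cpus + load)."""
    return load / (cpus + load)
```

With two processors, `utilization_from_load(2, 0.5)` comes out at 0.2 and `utilization_from_load(2, 1.5)` at roughly 0.43, matching the 20-43% quoted.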
Tarsnap: Online backups for the truly paranoid
Charles Dickens said it best:
"Two CPUs, load average 1.95, result: happiness. Two CPUs, load average 2.05, result: misery."
Peter
This is only somewhat related, but back in 1990 I worked on a Sequent (now IBM NUMA-Q) that had 10 80386 processors. We regularly ran 200+ users with a load average under 1. We had planned for 10 users per CPU, but it held up well at nearly 30 per CPU.
#!/usr/bin/perl
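# The closest thing I found to real CPU usage:
# add up the %CPU column that ps reports for every process.
use strict;
my $pcpu = 0;
for (`ps axo \%C`) {
    next if m/\%CPU/;    # skip the header line
    $pcpu += $_;
}
print "$pcpu\n";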
:wq
"Ask Slashdot" is not a valid research tool, as the responses will range from the good to the horrible. If you don't know what you're doing, then you could take the wrong advice.
That said, here's mine:
System performance is not something that can be summed up in a single statistic like "load average". Load average in particular can mean different things on different platforms (on some SMP platforms 4 processes on 4 processors => LA = 1.0 & on another it = 4.0). I'm not familiar enough w/Linux SMP to know its interpretation.
Even given the different interpretations, what constitutes too much load is largely a matter of opinion. Some people would just as soon run a system with a high load as long as interactive response remains acceptable. Others want the system to never really break a sweat.
To judge system performance properly you should look more at the nuts & bolts of the system in order to determine which parts of the system are performing well & which are performing poorly. Collating and presenting this sort of data can be nicely done with tools based around the rrdtool package.
I would suggest that you get a tool that can monitor the vm, i/o, cpu utilization, & various other statistics over time. Then analyze the results and see what can be done to improve the system, or if the system simply needs to be replaced (too many subsystems need upgrades or the system is too old to be worth it).
And yes, I was serious - buy a book on performance tuning & read it.
I used to write code for Undernet. We deployed a new services bot earlier this year. The initial trials of it were none too successful. We had a database server, and a physically separate web front end running PHP.
;)
Because the people who wrote the initial code did not make it scale very well, when it went live with 80,000 people trying to use it, the poor boxes croaked.
The webserver hit a LA of 117 and the DB server got to 145. There are no decimal places in those numbers, people.
http://www.sarcheck.com/
"SarCheck is an inexpensive tool developed to help system administrators with UNIX performance tuning. It does this by analyzing the output of sar, ps, and other tools, and then reading more information from the kernel. It then identifies problem areas, and if necessary, recommends changes to the system's tunable parameters."
1) Users are complaining because it's too slow
AND
2) You actually have nothing better to spend it on; unless you are very lucky, this one is not true.
AND
3) Software tweaking isn't doing any good.
OTOH, tweaking the kernel and such is always fun. Here are a few ideas:
1) Recompile the latest 'stable' kernel optimized for your machine. 2.4.2 -> 2.4.12 produced a huge increase in I/O performance on my machine, for example. You may find something similar.
2) Related thing: BIOS updates and tweaking can sometimes go a long way.
3) Upgrade the machine to the latest distro; a nice thing about Unix is things usually get faster, not slower.
4) Figure out what is using your CPU time. For example, given you're running SPOP, I suspect a lot of that time is used for SSL. So recompile OpenSSL with better optimizations (the normal OpenSSL RPMs are always underpowered; asm is disabled, no -march flags, etc), and you should see a magical increase in performance.
5) Assuming this makes your system faster, celebrate by spending some of the money you would have used to upgrade on beer.
I've found it a reasonably good guide to when there's an issue on Solaris boxes; I think Linux uses similar numbers to calculate run queue averages, but other OSes (e.g., IRIX) use different formulas to calculate it, so you might need to tweak this recommendation.
I'm part-responsible for a bunch of fairly basic Redhat servers (single CPU, 256MB etc.) that spend their time crawling the web, keeping their 100Mb network connections saturated. They sit at a load of around 17 quite reliably, and have even worked at around 50, though accepting ssh connections becomes impossible after a certain level :-) But they still get all their work done eventually. I'm not sure how each server's efficiency varied with its load average, or whether they got any more work done at higher loads, but if all they're doing is using a lot of CPU, you can reliably push the load average very high without ill effect.
But a high load average is only a potential symptom of a problem, not a problem in itself. It might mean (as it has done with our machines in the past) that so many processes are running that memory happens to be low as well, and reliability goes down as processes don't cope with being killed or running out of memory. But if that's not happening, the only reason to worry is if the people using the machine complain: slow mail deliveries, POP3 pickups, DNS resolutions or whatever other 'work' the server is up to. If you want to roll your own benchmarks to test these things, you can then decide on how slow is too slow, and upgrade accordingly.
Matthew @ Bytemark Hosting