Ask Slashdot: Performance Monitoring for Linux
muadib
wants to know about the following:
"Given the current discussions on tuning, I am trying to
find out if there are any performance monitoring
applications for Linux. I don't mean things like xload,
xosview, etc which provide only a small amount of data.
For anyone who's done benchmarking under NT, I mean
something like their built in perfmon utility that lets
you view and capture just about any statistic on your
system or on a remote system. Capturing is the specific
functionality I'm looking for b/c I'm working on a Linux
device driver, and it would be nice to have historical data
of CPU utilization, interrupts/s, etc. so that I can
compare complete system performance between code revisions."
Hear, hear! This is something I really miss in Linux. I use procmeter, but it is not as flexible
as the performance monitor in NT, I must admit.
It would be nice to monitor remote hosts too, à la NT.
iostat, vmstat, free, and netstat all provide useful information on a "higher level". I expect there are other tools I'm forgetting.
The /proc directory provides greater detail with a simple read method ("cat /proc/interrupts", etc.). As the driver writer, you may need to do something to make sure your device's information is recorded/presented properly.
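To make the "cat /proc/interrupts" idea concrete, here is a minimal sketch in Python that turns two snapshots of that file into interrupts per second. The sample text, the function names, and the 5-second interval are assumptions for illustration; the column layout of /proc/interrupts varies between kernel versions, so treat the parser as a starting point.

```python
# Sketch: compute interrupts/s from two snapshots of /proc/interrupts-style text.
# The sample data below is made up; on a real system you would read the file
# twice, e.g. open("/proc/interrupts").read(), a few seconds apart.

SAMPLE_T0 = """\
           CPU0       CPU1
  0:       1000        200   IO-APIC-edge  timer
 14:         50         10   IO-APIC-edge  ide0
ERR:          0
"""

SAMPLE_T1 = """\
           CPU0       CPU1
  0:       1500        300   IO-APIC-edge  timer
 14:         80         10   IO-APIC-edge  ide0
ERR:          0
"""


def parse_interrupts(text):
    """Return {irq_label: total count summed over all CPUs}."""
    lines = text.strip("\n").splitlines()
    ncpu = len(lines[0].split())          # header row lists one name per CPU
    counts = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        per_cpu = []
        for tok in rest.split()[:ncpu]:
            if not tok.isdigit():         # stop at the controller/device names
                break
            per_cpu.append(int(tok))
        if per_cpu:
            counts[label.strip()] = sum(per_cpu)
    return counts


def rates(before, after, seconds):
    """Interrupts per second for every IRQ present in the later snapshot."""
    return {irq: (after[irq] - before.get(irq, 0)) / seconds for irq in after}


if __name__ == "__main__":
    r = rates(parse_interrupts(SAMPLE_T0), parse_interrupts(SAMPLE_T1), 5.0)
    for irq in sorted(r):
        print(irq, r[irq])
```

Logging these rates once per code revision would give the kind of historical comparison the original question asks for.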
What I would really like to have is something like Sun's SE toolkit for Linux. Anyone know if there is such a thing?
vmstat comes the closest to what you are describing. It pulls all the data out of /proc, so maybe you will want to bypass it and get the data from the source.
`man vmstat` will give you usage details.
`man proc` will give you details about what is available via /proc. Some drivers require you to recompile the kernel for extra statistics (like the scsi drivers).
-Philip
"procinfo" gathers information from /proc.
re VTune for Linux, Intel is already doing it. At least, they've been trying to hire interns to do the GUI. :)
These resources may help you get started (if they're not what you're looking for, you can always modify the source ;)...
/proc - http://24.1.97.22/gmd/tps/tpsmain.html
ktop - It should be fairly extensible in a perfmon sort of way......it's a fairly good copy of taskmgr - www.kde.org
treeps - gives a good view of the process tree using
Does anyone know why Cliff feels the need to make posts with his comments in italics and the quote in normal typeface when CmdrTaco, Hemo etc. do the opposite? (Even Mr. Katz gets it right)
-AC
(Eagerly awaiting the deserved -1)
So what! Sometimes the best things in life aren't free.
Just check out Windows 2000 Professional and you'll see what I mean.
A common misconception. MRTG can be used to monitor just about anything that can be reported to it by an SNMP agent. Some agents provide more information than others.
I started writing mpstat a little while ago, and I have the start of a utility that does some of what you describe.
Right now mpstat shows interrupts per processor taken from /proc/interrupts. It also shows CPU utilization per CPU. Also shown are context switches, and major and minor faults. All of the info that you are looking for is available in the /proc directory. mpstat can be found at http://www.mindspring.com/~joeja/programs.html.
My other program is not on the net yet, as it is incomplete. The data is shown as raw data.
Is this the same guy that co-wrote "Sun Performance and Tuning" 2nd ed?
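As an illustration of the per-CPU utilization figure described above (this is not mpstat's actual code), here is a small Python sketch that derives it from two samples of /proc/stat. It assumes the fourth counter on each "cpuN" line is idle time and that at most the first eight counters should be summed; the sample numbers are invented, and uniprocessor kernels may only expose the aggregate "cpu" line.

```python
# Sketch: per-CPU utilization from two /proc/stat-style samples.
# Assumption: fields after the name are user, nice, system, idle, ...

STAT_T0 = "cpu  300 0 100 1600\ncpu0 200 0 50 750\ncpu1 100 0 50 850\nintr 12345\n"
STAT_T1 = "cpu  370 0 150 1880\ncpu0 260 0 90 850\ncpu1 110 0 60 1030\nintr 13000\n"


def parse_cpu_lines(text):
    """Return {cpu_name: (total_ticks, idle_ticks)} for each cpuN line."""
    cpus = {}
    for line in text.splitlines():
        parts = line.split()
        if parts and parts[0].startswith("cpu") and parts[0] != "cpu":
            vals = [int(v) for v in parts[1:9]]
            cpus[parts[0]] = (sum(vals), vals[3])
    return cpus


def utilization(before, after):
    """Percent of non-idle time per CPU between the two samples."""
    result = {}
    for name, (total1, idle1) in after.items():
        total0, idle0 = before[name]
        dt = total1 - total0
        result[name] = 0.0 if dt == 0 else 100.0 * (1 - (idle1 - idle0) / dt)
    return result


if __name__ == "__main__":
    util = utilization(parse_cpu_lines(STAT_T0), parse_cpu_lines(STAT_T1))
    for name in sorted(util):
        print("%s %.1f%%" % (name, util[name]))
```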
"monitor" is for AIX.
That is not true. perfmon does show averages over an interval.
If you don't mind Tcl/Tk, take a look at something
called moodss (there should be a freshmeat entry).
It's a modular spreadsheet and it's very easy to
write new modules to monitor various things. As a
matter of fact, some of the modules provided as
examples already monitor CPU, memory, and so on,
so you can start with these and expand them to
suit you.
Linux Journal, Issue 56, December 1998, p. 30
"Performance Monitoring Tools for Linux", by David Gavin.
Really nice article. 6 pages. First introduces Linux facilities to get information in /proc (for ex., /proc/stat), then goes about developing some shell scripts to make use of that. Presents a complete system that produces nice graphs about daily activity, etc. Well worth taking a look at.
The author later re-wrote the tools in perl, and the result can be found at:
http://www1.shore.net/~dgavin/Computer/
Enjoy!
I'm working on this already. The next major version of KTop will feature a full-blown performance monitor with save-to-file functionality. It is targeted for KDE 2.0, so a few more months to go.
Chris (Author of KTop)
The tool he mentioned is the Sun perfmeter clone called perf, which I wrote specifically for Linux in 1993.
I created /proc/stat for this reason, and wrote
a little program (rpc.rstatd) which makes it available
on the net. The only drawback is that it uses XView, which
seems to be outdated these days. There are some
people who urge me to rewrite it for Qt, but I still use the
XView version. You can get the XView perf from my homepage:
http://people.frankfurt.netsurf.de/Rudolf.Koenig
Sorry for being pedantic ;)
Chris,
Are betas available for ktop2?
Thanks
motjuste@briefcase.com
http://www.nl.linux.org/linuxperf/ may be of interest. It has pointers to a lot of performance monitoring related stuff in e.g. its links page.
Are you confusing the program called "monitor" with the CRT for your computer, which is sometimes called a monitor? They are two separate things, you know.
You know, mimick the interface, make NT sysadmins feel at home, and like they can reuse some of the $$$$ they spent on training??????
KDE does a great job of looking and feeling like windows, and their ktop (or whatever) does a great job of imitating the NT task manager, but so far I have not seen anything like a Kperfomon.
I know it is traditional to hate all things NT/95, but sysadmin tools with an NT/95 interface would have a very large built-in interface and possibly persuade managers into believing Linux is as easy to use as NT. After all, the fact that NT looked like 95 has to have something to do with its acceptance as a server.....
There really is no "standard" GUI sysadmin interface for Linux, so why not take advantage of all Bill Gates's legal groundwork making it legal for you to rip off the look and feel of his sysadmin tools?
If you have any info on NT-alike sysadmin tools (such as a samba interface, event log, etc) let me know at:
motjuste@briefcase.com
I (stress the next two words) used to work for a company called Datametrics Systems Corporation (www.datametrics.com) :(
They offer a product called 'Viewpoint' that does what
you guys are really hoping for: there's a UNIX process that reads something like 300 kernel variables at any rate (usually every 30 seconds) and then sends that data to a central monitoring program. The central program can talk to hundreds of UNIX,VMS,Unisys,NT,etc machines at once and plots and correlates
the data provided. The features it has are pretty mind-boggling; look on the web page to get a feel for it. If you look hard enough, I think they even made a Java and a Web version of the frontend.The tools are for enterprise
clients who want to know about the details of
their performance: if the cache hit rate isn't
as good as it should be, if the network is too saturated for best performance, etc. I believe you
could even compare Linux and IRIX's relative
merits by looking at the two's metrics side by side under similar stresses. When I left, they
were adding some modules like an Oracle module
(to correlate kernel metrics with Oracle's SQL performance) and I personally suggested creating
an Apache module (which may or may not exist -
they have an API to program to, so it could happen
if someone cared enough to make it happen)
I should stop here and say that I am pretty sure
this product retails for tens of thousands of
dollars.
My understanding is that there was a port to Linux
done inhouse but I doubt they have rolled it out.
My other understanding is that the company has pretty much gone to shit since they were bought out right after I left; so who knows if they will be clued enough to want to work on Linux. If any of you are very excited about products like this
(ie, products for managing tens/hundreds of millions of dollars worth of computers) coming to linux, I'd suggest going on the web site, finding
a feedback form, and speaking your mind.
This webpage has a tool that gives info similar to perfmon under NT. Under Intel, it uses the model specific registers (which report everything from cache misses to branch delays...) as implemented by the library libpperf.
This tool is IMHO the best of the pack out there at the time for really understanding the performance of your programs with respect to caching and processor quirks. Check it out.
There's a beta tool at
http://www.blakeley.com/resources/vtad
No - Actually perfmon is OK - It has quite a few fudges in there (the source is in the SDK) but basically it uses the performance registry.
/proc is better in its configuration information than say SUN /proc
These are my experiences based on doing capacity planning agents for NT:
If you want to roll your own use the performance registry.
Now the performance registry is an interesting beast. Check out the perfmon code to see how it's done, but the lowdown is that this API is unsafe. Be careful with multi-thread access to the Perf Registry (actually - don't)
The buffer size is assuming UNICODE size so it halves the buffer every call unless you refresh it (see the API) there is a MS bug report on this.
Calls may return crap even though the return code is OK - use unicode to check the header.
Use the counter size returned by the API not those specified in the header (they are wrong in some cases - not most)
The performance registry relies on other DLLs that may fail so it in turn may fail.
The spec is that you return info if you are asked but SQL Server 6.x returns info even when NOT asked and this is bad for performance if you only want some counters. (this is a known bug)
But beware others may do this too.
Actually this is a great idea and I wish it was done properly (ie. robust)
It would kill UNIX in this regard if it worked as spec'd
Linux is even worse than, say, SVR4 UNIX in instrumentation; particularly I/O is zero!
BUT its
I've written agent/server type stuff to get info and it ain't too hard, but you've got to have the info to begin with.
Moodss is a modular application. It displays data described and updated in one or more independent modules loaded when the application is started. Data is originally displayed in tables. Graphical views (graph, bar, 3D pie charts, ...), summary tables (with current, average, minimum and maximum values) and free text viewers can be created from any number of table cells, originating from any of the displayed viewers.
A thorough and intuitive drag'n'drop scheme is used for most viewer editing tasks: creation, modification, type mutation, destruction, ... Table rows can be sorted in increasing or decreasing order by clicking on column titles. The current configuration (modules, tables and viewers geometry, ...) can be saved in a file at any time, and later reused through a command line switch, thus achieving a dashboard functionality.
The module code is the link between the moodss core and the data to be displayed. All the specific code is kept in the module package. Since module data access is entirely customizable (through C code, Tcl, HTTP, ...) and since several modules can be loaded at once, applications for moodss become limitless. For example, comparing a remote database server CPU load and a network load from a probe on the same graph becomes possible.
Apart from a sample module with random data, ps, cpustats, memstats, diskstats, mounts, route, arp modules for Linux, apache and apachex modules are included (running "wish moodss ps cpustats memstats" mimics the "top" application with a graphic edge).
All the above in rpm, tgz, ...
see screenshots, html documentation, ... on my homepage at http://www.multimania.com/jfontain/
Regards.
Jean-Luc Fontaine
According to what I read on /. every day, NT is absolutely terrible, and Linux is better in every way. So why would Linux want to copy NT's performance monitor? Surely Linux has a better performance monitor already included, since it's so much better than NT.
10 PRINT CHR$(205.5+RND(1)); : GOTO 10
sarcasm.
10 PRINT CHR$(205.5+RND(1)); : GOTO 10
I'm currently working on a tool that resembles what you're looking for. It consists of 3 parts and a kernel patch. The patch adds a feature to the kernel that enables a "trace" driver to register and enables key kernel parts to call upon the driver to note that a given event has occurred. In turn, the trace module takes care of the events and puts them in a buffer. When a certain quantity of information is in the driver's buffer, it sends a signal to a trace daemon. The daemon then reads from the driver and appends the trace information to a trace file. The last part is the trace data decoder. This decoder takes the binary data and transforms it into a human-readable format. Since decoding happens after the fact, impact on the system is minimized. As of this time, all the above-mentioned parts are complete. The only thing that remains to be done is to build a GUI for the decoder (right now it works perfectly on the command line). This is what I'm working on right now.
This system enables the observer to know exactly what happens at every moment in the system. As for remotely observing a host, this is not a problem; it is actually planned. This will consist of the trace daemon offering its services on an IP port which can be contacted by other hosts. If you're interested ... send me an e-mail. I'll have a web page for it as soon as the GUI is complete.
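The decoder stage described above can be sketched roughly as follows in Python. The record layout (a 32-bit timestamp in microseconds, a 16-bit event id, a 16-bit CPU number) and the event names are purely hypothetical, since the post does not give the real trace format; the point is only to show fixed-size binary records being turned into readable lines after the fact.

```python
# Sketch: decode fixed-size binary trace records into human-readable lines.
# RECORD layout and EVENT_NAMES are invented for illustration only.
import struct

RECORD = struct.Struct("<IHH")   # timestamp_us, event_id, cpu (hypothetical)
EVENT_NAMES = {1: "irq_entry", 2: "irq_exit", 3: "sched_switch"}


def decode(buf):
    """Return one formatted line per record in buf (length must be a multiple
    of RECORD.size, otherwise struct.error is raised)."""
    lines = []
    for ts, event, cpu in RECORD.iter_unpack(buf):
        name = EVENT_NAMES.get(event, "unknown(%d)" % event)
        lines.append("%10d us  cpu%d  %s" % (ts, cpu, name))
    return lines


if __name__ == "__main__":
    # Stand-in for the bytes a trace daemon would have appended to a file.
    trace = RECORD.pack(1000, 1, 0) + RECORD.pack(1042, 2, 0) + RECORD.pack(2000, 9, 1)
    for line in decode(trace):
        print(line)
```

Keeping the in-kernel side down to appending small binary records, and doing all formatting offline, is what keeps the measurement overhead low in a design like this.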
Forgive me if any of these are obvious -- I'm
not trying to be sarcastic, it's just that some
sys admins don't know this yet:
"top" shows you CPU load, memory usage, and
usage per process -- there are many options in
"top" check out the man page for it.
"pstree" shows a tree graph of all processes and
who spawned what.
"ps auwx" (or "ps -ef" in Solaris) shows all
current processes
"netstat" and "ifconfig -a" shows network info
such as errors, dropped packets, etc.
Big Brother is a decent package for monitoring
several servers at once. It generates a web page
of colored lights (GIFs) indicating system load, web
daemon status, email daemon status, ftp daemon
status, etc.
I'm guessing the author of the question (only two months wait? Wow, that new hardware really has sped things up.) has already perused the article on SCO releasing sar as open source. This sounds like what he needs, albeit in a command line only form (correct me if I'm wrong.)
Now what I want to see is a graphical heartbeat that looks a bit like cthugha (an oscilloscope on acid, for sound) and uses whatever system stats the operator deems appropriate as parameters for its graphics generating equations. Now, if only I had studied math a bit harder, I might write it myself... I have the Father Guido Sarducci "5 Minute University" syndrome, "We'll teach you in five minutes everything you'll remember five years after you graduate."
- None can love freedom heartily, but good men; the rest love not freedom, but license. -- John Milton
maybe I'm just too tired right now, but I did not detect any heisenberg ref's in the threads I've read here on this topic. Trying to instrument a thread of execution is usually death for that thread. I despise the notion of "performance monitors".
If you build it right, it will work, and any anomalies usually succumb to thoughtful analysis.
Instrumenting a network at the level you guys are talking about can kill it, severely.
Get ye a good sniffer, and learn to read its utterances.
Brak: What's THAT?
Thundercleese: A light switch.. of TOTAL DEVASTATION!
In a slightly related note, Linux needs some new graphics libraries--GD is good, but it's not Excel. I have the distinct feeling GIMP
is better suited to what we need. Sooner or later we won't have to jump to Excel to get quality graphs drawn.
Uuuh, GNUPlot?
I've not looked at Guppi, but my instinct is that if people want flashier graphs than GNUPlot can produce, the way to go is to extend GNUPlot.
--
Set up a cron process to output the results of whatever /proc entries you desire into a CSV formatted logfile. Use the tools of your choice to sort through the mass of data. If you need something new, pop into the Linux source and add a /proc entry with whatever you like.
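A minimal sketch of that cron-plus-CSV approach in Python, assuming /proc/loadavg as the entry being logged and a hypothetical output file name (proc_samples.csv). Any other /proc value could be sampled the same way by returning a different dictionary.

```python
# Sketch: append one timestamped sample of a /proc entry to a CSV log.
# Meant to be run periodically (e.g. from cron); file names are assumptions.
import csv
import os
import time


def read_loadavg(path="/proc/loadavg"):
    """Read the 1, 5 and 15 minute load averages."""
    with open(path) as fh:
        one, five, fifteen = fh.read().split()[:3]
    return {"load1": float(one), "load5": float(five), "load15": float(fifteen)}


def append_sample(path, sample, now=None):
    """Append sample (a dict of metric -> value) as one CSV row.

    A header row is written only when the file is new or empty."""
    fields = ["timestamp"] + sorted(sample)
    new_file = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=fields)
        if new_file:
            writer.writeheader()
        row = dict(sample)
        row["timestamp"] = int(time.time() if now is None else now)
        writer.writerow(row)


if __name__ == "__main__":
    if os.path.exists("/proc/loadavg"):
        append_sample("proc_samples.csv", read_loadavg())
```

The resulting file can then be sorted, filtered, or plotted with whatever tools you prefer.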
In a slightly related note, Linux needs some new graphics libraries--GD is good, but it's not Excel. I have the distinct feeling GIMP is better suited to what we need. Sooner or later we won't have to jump to Excel to get quality graphs drawn.
Once you pull the pin, Mr. Grenade is no longer your friend.
SAR was one of the few things I really like about working with a sun box.
IANALBIPOOGL (I am not a Lawyer, but I play one on GrokLaw.)
Of course, if you do go with the GIMP, you could get some seriously sharp looking graphs for your extra effort.
--
Fuck the system? Nah, you might catch something.
I've been playing around with MRTG for a couple weeks now, it's really geared towards monitoring routers, any other use is really kludgy.
What is this Crickett/RDD, I can't find mention of it with any web searches. Is it free? A web pointer please!
:g/Sun/s//SCO/g
I'm not sure that these lists will overlap too much; the list I started is focused on system-administrator level tools to monitor both the health and load of Linux systems. Its scope is larger than just performance tuning -- it's not only about tweaking systems to run benchmarks better, but about making sure you get notified when your systems go down.
It's not about duplication of effort -- it's about a different perspective and different goals.
To join this list, send a message consisting of the single word "subscribe" (in the message body, not the subject) to:
The first objective of this list is to gather enough information to build a performance and reliability HOWTO. Many of the attendees of the BOF are on this list. This list is still in its infancy, but I'm sure that the Slashdot effect will change that!
Firstly, if you are serious about monitoring performance, forget SNMP.
The best tool that I have ever seen is Performance Co-Pilot on SGIs. They recently demo'd this product at a Linux expo running on an SGI Visual Workstation running Linux and I believe they are heading towards open sourcing it (along with a lot of other SGI stuff).
See http://www.sgi.com/software/co-pilot
I have recently written my own tool for DEC Alphas, but it is primitive compared to SGI's tool. Monitoring multiple hosts simultaneously in real-time on the same chart/3D visualisation is non-trivial.
My impression is that there is a good opportunity to add some good instrumentation to Linux using a consistent interface; someone has just got to do it. The other UNIXes suffer from insufficient instrumentation and a lack of public interfaces to get at the information.
There is folly and foolishness on the one side, and daring and calculation on the other. - Admiral Pellew, Hornblower
By The Way: If you have Tcl installed, you have to say "man 5 proc". Otherwise you get the manpage for proc(n): "Create a Tcl procedure".
While implementing a RAID filesystem under Linux, I was shocked to find out that iostat does not work under Linux. Earlier this morning Sun announced that they would be releasing sar under a Mozilla license, so that should be helpful. I believe that most of the information you are looking for is tracked, i.e. top pulls information from somewhere and should give at least a high-level overview of what is going on in the system.
Lando
/* TODO: Spawn child process, interest child in technology, have child write a new sig */
There already exists a linux performance mailing list.
To subscribe to this list, send an email to the following address
with "subscribe" in the body of the message:
linux-perf-request@www-klinik.uni-mainz.de
My colleague, Rich Pettit, has done a lot of work to add perf stats to the
/proc filesystem, so let's not reinvent all this stuff.
Seeing as someone already mentioned Datametrics
I might as well put in a (shameless) plug for our product, RAPS. Check out http://www.foglight.com for more details. Again, it's aimed at the enterprise level so it's not cheap but it has OS level monitoring as well as Sybase and Oracle agents, Netscape and Apache WWW server monitoring.
Anyone know of any tools to monitor both Linux and NT machines? I currently use Scotty, but NT's snmpd tells so little. I'd even be happy with an rstatd for NT.
He's talking about openwin perfmon, which has the ability under Solaris to monitor all kinds of crap and even log said crap to a file, leaving the method of analysis up to your imagination.
Take a look at procinfo.
I think that's incorrect. The Performance Monitor measures at whatever increment you would like.
perfmon is actually an excellent and comprehensive utility that has some very nice features and is actually useful in the real world. I use it for database tuning, for example.
I recommend the book (now some years old) on using perfmon. In retrospect it looks like the last hurrah of the VMS crowd before the Win95 mentality took over NT development.
And the fact that perfmon gets no respect is Yet Another Reason Why
Bill Gates Is My Evil Twin.
I dunno....
*TOP* maybe?
pipe it to a file or something.
Blech. Signatures.
Yes I would. Why - would you not?!
Whatever it is, I hope it is smarter about sampling methods than perfmon.exe.
Perfmon only really performs spot measurements of things like CPU utilization. It can't tell you the true average CPU utilization of a process over a 10 minute interval. It can just tell you the average of instantaneous CPU utilization at 0 minutes and 10 minutes. This bugs the hell out of me, especially since NT keeps a running count of execution time for each process.
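Whether or not perfmon actually behaves this way (other posters here dispute it), the difference between the two sampling methods is easy to show with numbers. The Python sketch below uses an invented workload: a process that is idle at the start and end of a 600 second window but fully busy for 480 seconds in between. Function names are illustrative only.

```python
# Sketch: spot sampling versus averaging from a cumulative execution-time counter.
# The workload numbers are made up for illustration.

def spot_average(samples):
    """Average of instantaneous utilization readings (in percent)."""
    return sum(samples) / len(samples)


def counter_average(cpu_seconds_start, cpu_seconds_end, wall_seconds):
    """True average utilization (in percent) from a running CPU-time counter."""
    return 100.0 * (cpu_seconds_end - cpu_seconds_start) / wall_seconds


# Per-second busy fraction: idle for 60 s, busy for 480 s, idle for 60 s.
busy = [0.0] * 60 + [1.0] * 480 + [0.0] * 60

spot = spot_average([100.0 * busy[0], 100.0 * busy[-1]])   # readings at the two ends
true_avg = counter_average(0.0, sum(busy), len(busy))       # delta of the counter

if __name__ == "__main__":
    print("spot average: %.1f%%" % spot)
    print("counter-based average: %.1f%%" % true_avg)
```

The counter-based figure needs only two reads of the running total, so it costs no more than spot sampling.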
Why don't you wait for the SAR sources to be
released and adopt it. SAR supposedly gives
a lot of low level statistics. If I remember
correctly, sometime back [ok, longtime back!]
when I was writing an SNMP agent for Acer
Server Manager, we used to make use of SARs
libraries on UnixWare to get some specific
statistics for instrumentation.
-Sas
The tool you're speaking of is the SE Toolkit, written by Adrian Cockcroft for Sun. It's implemented in tcl/tk. It uses stuff like vmstat, iostat, mpstat to track system performance. It also uses some stuff in the Solaris kernel, so I don't see it being ported to Linux any time soon. It is very useful and does have its own language so you can write your own monitors, but it's not very easy for someone who has a limited knowledge of the Solaris kernel (especially since the code isn't available!). I wish it were available for Linux!
I like IBMs Performance Toolbox.
The company I work for just got a license for raps (by foglight), and one of their reps said that they are planning on porting to Linux.
As it stands raps runs just on solaris (maybe nt?), and is very cool. I was skeptical at first, having seen a bunch of other crappy monitoring applications that I could write better myself, but this one really does a good job of presenting both low and high level stats in very digestible formats.
This was mentioned at the JavaOne conference yesterday. They mentioned that Rich Pettit was busily porting it to linux. It seems like a very useful tool: http://www.sun.com/sun-on-net/performance/se3/
I was poking around for similar utes today and ran across a rules based monitor... text out and looks REAL configurable... perfect for scheduling and collecting raw data for later analysis... check it out at http://www.blakeley.com/resources/ Please reply and let me know if this was any good. I may end up using it in the future and would appreciate any commentary you might have.
It makes me laugh to see this article appearing on the same page as this Ask /.
Although, I presume no-one's ported it to Linux yet.
I thought 'monitor' was freely available for most *nix platforms!?!?
Is this not true?
A year spent in artificial intelligence is enough to make one believe in God.
ummm, no. its a freely available tool, or maybe im hallucinating when i run monitor on solaris and irix.
my only uncertainty was whether or not there was a linux port....not whether i knew what monitor was....
A year spent in artificial intelligence is enough to make one believe in God.
Wow, "According to what I read on /. every day, NT is absolutely terrible, and Linux is better in every way.", that's really thinking for yourself isn't it?
All I have to say is that they have their merits. Would you go to a (example) christian conservative "Pro-Life" demonstration and just take everything they said as gospel??
Thanks for thinking,
Pad
Error Code: beef
C'mon, the whole "First Post" thing is pretty immature and on top of that, you didn't even get to it on time. Sheesh.
An application that pumps snmp and
Then you could use SQL to search for certain criteria/data at a given period in time on a given machine.
Does anything do this for Linux?
I wish I had the time to do this, i suppose it would be worthwhile doing!?!
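A rough sketch of the store-and-query-with-SQL idea, using Python's built-in sqlite3 module. The table layout, host names, and metric names are all assumptions made up for the example, and the collection side (SNMP or anything else) is left out entirely; only the "search by machine and time period" part is shown.

```python
# Sketch: keep monitoring samples in a SQL table and query by host and time window.
# Schema and sample values are hypothetical.
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the sample store."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS samples "
        "(host TEXT, metric TEXT, ts INTEGER, value REAL)"
    )
    return db


def record(db, host, metric, ts, value):
    """Insert one sample."""
    db.execute("INSERT INTO samples VALUES (?, ?, ?, ?)", (host, metric, ts, value))


def query(db, host, metric, start, end):
    """Return [(ts, value), ...] for one host and metric within [start, end]."""
    cur = db.execute(
        "SELECT ts, value FROM samples "
        "WHERE host = ? AND metric = ? AND ts BETWEEN ? AND ? ORDER BY ts",
        (host, metric, start, end),
    )
    return cur.fetchall()


if __name__ == "__main__":
    db = open_store()
    record(db, "web1", "load1", 100, 0.5)
    record(db, "web1", "load1", 160, 1.5)
    record(db, "db1", "load1", 160, 3.0)
    print(query(db, "web1", "load1", 0, 200))
```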
Just ftp to your favourite metalab (sunsite) mirror. Do mget *.lsm in pub/Linux/system/status and pub/Linux/system/status/xstatus then have a look and see what sounds nice.
I use xperfmon++ for real-time monitoring - it looks a lot like the windows perfmon tool, but does not have the capability to log to file (this would probably run to only a few lines of code if you wanted to add it). However, almost all the stats can be got from vmstat(8) and netstat(8) -i. Just write a few filters to get rid of the headers and stuff from the output, then paste(1) the results into one file and BINGO!
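The filter-and-paste idea can be sketched in Python as follows. The sample text imitates vmstat output but is made up, and vmstat's exact columns differ between versions; the filter simply keeps lines that consist only of numbers, which drops the repeated header rows. The timestamp column stands in for a second data source.

```python
# Sketch: strip header lines from vmstat-style output and paste columns together.
# VMSTAT_SAMPLE is invented; real output would come from running vmstat.

VMSTAT_SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 123456   7890 456789    0    0     5     6  100  200  1  2 97  0  0
 0  0      0 123400   7890 456800    0    0     0    12  110  210  3  1 96  0  0
"""


def data_rows(text):
    """Keep only lines made up entirely of integers (drops headers)."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if fields and all(f.lstrip("-").isdigit() for f in fields):
            rows.append(fields)
    return rows


def paste(*columns):
    """Join the n-th row of each input side by side, tab separated, like paste(1)."""
    return ["\t".join(tok for part in parts for tok in part)
            for parts in zip(*columns)]


if __name__ == "__main__":
    timestamps = [["t0"], ["t1"]]            # placeholder for a second capture
    for line in paste(timestamps, data_rows(VMSTAT_SAMPLE)):
        print(line)
```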
Rodd
Be careful. People in masks cannot be trusted.
FreeBSD has "perfmon", the CPU performance
monitoring interface. :) A majority of the Linux
systems out there run on Intel CPUs. It is
not hard to implement an interface for
programming the CPU counters to measure
a variety of events. However, work needs
to be done to use the data so obtained
in a meaningful manner. The CPU counters can
be coupled with various system tools that give
us system statistics. The bottomline:
1. There are a few drivers and libraries
for Linux that allow you to make use
of CPU counters (at least for the x86).
These don't seem to be much used.
2. Like Intel's VTune tool (which pertains
mainly to code optimization), a tool
could be written for Linux that gives
extensive performance statistics, and
helps in optimizing code. The
infrastructure for such a tool is already
there, IMHO.
3. For interested people, I could find
(from some dust-laden hard disks)
pointers to related code and documents.
I worked on this subject long back.
I am curious as to what version of Linux and what GUI (if any) you are running. I had an older version of Linux running a year or so ago that had
an OpenWin GUI which came with everything you mentioned and more: disk usage, swapping, cpu, interrupts, etc. I had the same tools on my SPARC station under both OpenWin and CDE so I am assuming it comes with the GUI and not the OS.
Does this help?
putting the 'B' in LGBTQ+
I'm the one who posted this question [about two months ago! :O ], and have since then done some more research and asked around at the Performance BOF @ Linux Expo, and it didn't seem like there was anything that provided me with everything I'm looking for. So...in the spirit of open source I've decided to write my own. The basic idea is to have an agent running on each machine you want to monitor, and either a gtk or newt based UI on
the machine you're sitting at. Email me if you're interested in more info or helping.
- Deepak
Deepak Saxena
Project Director, Linux Demo Day '99
"Computers are useless, they can only give you answers" - Picasso
maybe I'm just too tired right now, but I did not detect any heisenberg ref's in the threads I've read here on this topic. Trying to instrument a thread of execution is usually death for that thread. I despise the notion of "performance monitors". If you build it right, it will work, and any anomalies usually succumb to thoughtful analysis. Instrumenting a network at the level you guys are talking about can kill it, severely. Get ye a good sniffer, and learn to read it's utterances.
Intuitively it would seem that measuring threads in great detail would distort the measurement. However, performance registers on board CPUs (I'm thinking Intel at the moment) allow one to monitor certain aspects of threads with almost no overhead whatsoever. It's perfectly feasible and desirable to monitor code at a fine-grained level.
There are many performance anomalies that can't necessarily be identified at the design level (for example pipeline flushing and cache problems). These detailed measurements can tell you much more about your program's behavior than the standard profiler can.
At present, there is a paucity of information collected by the kernel, especially for disk I/O. However, patches exist for local disks:
ftp://ftp.uk.linux.org/pub/linux/sct/fs/profiling
and for nfs:
ftp://ftp.sce.carleton.ca/pub/rads/iostat-2.0.34.tgz
Please see the linux-perf mailing list for more information. Send subscribe requests to
linux-perf-request@www-klinik.uni-mainz.de
SNMP (Simple Network Management Protocol) paired with something like MRTG or Crickett/RDD will do what you want.
MRTG
UCD SNMP for Linux
MRTG is kinda a bear to work with for monitoring stuff other than a router, but it can be done. For an example you can check out my suso.org stats page. Look on the left side.