Is Profiling Useless in Today's World?

Posted by CmdrTaco on Friday July 5, 2002 @06:27AM from the optimization-shmoptimization dept.

rngadam writes "gprof doesn't work in Linux multithreaded programs without a workaround that doesn't work that well. It seems that if you want to use profiling, you have to look for alternatives or agree with Red Hat's Ulrich Drepper that 'gprof is useless in today's world'... Is profiling useless? How do you profile your programs? Is the lack of good profiling tools under Linux leading us into a world of bloated applications and killing Linux adoption by embedded developers? Or will the adoption of a LinuxThreads replacement solve our problems?"

OProfile by mmontour · 2002-07-05 06:32 · Score: 5, Informative

Take a look at OProfile. It's quite a nice tool, although it's not a direct replacement for gprof. From their 'About' page:

OProfile is a system-wide profiler for Linux x86 systems, capable of profiling all running code at low overhead. OProfile is released under the GNU GPL.

It consists of a kernel module and a daemon for collecting sample data, and several post-profiling tools for turning data into information.

OProfile leverages the hardware performance counters of the CPU to enable profiling of a wide variety of interesting statistics, which can also be used for basic time-spent profiling. All code is profiled: hardware and software interrupt handlers, kernel modules, the kernel, shared libraries, and applications (the only exception being the oprofile interrupt handler itself).

Profiling will always be useful by Wesley+Everest · 2002-07-05 06:40 · Score: 5, Informative

I work as a game developer, and we have to make sure that everything that is done for each frame takes less than 33ms. So we're always profiling our code to cram more functionality into a limited amount of time.
But even if you aren't doing something that is speed intensive like games, you always have tradeoffs when you choose your data structures and algorithms. Generally you first code up the easiest algorithm that you think will use an acceptable amount of memory and CPU time. Then, later, if something is too slow, you have to identify where the problem is. It could be that you chose an O(N^2) algorithm not realizing that N might be 1,000 instead of the max of 100 you were counting on, forcing you to switch to an O(N log N) algorithm that is more complex.
Now, if it is a small application, you might have enough familiarity with the code to be able to guess where the problem is -- then you fix it and see if it is still slow. If that works, then you're set and profiling isn't necessary. But if the fix doesn't speed it up enough, then you're stuck. You have to profile it somehow.
You might try simple tricks like changing the code to loop on a suspected bit of code 100 times and see how much longer it takes. Or maybe throw in some printf's that spit out the current time at different points. Or maybe create your own profiling code that you manually call in functions you want to time. Or, you might use an actual profiler without modifications to the code. But lacking a profiler doesn't mean you can't or won't profile your code.
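A minimal sketch of the do-it-yourself approach, written in Python for brevity (the thread itself is about C and C++ programs). The names `time_repeated` and `suspect` are made up for illustration; the idea is only to run a suspected bit of code many times and read a clock before and after.

```python
import time


def time_repeated(func, repeats=100):
    """Run func `repeats` times and return the average seconds per call."""
    start = time.perf_counter()
    for _ in range(repeats):
        func()
    elapsed = time.perf_counter() - start
    return elapsed / repeats


def suspect():
    # Hypothetical stand-in for the code you think is slow.
    return sum(i * i for i in range(10_000))


if __name__ == "__main__":
    per_call = time_repeated(suspect)
    # The poor man's printf: report the timing at a point of interest.
    print(f"suspect: {per_call * 1e6:.1f} us per call")
```

Repeating the call smooths out clock granularity and one-off noise, which is why looping 100 times tends to give a more trustworthy number than timing a single run.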
And even with CPU speed doubling every couple of years or so, that doesn't mean speed is no longer an issue. You can easily choose the wrong algorithm and have something take 1000s of times longer to run than the proper algorithm.
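To put rough numbers on the algorithm-choice point, here is a hypothetical example (not from the post): checking a list for duplicates by comparing every pair, which is O(N^2), versus sorting first and comparing neighbours, which is O(N log N). Both functions count their comparisons so the growth is visible without relying on wall-clock time.

```python
import functools
import random


def has_duplicates_quadratic(items):
    """Compare every pair: O(N^2). Returns (found, comparisons).

    There is deliberately no early exit, so the count is always N*(N-1)/2.
    """
    comparisons = 0
    found = False
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            comparisons += 1
            if items[i] == items[j]:
                found = True
    return found, comparisons


def has_duplicates_sorted(items):
    """Sort, then compare neighbours: O(N log N). Returns (found, comparisons)."""
    comparisons = 0

    def compare(a, b):
        nonlocal comparisons
        comparisons += 1
        return (a > b) - (a < b)

    ordered = sorted(items, key=functools.cmp_to_key(compare))
    found = False
    for left, right in zip(ordered, ordered[1:]):
        comparisons += 1
        if left == right:
            found = True
    return found, comparisons


if __name__ == "__main__":
    for n in (100, 1000):
        data = random.sample(range(10 * n), n)  # n distinct values
        _, quad = has_duplicates_quadratic(data)
        _, fast = has_duplicates_sorted(data)
        print(f"N={n}: {quad} pairwise comparisons vs {fast} with sorting")
```

Going from N = 100 to N = 1,000 takes the pairwise count from 4,950 to 499,500, roughly a hundredfold increase for ten times the data, while the sorted version grows only a little faster than N.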

I used gprof by Zo0ok · 2002-07-05 06:40 · Score: 3, Informative

I used gprof quite a lot during my Master's thesis work this spring. gprof tells you which functions consume the most CPU time, and those functions can then be optimised. Usually very small parts of the code consume most of the CPU time.

This program was parallelised at the network level - all clients were single-threaded. If someone has multithreaded for performance (to utilize more than one CPU) I suppose gprof will still work well on a single-CPU machine with just one thread.

For programs that consume lots of CPU time for well-defined computations it should not be hard to profile a single-threaded version (a single-threaded version is needed for debugging anyway).

More complex applications (for example a web browser) I imagine are more dependent on multi-threading, and should pose a larger problem.

gprof is probably not dead - if you need it you can adapt the program...

Re:I don't know... by Wesley+Everest · 2002-07-05 06:51 · Score: 3, Informative

The flaw in your argument is that it ignores the fact that only a small portion of the code takes most of the time. If you spend a lot of time on upfront design instead of profiling, much of your effort will be wasted. 90% of the time you spend making your code fast should be spent on the 10% of the code that takes 90% of the CPU time. If you spread that out, you'll do a lot of unnecessary work speeding up code that rarely runs and have less time to optimize the code that is running most of the time.
You could argue that with good up front design, you'll know in advance what 10% of the code to focus on, but I don't think that works that well in practice. At best, you're making educated guesses about where bottlenecks will appear, and you'll be wrong some of the time -- requiring profiling at that point.
And lacking tools doesn't mean you can't or won't profile -- it just means you'll have to do more work to profile the code.
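The 90/10 argument can be made concrete with a little arithmetic. The sketch below uses the standard Amdahl's-law formula (a framing the post does not name) and assumes, as the post does, a hot spot responsible for 90% of the run time; the function name `overall_speedup` is made up for illustration.

```python
def overall_speedup(fraction, local_speedup):
    """Amdahl's law: whole-program speedup when `fraction` of the run time
    is made `local_speedup` times faster and the rest is left alone."""
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


if __name__ == "__main__":
    # Double the speed of the hot 10% of the code (90% of the time)...
    print(f"hot spot 2x faster:  {overall_speedup(0.90, 2.0):.2f}x overall")
    # ...versus doubling the speed of everything else (10% of the time).
    print(f"cold code 2x faster: {overall_speedup(0.10, 2.0):.2f}x overall")
```

Under these assumptions, doubling the speed of the hot spot makes the whole program about 1.82 times faster, while doubling the speed of all the rest buys only about 1.05 times. Without a profile you cannot tell which of the two you are working on.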

VTune and Quantify by Codex+The+Sloth · 2002-07-05 06:53 · Score: 4, Informative

If you want tree profiling (i.e. information about function and child performance) then Rational Quantify is a reasonable alternative to the crap profiler that comes with MSDev.

If you want a flat profiler or need to analyze the cost of specific low level operations then you MUST get Intel VTune.

--
I am not a number! I am a man! And don't you ... oh wait, I'm #93427. Ha ha! In your face #93428!

ACE has the answer by Ricdude · 2002-07-05 06:59 · Score: 3, Informative

There is a simple profiling capability in the ACE toolkit, the ACE_Profile_Timer. Easy to wrap in a class with basic Start, Stop, and Elapsed methods. If you can guess what function or two the bulk of your program's time is being spent in, this can help pinpoint the worst offenders within that section of code. If not, create several timers, and time each function in your main loop, and print the information after the loop is finished. Drill down into subfunctions as needed. See where the milliseconds tick away. You might be surprised.
And remember, in the immortal words of Michael Abrash, "Assume Nothing. Measure the improvements. If you don't measure, you're just guessing."
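As a rough illustration of the wrapper idea, here is a Python stand-in with start, stop, and elapsed methods. It is not the ACE API and does not use ACE_Profile_Timer; `StopWatch`, `update_physics`, `render`, and `run_main_loop` are hypothetical names chosen for the example.

```python
import time


class StopWatch:
    """Accumulating timer with start/stop/elapsed, in the spirit of the
    wrapper described above (this is not the ACE API)."""

    def __init__(self):
        self._started_at = None
        self._total = 0.0

    def start(self):
        self._started_at = time.perf_counter()

    def stop(self):
        # Ignore a stop() that has no matching start().
        if self._started_at is not None:
            self._total += time.perf_counter() - self._started_at
            self._started_at = None

    def elapsed(self):
        """Total seconds accumulated over all start/stop pairs."""
        return self._total


def update_physics():   # hypothetical main-loop step
    sum(i * i for i in range(2_000))


def render():           # hypothetical main-loop step
    sum(i for i in range(20_000))


def run_main_loop(frames=50):
    """Time each step of the loop with its own timer and return the timers."""
    steps = {"update_physics": update_physics, "render": render}
    timers = {name: StopWatch() for name in steps}
    for _ in range(frames):
        for name, step in steps.items():
            timers[name].start()
            step()
            timers[name].stop()
    return timers


if __name__ == "__main__":
    # Print the totals once the loop is finished, worst offender first.
    timers = run_main_loop()
    for name, t in sorted(timers.items(), key=lambda kv: -kv[1].elapsed()):
        print(f"{name:16s} {t.elapsed() * 1000:8.2f} ms")
```

Once the worst offender is known, the same timers can be moved one level down into its subfunctions.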

--
How's my programming? Call 1-800-DEV-NULL

Re:Profiling is Useful by anonymous_wombat · 2002-07-05 07:02 · Score: 5, Informative

In single threaded programs, just one type of profiling needs to be done, the kind that standard profiling tools measure. In multi-threaded programs, the relative execution times of the various threads may be more important. The first thing to do is to figure out which threads are using most of the resources. After this is done, and any optimizations made, the old-style profiling and optimizing of slow methods is just as important as ever. If your program is spending 80% of its time sorting, then optimize your sorting code.
Of course, for many applications, multi-threading achieves the vast majority of the speed increase, and profiling will only be of marginal utility. The profiler is just one tool of many, and is not a silver bullet.
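One way to approach the first step, finding out which threads use most of the resources, is to have each thread report its own CPU time. The sketch below is in Python and relies on `time.thread_time()`, which measures CPU time for the calling thread only; `run_and_measure`, `sorter`, and `waiter` are hypothetical names, and a C program would need a different mechanism.

```python
import random
import threading
import time


def run_and_measure(tasks):
    """Run each (name, func) in its own thread; return CPU seconds per thread."""
    cpu_seconds = {}

    def wrapper(name, func):
        # Only differences between thread_time() readings are meaningful.
        begin = time.thread_time()
        func()
        cpu_seconds[name] = time.thread_time() - begin

    threads = [threading.Thread(target=wrapper, args=(name, func), name=name)
               for name, func in tasks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return cpu_seconds


def sorter():    # hypothetical CPU-bound thread
    sorted(random.random() for _ in range(200_000))


def waiter():    # hypothetical thread that mostly blocks
    time.sleep(0.2)


if __name__ == "__main__":
    usage = run_and_measure([("sorter", sorter), ("waiter", waiter)])
    for name, seconds in sorted(usage.items(), key=lambda kv: -kv[1]):
        print(f"{name:8s} {seconds * 1000:8.2f} ms CPU")
```

A thread that sleeps for 0.2 seconds shows up with almost no CPU time, which is the distinction that matters here: wall-clock time per thread says little about who is actually consuming the processor.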

write event driven programs; threads for CPU work by Splork · 2002-07-05 07:46 · Score: 4, Informative

minimize the use of threads whenever possible. write your code in an event driven fashion as your friendly AC suggested. the poll() system call [superior to select(), though select() works well within its fixed size filedescriptor array limits] makes this possible.

the basic mentality to switch from threads to event programming is this: anytime you're using a thread solely so that it can sit around and block on high latency events (network or disk I/O) most of its lifetime, it should not be a thread.

it's acceptable to have worker threads/processes that you hand computational tasks to and they trigger an event in your event loop when they hand a result back, but don't use threads of execution to manage your state. you'll pull your hair out and still have a nonfunctional program.
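A small sketch of that structure, in Python on a POSIX system (`select.poll` is not available on Windows). Worker threads do the computation and wake a single-threaded event loop by writing one byte to a pipe; all bookkeeping stays in the loop. The pipe-plus-queue handoff and the name `event_loop` are assumptions made for this example, not something the post prescribes.

```python
import os
import queue
import select
import threading


def event_loop(jobs):
    """Single-threaded loop that blocks in poll(); workers report via a pipe.

    `jobs` maps a job name to a problem size n. Returns {name: result}.
    """
    read_fd, write_fd = os.pipe()
    done = queue.Queue()

    def worker(name, n):
        value = sum(i * i for i in range(n))   # the computational task
        done.put((name, value))                # hand the result back...
        os.write(write_fd, b"x")               # ...then wake the event loop

    poller = select.poll()
    poller.register(read_fd, select.POLLIN)
    for name, n in jobs.items():
        threading.Thread(target=worker, args=(name, n)).start()

    results = {}                               # state lives only in the loop
    while len(results) < len(jobs):
        for fd, event in poller.poll():        # block until something is readable
            if fd == read_fd and event & select.POLLIN:
                for _ in os.read(read_fd, 64): # one byte per finished job
                    name, value = done.get_nowait()
                    results[name] = value

    os.close(read_fd)
    os.close(write_fd)
    return results


if __name__ == "__main__":
    print(event_loop({"small": 10, "large": 1_000_000}))
```

In a real program the same poller would also have the network sockets registered, so I/O readiness and worker completions arrive through one loop.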

Re:'pstack' on Solaris by WolfWithoutAClause · 2002-07-05 08:00 · Score: 3, Informative

Yes, the company I work for used this technique to build up gprof style call trace information on a huge embedded, persistent, realtime, multitasking, concurrent system we built (yes it is/was horrible ;-).
Anyway, we ran the equivalent of pstack at frequent intervals (like once per millisecond) and then collected the addresses of all functions in the call tree present each time we polled the system. Got a humongous file. Then postprocessed the file to record which functions called what other functions, and how often and looked up the addresses in the symbol table to give usable names.
It turns out that polling the system like that usually gives all the important information you could want- it tends to show not the most called functions but the heaviest users of the processor because they are much more likely to be running when the pstack happens- the number of times they will appear is proportional to the total time they run for, statistically. And the technique is minimally invasive and doesn't require recompilation of the code under test.
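The same idea can be sketched in a few lines of Python: poll a running thread's call stack at intervals, then post-process the samples into per-function counts and caller-to-callee counts. This is only an analogy to running pstack on a native process; it uses `sys._current_frames()`, and the names `sample_thread`, `summarize`, `heavy`, `light`, and `workload` are hypothetical.

```python
import collections
import sys
import threading
import time


def sample_thread(func, interval=0.001):
    """Run func in a worker thread and poll its call stack until it finishes."""
    worker = threading.Thread(target=func)
    worker.start()
    stacks = []
    while worker.is_alive():
        frame = sys._current_frames().get(worker.ident)
        names = []
        while frame is not None:               # walk innermost to outermost
            names.append(frame.f_code.co_name)
            frame = frame.f_back
        if names:
            stacks.append(tuple(reversed(names)))  # outermost caller first
        time.sleep(interval)
    worker.join()
    return stacks


def summarize(stacks):
    """Count the samples each function and each (caller, callee) pair appears in."""
    in_stack = collections.Counter()
    calls = collections.Counter()
    for stack in stacks:
        in_stack.update(set(stack))                # once per sample
        calls.update(set(zip(stack, stack[1:])))   # caller -> callee edges
    return in_stack, calls


def spin(seconds):
    end = time.perf_counter() + seconds
    while time.perf_counter() < end:
        pass


def heavy():     # hypothetical hot function
    spin(0.3)


def light():     # hypothetical cheap function
    spin(0.03)


def workload():
    heavy()
    light()


if __name__ == "__main__":
    stacks = sample_thread(workload)
    in_stack, calls = summarize(stacks)
    for name, count in in_stack.most_common():
        print(f"{100 * count / max(len(stacks), 1):5.1f}%  {name}")
```

Because `heavy` runs about ten times longer than `light`, it should turn up in roughly ten times as many samples, regardless of how many times each was called. The code under test is not modified at all.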
Then we printed the summary out in a huge printout, with the functions sorted by percentage of ticks spent in them, and then spent a week or two staring at it. It showed some amazing things, like certain functions taking an order of magnitude longer than originally designed, that kind of thing.
It is really quite a useful technique.

--
-WolfWithoutAClause
"Gravity is only a theory, not a fact!"

OProfile + Prospect by irix · 2002-07-05 08:24 · Score: 4, Informative

And for getting even more useful information out, try Prospect. It works with OProfile - there was a talk about it at this year's Ottawa Linux Symposium, which you can find in the conference proceedings (gzipped PDF).

--

Do you even know anything about perl? -- AC Replying to Tom Christiansen post.