Slashdot Mirror


Is Profiling Useless in Today's World?

rngadam writes "gprof doesn't work in Linux multithreaded programs without a workaround that doesn't work that well. It seems that if you want to use profiling, you have to look for alternatives or agree with RedHat's Ulrich Drepper that "gprof is useless in today's world"... Is profiling useless? How do you profile your programs? Is the lack of good profiling tools under Linux leading us into a world of bloated applications and killing Linux adoption by embedded developers? Or will the adoption of a LinuxThreads replacement solve our problems?"

18 of 221 comments

  1. OProfile by mmontour · · Score: 5, Informative

    Take a look at OProfile. It's quite a nice tool, although it's not a direct replacement for gprof. From their 'About' page:

    OProfile is a system-wide profiler for Linux x86 systems, capable of profiling all running code at low overhead. OProfile is released under the GNU GPL.

    It consists of a kernel module and a daemon for collecting sample data, and several post-profiling tools for turning data into information.

    OProfile leverages the hardware performance counters of the CPU to enable profiling of a wide variety of interesting statistics, which can also be used for basic time-spent profiling. All code is profiled: hardware and software interrupt handlers, kernel modules, the kernel, shared libraries, and applications (the only exception being the oprofile interrupt handler itself).

  2. 'pstack' on Solaris by wavq · · Score: 2, Informative

    While it doesn't give the exact time spent in a
    given function, running 'pstack' against a
    processID under Solaris will give the execution
    stack trace of any threads present.

    If you find that 80% of your threads are in
    slow_function( someParam ) then ya better get to
    work fixing it. This also has the added advantage
    of not slowing down your program with profiling
    code and other hooks.

    Obviously this isn't great for fine-grained
    profiling, or with applications with few threads,
    but I've found it helpful on my larger projects.

    1. Re:'pstack' on Solaris by WolfWithoutAClause · · Score: 3, Informative
      Yes, the company I work for used this technique to build up gprof style call trace information on a huge embedded, persistent, realtime, multitasking, concurrent system we built (yes it is/was horrible ;-).

      Anyway, we ran the equivalent of pstack at frequent intervals (like once per millisecond) and then collected the addresses of all functions in the call tree present each time we polled the system. Got a humongous file. Then postprocessed the file to record which functions called what other functions, and how often and looked up the addresses in the symbol table to give usable names.

      It turns out that polling the system like that usually gives all the important information you could want- it tends to show not the most called functions but the heaviest users of the processor because they are much more likely to be running when the pstack happens- the number of times they will appear is proportional to the total time they run for, statistically. And the technique is minimally invasive and doesn't require recompilation of the code under test.

      Then we printed the summary out in a huge printout, with the functions sorted by the percentage of ticks spent in each; and then spent a week or two staring at it. It showed some amazing things, like the system spending an order of magnitude longer in certain functions than originally designed, that kind of thing.

      It is really quite a useful technique.
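
A minimal sketch of the polling idea described above, in Python rather than against pstack. It assumes CPython, where `sys._current_frames()` exposes each thread's current stack; `sample_call_stacks`, `slow_function`, `fast_function` and `workload` are hypothetical names made up for the illustration, not anything from the system in the post.

```python
import collections
import sys
import threading
import time


def sample_call_stacks(work, interval=0.001):
    """Run work() while a helper thread periodically samples this thread's stack."""
    counts = collections.Counter()
    target_id = threading.get_ident()
    done = threading.Event()

    def sampler():
        while not done.is_set():
            frame = sys._current_frames().get(target_id)
            seen = set()
            while frame is not None:
                seen.add(frame.f_code.co_name)   # every function on the stack
                frame = frame.f_back
            counts.update(seen)                  # one tick per function per sample
            time.sleep(interval)

    thread = threading.Thread(target=sampler, daemon=True)
    thread.start()
    try:
        work()
    finally:
        done.set()
        thread.join()
    return counts


def slow_function():
    # Hypothetical hot spot: spin for about 0.2 seconds.
    deadline = time.perf_counter() + 0.2
    total = 0
    while time.perf_counter() < deadline:
        total += 1
    return total


def fast_function():
    return sum(range(10))


def workload():
    fast_function()
    slow_function()


if __name__ == "__main__":
    for name, ticks in sample_call_stacks(workload).most_common(5):
        print(f"{ticks:6d}  {name}")
```

The tick counts are statistical: a function shows up roughly in proportion to the time it spends on the stack, so short-lived calls may never be seen at all.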

      --

      -WolfWithoutAClause

      "Gravity is only a theory, not a fact!"
  3. gprof far from useless by tps12 · · Score: 1, Informative

    First, a little background on gprof, for those new to the *n*x world. Gprof is what's known as a profiler. Basically, it inserts code into the beginnings and ends of functions. When you run your program through gprof, it then records how much time is spent in each part of your program. The idea is that the programmer sees where most of the time is being spent, and optimizes that part of the program.

    Now, as for the charges of gprof being useless, I can say that that is far from the case. True, it falls flat when dealing with multithreaded programs. But in practice, multithreaded programs are almost always interactive, and thus are primarily limited by user response times, which are many orders of magnitude longer than even the worst algorithm. In these cases, reducing the amount of input required from the user will always pay off better than any optimizations.

    As an example, in our enterprise database frontend, we had a dialog that would prompt users for an administrator password when they attempted the "delete" command. We did analysis (with a commercial profiler, but it may as well have been gprof) and found that, lo and behold, the bulk of execution time was being spent waiting for the user to type in the password. So what we did was change the delete command to "eteled" ("delete" backwards), and only told the administrators the new command name. This way, we could be certain that only administrators would even attempt a deletion, and no password prompt was necessary. We have since applied the same design philosophy throughout our software, and productivity is at an all-time high.

    As is usually the case, profiling can be the most important part of a project or next to useless. It all depends on how you use it. Gprof is a great tool for what it does; you just have to know how to use it properly.

    --

    Karma: Good (despite my invention of the Karma: sig)
  4. Profiling will always be useful by Wesley+Everest · · Score: 5, Informative
    I work as a game developer, and we have to make sure that everything that is done for each frame takes less than 33ms. So we're always profiling our code to cram more functionality into a limited amount of time.

    But even if you aren't doing something that is speed intensive like games, you always have tradeoffs when you choose your data structures and algorithms. Generally you first code up the easiest algorithm that you think will use an acceptable amount of memory and CPU time. Then, later, if something is too slow, you have to identify where the problem is. It could be that you chose an O(N^2) algorithm not realizing that N might be 1,000 instead of the max of 100 you were counting on, forcing you to switch to an O(NlogN) algorithm that is more complex.
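
To put rough numbers on that N = 100 versus N = 1,000 case, here is a small sketch that just evaluates the two growth formulas. Constant factors are ignored and log base 2 is an assumption, so the figures only indicate scale; the function names are made up.

```python
import math


def quadratic_steps(n):
    """Rough step count for an O(N^2) algorithm, constant factor ignored."""
    return n * n


def linearithmic_steps(n):
    """Rough step count for an O(N log N) algorithm, using log base 2."""
    return n * math.log2(n)


if __name__ == "__main__":
    for n in (100, 1000):
        print(n, quadratic_steps(n), round(linearithmic_steps(n)))
```

Going from 100 to 1,000 items multiplies the quadratic count by 100 (10,000 to 1,000,000 steps), while the N log N count grows only about fifteenfold.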

    Now, if it is a small application, you might have enough familiarity with the code to be able to guess where the problem is -- then you fix it and see if it is still slow. If that works, then you're set and profiling isn't necessary. But if the fix doesn't speed it up enough, then you're stuck. You have to profile it somehow.

    You might try simple tricks like changing the code to loop on a suspected bit of code 100 times and see how much longer it takes. Or maybe throw in some printf's that spit out the current time at different points. Or maybe create your own profiling code that you manually call in functions you want to time. Or, you might use an actual profiler without modifications to the code. But lacking a profiler doesn't mean you can't or won't profile your code.
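
The "loop on it 100 times" trick might look something like this in Python. `suspect` and `cheap` are hypothetical stand-ins for the code you are curious about, and `time.perf_counter()` plays the role of the printf'd timestamps.

```python
import time


def time_repeated(func, repeats=100):
    """Call func() `repeats` times and return the average seconds per call."""
    start = time.perf_counter()
    for _ in range(repeats):
        func()
    return (time.perf_counter() - start) / repeats


def suspect():
    # Hypothetical slow code: build and sort a reversed list.
    return sorted(range(20000, 0, -1))


def cheap():
    # Hypothetical fast code for comparison.
    return 1 + 1


if __name__ == "__main__":
    print(f"suspect: {time_repeated(suspect) * 1e6:.1f} us per call")
    print(f"cheap:   {time_repeated(cheap) * 1e6:.1f} us per call")
```

Repeating the call averages out timer resolution and one-off noise, which matters when a single call is too short to measure on its own.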

    And even with CPU speed doubling every couple of years or so, that doesn't mean speed is no longer an issue. You can easily choose the wrong algorithm and have something take 1000s of times longer to run than the proper algorithm.

  5. I used gprof by Zo0ok · · Score: 3, Informative

    I used gprof quite a lot during my Master's thesis work this spring. gprof tells you which functions consume the most CPU time, and those functions can then be optimised. Usually very small parts of the code consume most of the CPU time.

    This program was parallelised at the network level - all clients were single-threaded. If someone has multithreaded for performance (to utilize more than one CPU) I suppose gprof will still work well on a single-CPU machine with just one thread.

    For programs that consume lots of CPU time for well-defined computations it should not be hard to profile a single-threaded version (a single-threaded version is needed for debugging anyway).

    More complex applications (for example a web browser) are, I imagine, more dependent on multi-threading, and should pose a larger problem.

    gprof is probably not dead - if you need it you can adapt the program...

  6. Re:I don't know... by Wesley+Everest · · Score: 3, Informative
    The flaw in your argument is that only a small portion of the code takes most of the time. If you spend a lot of time on upfront design instead of profiling, much of your effort will be wasted. 90% of the time you spend making your code fast should be spent on the 10% of the code that takes 90% of the CPU time. If you spread that out, you'll do a lot of unnecessary work speeding up code that rarely runs and have less time to optimize the code that is running most of the time.

    You could argue that with good up front design, you'll know in advance what 10% of the code to focus on, but I don't think that works that well in practice. At best, you're making educated guesses about where bottlenecks will appear, and you'll be wrong some of the time -- requiring profiling at that point.

    And lacking tools doesn't mean you can't or won't profile -- it just means you'll have to do more work to profile the code.

  7. VTune and Quantify by Codex+The+Sloth · · Score: 4, Informative

    If you want tree profiling (i.e. information about function and child performance) then Rational Quantify is a reasonable alternative to the crap profiler that comes with MSDev.

    If you want a flat profiler or need to analyze the cost of specific low level operations then you MUST get Intel VTune.

    --
    I am not a number! I am a man! And don't you ... oh wait, I'm #93427. Ha ha! In your face #93428!
  8. ACE has the answer by Ricdude · · Score: 3, Informative
    There is a simple profiling capability in the ACE toolkit, the ACE_Profile_Timer. Easy to wrap in a class with basic Start, Stop, and Elapsed methods. If you can guess what function or two the bulk of your program's time is being spent in, this can help pinpoint the worst offenders within that section of code. If not, create several timers, and time each function in your main loop, and print the information after the loop is finished. Drill down into subfunctions as needed. See where the milliseconds tick away. You might be surprised.
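
A rough Python stand-in for that kind of wrapper, with start, stop and elapsed methods and one timer per function in the main loop. This is not ACE's API; `ProfileTimer`, `parse_input`, `update_state` and `run_main_loop` are hypothetical names chosen for the sketch.

```python
import time


class ProfileTimer:
    """Accumulating stopwatch. A stand-in for the wrapper idea, not ACE's API."""

    def __init__(self):
        self.total = 0.0
        self.calls = 0
        self._started = None

    def start(self):
        self._started = time.perf_counter()

    def stop(self):
        if self._started is None:
            raise RuntimeError("stop() called without start()")
        self.total += time.perf_counter() - self._started
        self.calls += 1
        self._started = None

    def elapsed(self):
        """Total seconds accumulated over all start/stop pairs."""
        return self.total


def parse_input():
    return sum(range(1_000))          # hypothetical cheap stage


def update_state():
    return sum(range(200_000))        # hypothetical expensive stage


def run_main_loop(iterations=5):
    stages = (parse_input, update_state)
    timers = {stage.__name__: ProfileTimer() for stage in stages}
    for _ in range(iterations):
        for stage in stages:
            timer = timers[stage.__name__]
            timer.start()
            stage()
            timer.stop()
    return timers


if __name__ == "__main__":
    # Print the totals only after the loop has finished, as the post suggests.
    for name, timer in run_main_loop().items():
        print(f"{name}: {timer.elapsed() * 1000:.3f} ms over {timer.calls} calls")
```

Once one stage stands out, the same timers can be moved down into its subfunctions.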

    And remember, in the immortal words of Michael Abrash, "Assume Nothing. Measure the improvements. If you don't measure, you're just guessing."

    --
    How's my programming? Call 1-800-DEV-NULL
  9. Re:Profiling is Useful by anonymous_wombat · · Score: 5, Informative
    In single threaded programs, just one type of profiling needs to be done, the kind that standard profiling tools measure. In multi-threaded programs, the relative execution times of the various threads may be more important. The first thing to do is to figure out which threads are using most of the resources. After this is done, and any optimizations made, the old-style profiling and optimizing of slow methods is just as important as ever. If your program is spending 80% of its time sorting, then optimize your sorting code.
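
One way to get at "which threads are using most of the resources" is to read per-thread CPU time. The sketch below assumes Python 3.7+ on a platform where `time.thread_time()` is available; `run_and_measure`, `sorter` and `waiter` are hypothetical names.

```python
import threading
import time


def run_and_measure(workers):
    """Run each named callable in its own thread; return CPU seconds per thread."""
    cpu_seconds = {}

    def wrapper(name, func):
        start = time.thread_time()          # CPU time of the calling thread only
        func()
        cpu_seconds[name] = time.thread_time() - start

    threads = [threading.Thread(target=wrapper, args=(name, func))
               for name, func in workers.items()]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return cpu_seconds


def sorter():
    # Hypothetical CPU-heavy thread: sort a reversed list repeatedly.
    for _ in range(40):
        sorted(range(100_000, 0, -1))


def waiter():
    # Hypothetical mostly idle thread: it sleeps, so it burns almost no CPU.
    time.sleep(0.05)


if __name__ == "__main__":
    usage = run_and_measure({"sorter": sorter, "waiter": waiter})
    for name, seconds in sorted(usage.items(), key=lambda item: -item[1]):
        print(f"{name}: {seconds * 1000:.1f} ms CPU")
```

Wall-clock time would make the sleeping thread look busy; CPU time per thread shows which one is actually worth profiling further.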

    Of course, for many applications, multi-threading achieves the vast majority of the speed increase, and profiling will only be of marginal utility. The profiler is just one tool of many, and is not a silver bullet.

  10. CPU Intensive tasks by Anonymous Coward · · Score: 1, Informative
    When you have to deal with CPU intensive programs,
    you will find that a profiler is quite useful.

    For instance, in the last year I've been developing an automatic nesting server using Linux and gprof was very important to spot the functions that were consuming the most CPU time.

    With gprof it was easy to notice two small functions that were responsible for 95% of the CPU usage.

    As a result, I replaced those two small functions with 180 lines of optimized assembly code and I got a very good performance increase, since I was using a lot of inter-word bit shifts that the C compiler didn't handle well.

    Regarding multi-threads, I have come to the conclusion that 9 out of 10 times you don't really need to use threads, even in interactive programs, since there are alternative ways of achieving the same effects.

    For instance, all the X11 toolkits like Xt/Motif, Gtk+ and Qt, have the concept of work-procedures and timeout-functions.

    If you put all of your time-consuming operations inside work-procedures, you can get the same results as you would get with multi-threads, because you have an effective way of executing several tasks at the same time without blocking the user interface.
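
The work-procedure idea can be sketched without any toolkit: split the long job into short chunks and let the main loop run one chunk whenever no user event is waiting. This is a toy loop in Python, not Xt, Gtk+ or Qt code; `nesting_job`, `run_event_loop` and the tick-based event schedule are assumptions made for the illustration.

```python
def nesting_job(items, chunk_size):
    """Hypothetical long task split into short chunks; yields after each one."""
    total = 0
    for start in range(0, len(items), chunk_size):
        total += sum(x * x for x in items[start:start + chunk_size])
        yield                      # hand control back to the event loop
    return total


def run_event_loop(events_by_tick, work_proc):
    """Toy main loop: a UI event wins its tick, otherwise run one work chunk."""
    log = []
    result = None
    work_done = False
    tick = 0
    last_event_tick = max(events_by_tick, default=-1)
    while not work_done or tick <= last_event_tick:
        if tick in events_by_tick:
            log.append(("ui", events_by_tick[tick]))
        elif not work_done:
            try:
                next(work_proc)            # run exactly one chunk
                log.append(("work", tick))
            except StopIteration as stop:
                result = stop.value        # the job's return value
                work_done = True
        tick += 1
    return result, log


if __name__ == "__main__":
    job = nesting_job(list(range(10)), chunk_size=4)
    result, log = run_event_loop({1: "click", 3: "keypress"}, job)
    print(result)
    for entry in log:
        print(entry)
```

The interface stays responsive only as long as each chunk is short, so the chunk size is the knob to tune.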

    Fernando Pereira

  11. Plenty of options for Java by grungeman · · Score: 2, Informative

    For Java we have a really nice choice of profilers. There are basically three great products available, all of them have proved to be absolutely useful. There is JProbe, OptimizeIt and JProfiler (the 2.0 beta of JProfiler looks cool). I don't know what the problems on Linux are, but when programming Java, profiling is quite an enjoyable task.

    --

    Signature deleted by lameness filter.
  12. write event driven programs; threads for CPU work by Splork · · Score: 4, Informative

    minimize the use of threads whenever possible. write your code in an event driven fashion as your friendly AC suggested. the poll() system call [superior to select(), though select() works well within its fixed size filedescriptor array limits] makes this possible.

    the basic mentality to switch from threads to event programming is this: anytime you're using a thread solely so that it can sit around and block on high latency events (network or disk I/O) most of its lifetime, it should not be a thread.

    it's acceptable to have worker threads/processes that you hand computational tasks to and they trigger an event in your event loop when they hand a result back, but don't use threads of execution to manage your state. you'll pull your hair out and still have a nonfunctional program.
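
a small sketch of that arrangement, using Python's `select.poll()` (available on Linux and most Unix systems, not on Windows). worker threads only compute; they hand results back through a queue and wake the loop by writing one byte to a socket pair, and the loop alone owns the state. `heavy_computation` and `run_event_loop` are hypothetical names.

```python
import queue
import select
import socket
import threading


def heavy_computation(n):
    # Hypothetical CPU-bound task handed to a worker thread.
    return sum(i * i for i in range(n))


def run_event_loop(jobs):
    """Hand each job to a worker thread; collect the results in a poll() loop."""
    wake_r, wake_w = socket.socketpair()   # workers write here to raise an event
    finished = queue.Queue()
    results = {}                           # state touched only by the event loop

    def worker(job_id, n):
        finished.put((job_id, heavy_computation(n)))
        wake_w.send(b"x")                  # one byte per finished job

    for job_id, n in jobs.items():
        threading.Thread(target=worker, args=(job_id, n), daemon=True).start()

    poller = select.poll()
    poller.register(wake_r, select.POLLIN)
    while len(results) < len(jobs):
        for fd, event in poller.poll(1000):            # timeout in milliseconds
            if fd == wake_r.fileno() and event & select.POLLIN:
                for _ in wake_r.recv(64):              # one result per byte read
                    job_id, value = finished.get_nowait()
                    results[job_id] = value
    wake_r.close()
    wake_w.close()
    return results


if __name__ == "__main__":
    print(run_event_loop({"small": 10, "large": 1000}))
```

a real loop would register its network sockets with the same poller, so I/O readiness and finished computations arrive through one place.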

  13. Re:Linux pthreads breaks lots of things by Anonymous Coward · · Score: 1, Informative
    Core dump? Almost useless with pthreads running
    I believe there is something in the -ac tree whereby multiple core files are dumped for threaded applications.

    Also, recently merged in 2.4.x is "next-generation threads". I'm not sure if glibc linuxthreads will utilize them, or if they're any better, but just letting everyone know the implementation might change.
  14. Re:Profiling *IS* useful by Anonymous Coward · · Score: 1, Informative

    If Ulrich had written that profiling was useless, your response might make some sense. He didn't. He wrote that gprof was useless. See the difference?

  15. OProfile + Prospect by irix · · Score: 4, Informative

    And for getting even more useful information out, try Prospect. It works with OProfile - there was a talk about it at this year's Ottawa Linux Symposium, which you can find in the conference proceedings (gzipped PDF).

    --

    Do you even know anything about perl? -- AC Replying to Tom Christiansen post.
  16. No, take a look at FunctionCheck by Pornosonic · · Score: 2, Informative
    Take a look at FunctionCheck

    Five bucks says that this server is slashdot'ed within the hour, so you may have more success with the less descriptive SourceForge project page, which indicates that the project is not dead, as the homepage says.

    I discovered this program when I was optimizing some code I wrote to multiply sparse matrices. By the time I had gotten it 100x faster than the initial code, gprof had lost all semblance of granularity and was giving me obviously bogus results. The problem is that such things as cache performance (i.e. optimizing for cache hits) were now heavily affecting the profile and gprof could not figure such things out. FunctionCheck works much better than gprof and actually generates accurate profile information under high-stress situations.

    From the homepage (all grammatical errors theirs):

    "I created FunctionCheck because the well known profiler gprof have some limitations:

    • it is not possible to change the profile data file name
    • multi-threads / multi-processes is not supported
    • time spend in non-profiled functions is discarded
    • you can't control the way profile is made
    • memory profile is not managed
    For all these limitations, and by the fact that I discovered a new gcc feature called -finstrument-functions, I decided to write my own profiler.

    My approach is simple: I add (small) treatments at each enter and exit of all the functions of the profiled program. It allows me to compute many information:

    • the current call-stack
    • the time at each action, to compute elapsed times in functions
    • process PID / thread ID, to manage multi-threads / multi-processes
    • number of calls to functions
    • ...
    With these information, I can generate profile data files (for each thread / process), which describes all the statistics (at function level) for the program execution."
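
The enter/exit approach in that quote can be illustrated in Python, where `sys.setprofile()` calls a hook on every function call and return. This is only an analogy for hooks inserted by gcc's -finstrument-functions and is not FunctionCheck's code; `profile_calls`, `leaf` and `parent` are hypothetical names, and the times are inclusive (a caller's time includes its callees).

```python
import collections
import sys
import time


def profile_calls(func, *args):
    """Run func(*args) with enter/exit hooks; return (result, per-function stats)."""
    stats = collections.defaultdict(lambda: {"calls": 0, "seconds": 0.0})
    call_stack = []           # (function name, entry time) for each active call

    def hook(frame, event, arg):
        if event == "call":                       # function entry
            call_stack.append((frame.f_code.co_name, time.perf_counter()))
        elif event == "return" and call_stack:    # function exit
            name, started = call_stack.pop()
            stats[name]["calls"] += 1
            stats[name]["seconds"] += time.perf_counter() - started

    sys.setprofile(hook)
    try:
        result = func(*args)
    finally:
        sys.setprofile(None)                      # always remove the hook
    return result, dict(stats)


def leaf(n):
    return n * 2


def parent(n):
    return leaf(n) + leaf(n + 1)


if __name__ == "__main__":
    result, stats = profile_calls(parent, 3)
    print(result)
    for name, entry in stats.items():
        print(f"{name}: {entry['calls']} calls, {entry['seconds'] * 1e6:.1f} us")
```

Hooking every entry and exit gives exact call counts, at the cost of overhead on each call, which is the usual tradeoff against sampling.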

    Try it out and please contribute some source code.

  17. tsprof: process profiling on Linux/x86 by jreiser · · Score: 2, Informative

    See http://www.BitWagon.com/tsprof/tsprof.html for info on a process profiler that uses hardware performance counters (with no recompile and no relink) and gives both interactive and text output in tree and flat modes.