Is Profiling Useless in Today's World?

Posted by CmdrTaco on Friday July 5, 2002 @06:27AM from the optimization-shmoptimization dept.

rngadam writes "gprof doesn't work in Linux multithreaded programs without a workaround that doesn't work that well. It seems that if you want to use profiling, you have to look for alternatives or agree with Red Hat's Ulrich Drepper that "gprof is useless in today's world"... Is profiling useless? How do you profile your programs? Is the lack of good profiling tools under Linux leading us into a world of bloated applications and killing Linux adoption by embedded developers? Or will the adoption of a LinuxThreads replacement solve our problems?"

Profiling is Useful by Anonymous Coward · 2002-07-05 06:30 · Score: 5, Insightful

Maybe gprof, as an implementation, might not be useful. But profiling, especially under Java, can make a world of difference to an application.

Saying "profiling isn't useful" is similar to saying "having information isn't useful".

That's just dumb.

1. Re:Profiling is Useful by anonymous_wombat · 2002-07-05 07:02 · Score: 5, Informative
  
  In single threaded programs, just one type of profiling needs to be done, the kind that standard profiling tools measure. In multi-threaded programs, the relative execution times of the various threads may be more important. The first thing to do is to figure out which threads are using most of the resources. After this is done, and any optimizations made, the old-style profiling and optimizing of slow methods is just as important as ever. If your program is spending 80% of its time sorting, then optimize your sorting code.
  Of course, for many applications, multi-threading achieves the vast majority of the speed increase, and profiling will only be of marginal utility. The profiler is just one tool of many, and is not a silver bullet.
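One way to get the per-thread picture described above is to have each thread record its own CPU time. This is only a minimal Python sketch, assuming CPU-bound work; `run_and_measure`, `busy`, and `profile_threads` are made-up names for illustration, and `time.thread_time()` needs Python 3.7 or later on a platform that supports it.

```python
import threading
import time


def run_and_measure(name, work, results):
    """Run work() and record the CPU time consumed by the calling thread."""
    start = time.thread_time()          # CPU time of this thread only
    work()
    results[name] = time.thread_time() - start


def busy(n):
    """Hypothetical CPU-bound task: sum the first n squares."""
    return sum(i * i for i in range(n))


def profile_threads(tasks):
    """tasks maps a thread name to a zero-argument callable."""
    results = {}
    threads = [
        threading.Thread(target=run_and_measure, args=(name, work, results))
        for name, work in tasks.items()
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Most expensive thread first.
    return sorted(results.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    report = profile_threads({
        "light": lambda: busy(10_000),
        "heavy": lambda: busy(2_000_000),
    })
    for name, cpu_seconds in report:
        print(f"{name}: {cpu_seconds:.4f}s CPU")
```

Once the expensive thread is known, an ordinary function-level profiler can be pointed at just that thread's work.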

OProfile by mmontour · 2002-07-05 06:32 · Score: 5, Informative

Take a look at OProfile. It's quite a nice tool, although it's not a direct replacement for gprof. From their 'About' page:

OProfile is a system-wide profiler for Linux x86 systems, capable of profiling all running code at low overhead. OProfile is released under the GNU GPL.

It consists of a kernel module and a daemon for collecting sample data, and several post-profiling tools for turning data into information.

OProfile leverages the hardware performance counters of the CPU to enable profiling of a wide variety of interesting statistics, which can also be used for basic time-spent profiling. All code is profiled: hardware and software interrupt handlers, kernel modules, the kernel, shared libraries, and applications (the only exception being the oprofile interrupt handler itself).

Profiling will always be useful by Wesley+Everest · 2002-07-05 06:40 · Score: 5, Informative

I work as a game developer, and we have to make sure that everything that is done for each frame takes less than 33ms. So we're always profiling our code to cram more functionality into a limited amount of time.
But even if you aren't doing something that is speed intensive like games, you always have tradeoffs when you choose your data structures and algorithms. Generally you first code up the easiest algorithm that you think will use an acceptable amount of memory and CPU time. Then, later, if something is too slow, you have to identify where the problem is. It could be that you chose an O(N^2) algorithm not realizing that N might be 1,000 instead of the max of 100 you were counting on, forcing you to switch to an O(N log N) algorithm that is more complex.
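To make the O(N^2) versus O(N log N) tradeoff concrete, here is a small Python sketch using duplicate detection as a stand-in problem. The problem and both function names are assumptions chosen for illustration, not something from the post.

```python
def has_duplicate_quadratic(items):
    """O(N^2): compare every pair. Easy to write, fine for small N."""
    n = len(items)
    for i in range(n):
        for j in range(i + 1, n):
            if items[i] == items[j]:
                return True
    return False


def has_duplicate_sorted(items):
    """O(N log N): sort a copy, then compare each element with its neighbour."""
    ordered = sorted(items)
    return any(a == b for a, b in zip(ordered, ordered[1:]))


if __name__ == "__main__":
    data = list(range(1000))            # worst case: no duplicates at all
    print(has_duplicate_quadratic(data), has_duplicate_sorted(data))
```

Going from N = 100 to N = 1,000 multiplies the pair count in the first version by roughly 100, while the sorted version grows only a little faster than N itself.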
Now, if it is a small application, you might have enough familiarity with the code to be able to guess where the problem is -- then you fix it and see if it is still slow. If that works, then you're set and profiling isn't necessary. But if the fix doesn't speed it up enough, then you're stuck. You have to profile it somehow.
You might try simple tricks like changing the code to loop on a suspected bit of code 100 times and see how much longer it takes. Or maybe throw in some printf's that spit out the current time at different points. Or maybe create your own profiling code that you manually call in functions you want to time. Or, you might use an actual profiler without modifications to the code. But lacking a profiler doesn't mean you can't or won't profile your code.
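The "create your own profiling code" option can be very small. A rough Python sketch, assuming wall-clock time is good enough; `timed`, `timings`, `calls`, and `report` are hypothetical names:

```python
import time
from collections import defaultdict
from contextlib import contextmanager

timings = defaultdict(float)   # label -> total seconds spent in that section
calls = defaultdict(int)       # label -> number of times the section ran


@contextmanager
def timed(label):
    """Accumulate the wall-clock time spent inside the with-block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] += time.perf_counter() - start
        calls[label] += 1


def report():
    """Return (label, total_seconds, call_count) rows, slowest first."""
    rows = [(label, timings[label], calls[label]) for label in timings]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    with timed("build"):
        data = [x * x for x in range(200_000)]
    with timed("sort"):
        data.sort(reverse=True)
    for label, seconds, count in report():
        print(f"{label}: {seconds:.4f}s over {count} call(s)")
```

Wrapping only the suspected sections keeps the overhead low, at the cost of seeing nothing you did not think to wrap.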
And even with CPU speed doubling every couple of years or so, that doesn't mean speed is no longer an issue. You can easily choose the wrong algorithm and have something take 1000s of times longer to run than the proper algorithm.

Not useless by pthisis · 2002-07-05 06:52 · Score: 5, Insightful

Profiling in general certainly isn't useless. I'll usually write new code primarily in a high-level, high-productivity language (e.g. Python), and if it's too slow I'll profile it and rewrite applicable parts in C. Some projects require a lower level (C) approach from the start, though those are pretty rare. Without profiling you'll spend a lot of time optimizing code that isn't a bottleneck.

Remember the words of Knuth: "Premature optimization is the root of all evil." Without profiling, you don't know what optimization is really needed and what isn't.
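
For the Python half of that workflow, the standard library's `cProfile` and `pstats` modules are enough to find the candidates for a C rewrite. A minimal sketch; `slow_join`, `cheap_setup`, and `main` are invented stand-ins for real application code:

```python
import cProfile
import io
import pstats


def slow_join(n):
    """Hypothetical hot spot: builds a string by repeated concatenation."""
    out = ""
    for i in range(n):
        out += str(i)
    return out


def cheap_setup(n):
    """Hypothetical cheap step that is not worth optimizing."""
    return list(range(n))


def main():
    cheap_setup(1000)
    return slow_join(20000)


def profile(func):
    """Run func under cProfile and return the stats report as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(10)   # top 10 entries
    return buffer.getvalue()


if __name__ == "__main__":
    print(profile(main))
```

Whatever dominates the cumulative column is the part worth rewriting; the rest can stay in the high-level language.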

That said...
BEGIN RANT
I've used gprof successfully with plenty of recent code. It works perfectly fine in non-threaded code, which _should_ be the majority (99%+) of code out there. Yes, that includes big network servers (the last one I wrote just recently passed the 6 billion requests served mark without blinking). Threads are a really nasty programming rathole that should be applied in a limited way; they take much of the time and effort spent developing protected-memory OSes and toss it out the window. They also tend to encourage highly synchronized execution instead of decoupled execution, which often makes things slower at run time, more bug-prone (locking issues are _tough_ to get right once they go more than one level deep), and slower to implement than a well-designed multiprocess solution with an appropriate I/O paradigm. Just because two popular platforms (Windows and Java) make good non-threaded programming difficult doesn't mean you should cave in.
END RANT

--
rage, rage against the dying of the light

Re:I don't know... by pthisis · 2002-07-05 06:59 · Score: 5, Insightful

You could argue that with good up front design, you'll know in advance what 10% of the code to focus on, but I don't think that works that well in practice. At best, you're making educated guesses about where bottlenecks will appear.

And a lot of smart people, from Knuth and Kernighan to Linus and Guido, will freely admit that predicting what to optimize is nearly impossible. Even people at that level of programming prowess are often surprised by where the bottlenecks appear (and where they don't appear). You certainly want to design for flexible optimization from the start, but you'll often discover that the stupid O(n) scan you put in is good enough for now and that you'd better optimize the I/O system before you think about replacing it with a tree or hash table or whatever.

Sumner

--
rage, rage against the dying of the light

Re:There is no question that profiling is necessar by pthisis · 2002-07-05 07:12 · Score: 5, Insightful

But, the bottom line is that if you don't profile your code (and unit test it, and integration test it, and...), you are not writing good code.

That's hardly true. Certainly you shouldn't waste time optimizing code until you know where the bottlenecks are. But in a lot of cases--I'd even venture to say most cases--code gets written and is fast enough. In such cases, profiling is a waste of time. Profiling is only indicated if there's a legitimate performance problem.

To a lesser extent, the same is true of unit testing and integration testing. If you're writing some code to convert one image to a GIF and you run it successfully to get the GIF, there's no reason to unit test. Even if the code has horrible bugs on some inputs, the job is done. One-off code isn't (unfortunately) uncommon. Prototype code is also very common and often you don't need to do extensive testing on it, either. Any code where the total cost of code failure is lower than the cost of QA probably doesn't need to be QA'd (which is not to say that you should spend an amount on QA equal to the failure cost; if spending $1000 on QA reduces the chance of failure by 99.999% and spending $1000000 reduces the chance of failure by 99.9999%, the $1000 expenditure suffices in all but the most demanding applications).

Sumner

--
rage, rage against the dying of the light

Re:So threads are evil -- now what? by pthisis · 2002-07-05 07:42 · Score: 5, Insightful

Okay, so let's say threads are evil.

Okay.

But processes as provided by current operating systems are too expensive to use.

No, they aren't. Have you measured fork() speeds under Linux vs. pthread_create() speeds? Sure, Windows and Solaris blow at process creation (and Windows doesn't have a reasonable fork() alternative--it conflates fork() and exec() into CreateProcess*()), but that doesn't make all OSes brain-dead.
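
Measuring this yourself takes only a few lines. The sketch below is Unix-only and uses Python's `os.fork()` and `threading.Thread` as stand-ins for the C calls, so interpreter overhead is included in both numbers and the result says nothing precise about raw fork() or pthread_create() cost. The function names are made up.

```python
import os
import threading
import time


def time_forks(count):
    """Average seconds to fork a child that exits immediately, then reap it."""
    start = time.perf_counter()
    for _ in range(count):
        pid = os.fork()
        if pid == 0:
            os._exit(0)          # child: exit at once, skipping cleanup handlers
        os.waitpid(pid, 0)       # parent: reap the child so no zombies pile up
    return (time.perf_counter() - start) / count


def time_threads(count):
    """Average seconds to start and join a thread that does nothing."""
    start = time.perf_counter()
    for _ in range(count):
        t = threading.Thread(target=lambda: None)
        t.start()
        t.join()
    return (time.perf_counter() - start) / count


if __name__ == "__main__":
    n = 200
    print(f"fork + wait : {time_forks(n) * 1e6:.1f} us each")
    print(f"thread+join : {time_threads(n) * 1e6:.1f} us each")
```

A C version with the same loop structure would give a fairer comparison; this one is just the quickest way to get a ballpark on a given box.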

If I have a network server (e.g. a httpd) that has to create a process for each network request, it will never scale.

Right. And if you create a new thread for each network request, you'll never scale--give it a try some time. Good servers that use a thread/process for every connection do so with pre-fork()'d/pre-pthread_create()'d/whatever pools. Apache, for instance, uses multiple processes (but no multithreading, except in some builds of 2.x) but pre-forks a pool of them. This is really basic stuff; even an introductory threading book will talk about pooling and other server designs.
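
The pooling idea in miniature: create the workers once, up front, and hand them requests through a queue. This Python sketch uses threads because they are the shortest to show; a pre-forked version has the same shape with processes in place of threads. `WorkerPool` and its methods are hypothetical names, not Apache's design.

```python
import queue
import threading


class WorkerPool:
    """Fixed set of workers created once, before any request arrives."""

    def __init__(self, size, handler):
        self._requests = queue.Queue()
        self._handler = handler
        self._workers = [
            threading.Thread(target=self._run, daemon=True) for _ in range(size)
        ]
        for worker in self._workers:
            worker.start()

    def _run(self):
        while True:
            item = self._requests.get()
            if item is None:             # sentinel: shut this worker down
                return
            request, results = item
            results.put(self._handler(request))

    def submit(self, request, results):
        """Queue a request; its result is put on the given results queue."""
        self._requests.put((request, results))

    def close(self):
        """Send one sentinel per worker, then wait for all of them to exit."""
        for _ in self._workers:
            self._requests.put(None)
        for worker in self._workers:
            worker.join()


if __name__ == "__main__":
    results = queue.Queue()
    pool = WorkerPool(4, lambda n: n * n)
    for n in range(8):
        pool.submit(n, results)
    pool.close()
    print(sorted(results.get() for _ in range(8)))
```

No thread or process is created on the request path, which is the whole point: the per-request cost is one queue operation.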

Really scalable/fast implementations don't even do that. They use just one process (or one per CPU) and multiplex the I/O with something like select, poll, queued realtime signals (Linux), I/O completion ports (NT), /dev/poll (Solaris), /dev/epoll, signal-per-fd, kqueues (FreeBSD), etc. (select and poll don't scale well to 10s of thousands of connections when most are idle, but some of the others are highly scalable). See e.g. Dan Kegel's c10k page for specifics.
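
A single-process multiplexed server looks roughly like this in Python. The standard `selectors` module picks the best readiness mechanism the platform offers (epoll on Linux, kqueue on the BSDs, falling back to poll or select). It is an echo-server sketch under simplifying assumptions: the function names are invented, and writes are not buffered the way a real server would need.

```python
import selectors
import socket


def serve_once(selector, timeout=1.0):
    """Handle every socket that is ready right now; return the event count."""
    events = selector.select(timeout)
    for key, _mask in events:
        callback = key.data              # the handler registered for this socket
        callback(selector, key.fileobj)
    return len(events)


def accept(selector, listener):
    """Listener is readable: take the new connection and watch it for data."""
    conn, _addr = listener.accept()
    conn.setblocking(False)
    selector.register(conn, selectors.EVENT_READ, echo)


def echo(selector, conn):
    """Connection is readable: echo the data back, or clean up on close."""
    data = conn.recv(4096)
    if data:
        conn.sendall(data)               # fine for a sketch; real code buffers writes
    else:                                # empty read: peer closed the connection
        selector.unregister(conn)
        conn.close()


def make_server(host="127.0.0.1", port=0):
    """Return (selector, listener); port 0 lets the OS pick a free port."""
    listener = socket.socket()
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind((host, port))
    listener.listen()
    listener.setblocking(False)
    selector = selectors.DefaultSelector()
    selector.register(listener, selectors.EVENT_READ, accept)
    return selector, listener


if __name__ == "__main__":
    selector, listener = make_server(port=8007)
    print("echoing on", listener.getsockname())
    while True:
        serve_once(selector)
```

One loop, one process, any number of mostly idle connections; nothing blocks except the call that waits for readiness.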

Obviously, the OS needs to change, and give us something (maybe a hybrid between processes and threads) that more closely meets applications' needs.

http://www-124.ibm.com/pthreads/ proposes an M:N threading model and offers an implementation, but it still has the shared memory problems of threads. Multiprocessing may not be sexy but it's really a lot cleaner for most problems and can be more efficient in a lot of domains.

Sumner

--
rage, rage against the dying of the light