Slashdot Mirror


Is Profiling Useless in Today's World?

rngadam writes "gprof doesn't work in Linux multithreaded programs without a workaround that doesn't work that well. It seems that if you want to use profiling, you have to look for alternatives or agree with RedHat's Ulrich Drepper that "gprof is useless in today's world"... Is profiling useless? How do you profile your programs? Is the lack of good profiling tools under Linux leading us into a world of bloated applications and killing Linux adoption by the embedded developers? Or will the adoption of a LinuxThreads replacement solve our problems?"

221 comments

  1. Profiling Again? by stirfry714 · · Score: 4, Funny

    Why can't my code be judged by the content of its characters, and not by the color of its extension?

    Down with profiling! :)

  2. The headline makes it sound like... by Anixamander · · Score: 0, Offtopic

    a yro article.

    My first instinct was to agree with the headline, since Jose Padilla is hispanic. It would have at least made for a more interesting discussion.

    --
    Do not taunt Happy Fun Ball(TM)
  3. Profiling is Useful by Anonymous Coward · · Score: 5, Insightful

    Maybe gprof, as an implementation, might not be useful. But profiling, especially under Java, can make a world of difference to an application.

    Saying "profiling isn't useful" is similar to saying "having information isn't useful".

    That's just dumb.

    1. Re:Profiling is Useful by Westley · · Score: 1

      Absolutely. I draw the reader's attention to the difference between the headline ("Is profiling useless in today's world?") and the text, which is *only* talking about native profiling in Linux.

      Not everyone, not even all /. readers, writes only code that is compiled to Linux native code.

      Jon

    2. Re:Profiling is Useful by Anonymous Coward · · Score: 0
      duh. Too bad CmdrTaco isn't half as insightful as you. The problem isn't that profiling is useless. The problem is that gprof, the open sores gnu profiler, doesn't work with multi-threaded applications.

      I'm sure we'll hear the usual "it's open source, fix it yourself" line...

    3. Re:Profiling is Useful by Anonymous Coward · · Score: 3, Funny

      Most, if not all, /. readers have never written a line of code more involved than 10 print "hello". They spend their time trying new Enlightenment and GNOME themes and rebooting into Windows 98 to play games and post to slashdot (since they can't figure out how to configure pppd).

    4. Re:Profiling is Useful by Anonymous Coward · · Score: 0

      lmfao

    5. Re:Profiling is Useful by anonymous_wombat · · Score: 5, Informative
      In single threaded programs, just one type of profiling needs to be done, the kind that standard profiling tools measure. In multi-threaded programs, the relative execution times of the various threads may be more important. The first thing to do is to figure out which threads are using most of the resources. After this is done, and any optimizations made, the old-style profiling and optimizing of slow methods is just as important as ever. If your program is spending 80% of its time sorting, then optimize your sorting code.

      Of course, for many applications, multi-threading achieves the vast majority of the speed increase, and profiling will only be of marginal utility. The profiler is just one tool of many, and is not a silver bullet.
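The first step described above, finding out which threads use most of the CPU, can be sketched in Python. This is only an illustration under stated assumptions: the thread names "parser" and "sorter" and the `busy` workload are hypothetical stand-ins, and `time.thread_time()` (Python 3.7+) is used to read the CPU time of the calling thread only.

```python
import threading
import time


def busy(n):
    """CPU-bound stand-in for real work (hypothetical workload)."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def measure_threads(workloads):
    """Run one thread per (name, n) pair and return {name: cpu_seconds}."""
    cpu = {}

    def run(name, n):
        start = time.thread_time()      # CPU time consumed by this thread only
        busy(n)
        cpu[name] = time.thread_time() - start

    threads = [threading.Thread(target=run, args=w) for w in workloads]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return cpu


if __name__ == "__main__":
    usage = measure_threads([("parser", 200_000), ("sorter", 2_000_000)])
    # Heaviest thread first: that is where function-level profiling should start.
    for name, secs in sorted(usage.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name:8s} {secs:.4f}s CPU")
```

Once the heaviest thread is known, ordinary function-level profiling can be aimed at that thread's code.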

    6. Re:Profiling is Useful by Anonymous Coward · · Score: 0

      I'd say that you're right on...
      except that's most 'linux users', not most /. readers...
      Linux is my bitch.

    7. Re:Profiling is Useful by Anonymous Coward · · Score: 0

      ...and some put "Linux" (or even UNIX) on their resume because they installed linux a year ago but removed it after a week.

      Sadly, this actually happens.

    8. Re:Profiling is Useful by ChrisEmpson · · Score: 1

      I found profiling with gprof a tremendous help when optimising a computational chemistry program written as part of my master's degree project. I found that one particular function was occupying 98% of the computation time; some careful loop optimisations here and there in that function produced a massive 40% performance gain. Useless my arse!

      Chris Empson

  4. j00g by VAXGeek · · Score: 0, Offtopic

    i racially profile my programs.

    --
    this sig limit is too small to put anything good h
    1. Re:j00g by Anonymous Coward · · Score: 0

      %DCL-W-LAMER, Lame comment - check previous posts

  5. Ulrich Drepper by quigonn · · Score: 2, Insightful

    Ulrich Drepper is a fool, he made glibc crappy, and messed up most things he had to do with. He simply should shut up and let other people do the work and the thinking.

    Yeah, mod me down, but I have insight into the things Ulrich does, and he mostly does sh*t. Just my 2 cents (USD or EUR, you decide).

    --
    A monkey is doing the real work for me.
    1. Re:Ulrich Drepper by ocie · · Score: 0, Flamebait

      It's free software. Stop bitching and complaining about the way Ulrich did it. If you have a better way, write it up and it will probably be adopted. If not, STFU!

      --
      JET Program: see Japan, meet intere
    2. Re:Ulrich Drepper by Anonymous Coward · · Score: 0

      oooh, good comeback. Did you ask your mommy for help?

      Face it dork, you couldn't flame with napalm and a zippo.

    3. Re:Ulrich Drepper by Anonymous Coward · · Score: 0

      I choose euros.

    4. Re:Ulrich Drepper by Anonymous Coward · · Score: 0

      details please?

    5. Re:Ulrich Drepper by Panaflex · · Score: 2

      I found that Ulrich is pretty easy to work with... if you have a clue. glibc is NO easy task (99.99 % of all programs depend on this library.. now make it feature rich and compatible).

      I tracked down a bug in __fsetlocking and he was most helpful in fixing glibc.

      Pan

      --
      I said no... but I missed and it came out yes.
    6. Re:Ulrich Drepper by Anonymous Coward · · Score: 0

      I've never been good at flaming. It's some sort of medical condition related to having a penis more than 2" long.

    7. Re:Ulrich Drepper by Anonymous Coward · · Score: 0

      About the only insight I can gather from this post is that you're a rather unpleasant person and that you don't like Ulrich Drepper.

    8. Re:Ulrich Drepper by Lozzer · · Score: 1

      Yeah, I reported a bug when compiling Mozilla with -O3, and Ulrich had a fix pretty quickly, I can't say anything for how good an architect he may or may not be, but he seems to have a good handle on the development side of things.

      --
      Special Relativity: The person in the other queue thinks yours is moving faster.
    9. Re:Ulrich Drepper by Anonymous Coward · · Score: 0
      You are wrong. The guy who wrote the original glibc (1.x) was a fool It was riddled with non-trivial bugs, and he was a real nasty guy to work with. Oh, and it was full of all those dopey obscure coding "tricks" that C cowboys of yore were so fond of. And it was written in "traditional" non-ansi C, the favorite language of koding kowboys.

      Compared to what went before, Ulrich Drepper is a saint.

    10. Re:Ulrich Drepper by quigonn · · Score: 2

      You don't understand. glibc itself is badly flawed. The source distribution contains Linux header files (yes, from the kernel; Linus said this is evil), and contains code just to copy some kernel data structure to glibc's own data structures, because glibc has some super fancy features that everybody needs (not really, actually). In other words: glibc is bloat. Ulrich Drepper doesn't care about this, and packs every feature into glibc he can think of. Other people proved that C libraries can be done much smaller, uClibc for example, or, my personal favorite, diet libc. The interesting thing about this is that programs that are _statically_ linked against diet libc are usually smaller than programs that are dynamically linked against glibc.

      --
      A monkey is doing the real work for me.
    11. Re:Ulrich Drepper by Anonymous Coward · · Score: 0

      Ulrich writes ugly code, too. Trust me, read glibc. Play around with this bitch for some time. And you will start hating glibc and Ulrich.

    12. Re:Ulrich Drepper by Anonymous Coward · · Score: 0

      Your web site says your name is Alexander Bartolich, you suck.

    13. Re:Ulrich Drepper by Anonymous Coward · · Score: 0

      diet libc is GPL'd?!?! Not under the LGPL license? Jesus Christ, and people still use it for their C library?

    14. Re:Ulrich Drepper by Anonymous Coward · · Score: 0

      I suspect that trying to "make it feature rich and compatible" is the whole problem.

  6. I don't know... by gatkinso · · Score: 1

    ...I think that the lack of profiling tools makes developers rely more heavily on solid upfront design.

    --
    I am very small, utmostly microscopic.
    1. Re:I don't know... by Wesley+Everest · · Score: 3, Informative
      The flaw in your argument is that only a small portion of the code takes most of the time. If you spend a lot of time on upfront design instead of profiling, much of your effort will be wasted. 90% of the time you spend making your code fast should be spent on the 10% of the code that takes 90% of the CPU time. If you spread that out, you'll do a lot of unnecessary work speeding up code that rarely runs and have less time to optimize the code that is running most of the time.

      You could argue that with good up front design, you'll know in advance what 10% of the code to focus on, but I don't think that works that well in practice. At best, you're making educated guesses about where bottlenecks will appear, and you'll be wrong some of the time -- requiring profiling at that point.

      And lacking tools doesn't mean you can't or won't profile -- it just means you'll have to do more work to profile the code.
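As a rough illustration of finding the small part of the code that takes most of the time, here is a sketch using Python's standard `cProfile` and `pstats` modules rather than gprof. The functions `cheap_setup`, `hot_loop`, `main` and `hottest_function` are hypothetical names made up for this example.

```python
import cProfile
import pstats


def cheap_setup():
    return list(range(100))


def hot_loop(items):
    # Deliberately wasteful: the quadratic pair scan dominates the run.
    count = 0
    for a in items:
        for b in items:
            if a < b:
                count += 1
    return count


def main():
    items = cheap_setup()
    return hot_loop(items * 3)


def hottest_function(func):
    """Profile func() and return the name of the function with the most own time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    stats = pstats.Stats(profiler)
    # stats.stats maps (file, line, name) -> (cc, nc, tottime, cumtime, callers)
    (_, _, name), _ = max(stats.stats.items(), key=lambda kv: kv[1][2])
    return name


if __name__ == "__main__":
    print("spend your optimization effort on:", hottest_function(main))
```

The point of the sketch is the workflow: measure first, then spend the effort on whatever the measurement singles out.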

    2. Re:I don't know... by pthisis · · Score: 5, Insightful

      You could argue that with good up front design, you'll know in advance what 10% of the code to focus on, but I don't think that works that well in practice. At best, you're making educated guesses about where bottlenecks will appear

      And a lot of smart people, from Knuth and Kernighan to Linus and Guido, will freely admit that predicting what to optimize is nearly impossible. Even people at that level of programming prowess are often surprised by where the bottlenecks appear (and where they don't appear). You certainly want to design for flexible optimization from the start, but you'll often discover that the stupid O(n) scan you put in is good enough for now and that you better optimize the I/O system before you think about replacing it with a tree or hash table or whatever.

      Sumner

      --
      rage, rage against the dying of the light
    3. Re:I don't know... by SirSlud · · Score: 2

      As the other replies pointed out, once a program reaches a certain level of complexity, all the design in the world couldn't predict what parts of your app you'll need to optimize. With locks, mutexes, interrupts, etc. flying about your system in a multithreaded app, you can _design_ upfront, but you can't _optimize_ upfront.

      Anyhow, what's the definition of optimize ... make something better _than it already is_. Kinda hard to optimize something that doesn't exist. Your first 'optimized' version of something you designed well up front isn't optimized for the obvious reason that you haven't actually optimized it yet! ;)

      --
      "Old man yells at systemd"
    4. Re:I don't know... by fermion · · Score: 3, Insightful
      The flaw in your argument is that only a small portion of the code takes most of the time. If you spend a lot of time on upfront design instead of profiling, much of your effort will be wasted.

      Wrong. You design your code as a compromise between factors such as speed, maintenance, reusability, readability, and, most importantly, the resources you are allowed to expend.

      If speed is a critical factor, then you might try to do some predictive profiling using existing principles to make sure the code is fast. Otherwise, you write the best damn code you can, which generally means using good practices to ensure that you don't waste time, and then profile it. Profiling will work best if the code is written in such a way (read: a lot of reusable functions) that allows simple optimization.

      BTW, the biggest wrinkle in this is that programmers' time has become more valuable than clock cycles. We will now waste some clock cycles to save programmers' time, which is why profiling is not nearly as important as it used to be.

      If the code is not written well, and has to be rewritten when the profiler says it sucks, then you wasted your time.

      --
      "She's a scientist and a lesbian. She's not going to let it slide." Orphan Black
    5. Re:I don't know... by gatkinso · · Score: 1

      I agree - I was just being a clown by responding in a serious manner to the creep who posted the pr0n.

      --
      I am very small, utmostly microscopic.
    6. Re:I don't know... by Wesley+Everest · · Score: 2
      I wasn't arguing against up front design. I was just saying that if your plan is to avoid all profiling by spending additional time in your design with an eye to speed, it won't work. If speed is important, you'll most likely end up doing profiling regardless of how much time you spend on design. And if your plan for minimizing that possibility is to spend extra time optimizing all your code, then most of that effort is wasted since most of the code isn't the bottleneck.

      I agree completely that good design and good coding practices will save time when it comes time for profiling and optimizations.

  7. It depends on your goals by Anonymous Coward · · Score: 0

    If your goals are speed/efficiency, then profiling is a necessary step. If your goal is just to get the program done and working, then you may be able to survive without it. People shouldn't just give up on the speed/efficiency just because they figure the newest technology will handle it fine... unless you're Microsoft.

  8. OProfile by mmontour · · Score: 5, Informative

    Take a look at OProfile. It's quite a nice tool, although it's not a direct replacement for gprof. From their 'About' page:

    OProfile is a system-wide profiler for Linux x86 systems, capable of profiling all running code at low overhead. OProfile is released under the GNU GPL.

    It consists of a kernel module and a daemon for collecting sample data, and several post-profiling tools for turning data into information.

    OProfile leverages the hardware performance counters of the CPU to enable profiling of a wide variety of interesting statistics, which can also be used for basic time-spent profiling. All code is profiled: hardware and software interrupt handlers, kernel modules, the kernel, shared libraries, and applications (the only exception being the oprofile interrupt handler itself).

  9. -Xprof and -Xrunhprof work fine for me... by Curt+Cox · · Score: 1

    but I suppose other people want to profile more than just Java. Bother.

  10. 'pstack' on Solaris by wavq · · Score: 2, Informative

    While it doesn't give the exact time spent in a
    given function, running 'pstack' against a
    processID under Solaris will give the execution
    stack trace of any threads present.

    If you find that 80% of your threads are in
    slow_function( someParam ) then ya better get to
    work fixing it. This also has the added advantage
    of not slowing down your program with profiling
    code and other hooks.

    Obviously this isn't great for fine-grained
    profiling, or with applications with few threads,
    but I've found it helpful on my larger projects.

    1. Re:'pstack' on Solaris by WolfWithoutAClause · · Score: 3, Informative
      Yes, the company I work for used this technique to build up gprof style call trace information on a huge embedded, persistent, realtime, multitasking, concurrent system we built (yes it is/was horrible ;-).

      Anyway, we ran the equivalent of pstack at frequent intervals (like once per millisecond) and then collected the addresses of all functions in the call tree present each time we polled the system. Got a humongous file. Then postprocessed the file to record which functions called what other functions, and how often and looked up the addresses in the symbol table to give usable names.

      It turns out that polling the system like that usually gives all the important information you could want- it tends to show not the most called functions but the heaviest users of the processor because they are much more likely to be running when the pstack happens- the number of times they will appear is proportional to the total time they run for, statistically. And the technique is minimally invasive and doesn't require recompilation of the code under test.

      Then we printed the summary out in a huge printout, each function sorted by percentage of ticks spent in it, and then spent a week or two staring at it. It showed some amazing things, such as certain functions taking an order of magnitude longer than originally designed, that kind of thing.

      It is really quite a useful technique.
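The polling technique described above can be sketched in Python. This is not pstack and not the poster's tool: it is a minimal analogue that assumes CPython, where `sys._current_frames()` exposes each thread's current stack. `sample_stacks`, `slow_function`, `quick_function` and `workload` are hypothetical names for this example.

```python
import collections
import sys
import threading
import time


def sample_stacks(target_thread, interval=0.001):
    """Poll target_thread's stack until it exits; count samples per function."""
    hits = collections.Counter()
    while target_thread.is_alive():
        frame = sys._current_frames().get(target_thread.ident)
        while frame is not None:          # walk from the innermost frame outward
            hits[frame.f_code.co_name] += 1
            frame = frame.f_back
        time.sleep(interval)
    return hits


def slow_function():
    total = 0
    for i in range(3_000_000):
        total += i % 7
    return total


def quick_function():
    return sum(range(1000))


def workload():
    quick_function()
    slow_function()


if __name__ == "__main__":
    worker = threading.Thread(target=workload)
    worker.start()
    # Functions that run longest show up in the most samples.
    for name, count in sample_stacks(worker).most_common():
        print(f"{count:6d}  {name}")
```

Because every frame on the stack is counted, callers accumulate the samples of their callees, which is roughly the call-tree view the post describes. The program under test needs no recompilation or instrumentation.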

      --

      -WolfWithoutAClause

      "Gravity is only a theory, not a fact!"
    2. Re:'pstack' on Solaris by irix · · Score: 2

      On Solaris, I find pstack useful for debugging, but it really isn't useful for profiling. For that, I would use either truss or the profiling tools that come with Sun Workshop 6 (if you are using Sun cc/CC).

      --

      Do you even know anything about perl? -- AC Replying to Tom Christiansen post.
    3. Re:'pstack' on Solaris by Anonymous Coward · · Score: 0
      'truss' is almost useless as a profiling tool. It's more of a 'black box' debugging tool in that you can't see what's going on inside your process, you can only see the kernel traps the process generates. In other words, you really can't tell who's calling that lseek() 15,239 times...

      'whocalls' (IIRC - I don't have access to Solaris right now) is much better if you have some idea of what's going on.

  11. Hell, yes it's useful by PissingInTheWind · · Score: 3, Insightful
    Maybe the problem with today's profilers is that the compiler implementors spend too much time making a compiler that is going to try to optimize everything by itself, which then might not even produce the best code in that case.

    What could be more useful is if the compiler implementors would spend as much time on the profiler as on the compiler: you would then be able to easily see faulty parts in your software and be able to determine what needs to be optimized.

    Good profilers would mean efficient code. Don't think profilers are useless because most implementations of them suck.

    --

    A message from the system administrator: 'I've upped my priority. Now up yours.'
    1. Re:Hell, yes it's useful by maxwell+demon · · Score: 4, Insightful
      While improved profilers would surely be useful, don't think optimizing compilers are useless.
      • Hand-optimized code tends to be less clear and less readable. Also, it makes it easy for new bugs to creep in.
      • Hand-optimized code would be machine-specific. While it would work on other machines, it would be dog slow there. So you'd basically be back to per-architecture versions of your program.
      • Some optimizations cannot be done by the programmer, because they occur at levels below the language. For example, the POWER architecture has a multiply+add instruction. Most common programming languages don't have a multiply+add command. So how would you optimize the use of that instruction?
      • Hand-optimization at the level the compiler does it could even hinder hand-optimization in the area where it is most effective and the compiler cannot do it at all: algorithmic optimization. To do that efficiently, you need highly structured code so you can exchange algorithms easily. However, micro-optimizations of the sort the compiler does tend to destroy such structures.
      However, with the compilers getting more sophisticated in optimization, profilers get even more important: While you may be able to add some "profiling instructions" for your own use, profiler-driven optimization in the compiler cannot use such a replacement.
      --
      The Tao of math: The numbers you can count are not the real numbers.
    2. Re:Hell, yes it's useful by PissingInTheWind · · Score: 1
      I agree with you 100%

      I haven't been clear on that. My point was somewhat related to the class of language I use/like more (Lisp languages): it's hard sometimes to tell what's expensive in a Lisp program, and the compiler can only do so much with such a language. In those cases a profiler is invaluable.

      --

      A message from the system administrator: 'I've upped my priority. Now up yours.'
  12. gprof far from useless by tps12 · · Score: 1, Informative

    First, a little background on gprof, for those new to the *n*x world. Gprof is what's known as a profiler. Basically, it inserts code into the beginnings and ends of functions. When you run your program through gprof, it then records how much time is spent in each part of your program. The idea is that the programmer sees where most of the time is being spent, and optimizes that part of the program.

    Now, as for the charges of gprof being useless, I can say that that is far from the case. True, it falls flat when dealing with multithreaded programs. But in practice, multithreaded programs are almost always interactive, and thus are primarily limited by user response times, which are many orders of magnitude longer than even the worst algorithm. In these cases, reducing the amount of input required from the user will always pay off better than any optimizations.

    As an example, in our enterprise database frontend, we had a dialog that would prompt users for an administrator password when they attempted the "delete" command. We did analysis (with a commercial profiler, but it may as well have been gprof) and found that, lo and behold, the bulk of execution time was being spent waiting for the user to type in the password. So what we did was change the delete command to "eteled" ("delete" backwards), and only told the administrators the new command name. This way, we could be certain that only administrators would even attempt a deletion, and no password prompt was necessary. We have since applied the same design philosophy throughout our software, and productivity is at an all-time high.

    As is usually the case, profiling can be the most important part of a project or next to useless. It all depends on how you use it. Gprof is a great tool for what it does; you just have to know how to use it properly.

    --

    Karma: Good (despite my invention of the Karma: sig)
    1. Re:gprof far from useless by TheDick · · Score: 1

      I wouldn't say "certain" But that is a cool idea. I'll keep that in mind :)

      --

    2. Re:gprof far from useless by cruelworld · · Score: 2

      Dear god that's got to be the worst security scheme I ever heard of.

    3. Re:gprof far from useless by Anonymous Coward · · Score: 0

      That's security by obscurity. Analogy: we'll take the lock off the door and hide the door behind those trees. Once some bored user discovers your secret, your security is toast. Bad Idea.

    4. Re:gprof far from useless by stirfry714 · · Score: 1

      Normally I'd agree, but it depends on *why* you are asking for a password. The original post is a little unclear.

      If you are asking for a password in order to secure access to the delete functionality, then yeah, it's security by obscurity with all the *ahem* benefits and drawbacks that provides.

      If on the other hand, you are using a password as more a way to stop idiots from accidentally deleting something (but you aren't worried about securing access to that functionality), then providing a shortcut for the "elite" is worthwhile.

      Sort of like forcing confirmation on "rm" on a multi-user system, but those in the know can throw in a "-f" to bypass it...

    5. Re:gprof far from useless by Anonymous Coward · · Score: 0

      FUNNIEST TROLL IN MONTHS

    6. Re:gprof far from useless by tuxlove · · Score: 4, Insightful

      But in practice, multithreaded programs are almost always interactive, and thus are primarily limited by user response times,

      I would disagree with this wholeheartedly. What about databases like Oracle, MS SQL Server, and so on? They're internally multithreaded, and most definitely not "interactive" after you initiate a SQL query.

      I believe apache 2.0 is threaded. HTTP by nature is not interactive. And so on. There are many other examples, left as an exercise to the reader.

      While it is true that threads are very useful for interactive programs, in fact critical, their use does not stop there by a longshot. Any program which needs to do two things at once without fear of blocking on a system call is a candidate for threads. Threads are also useful for distributing compute cycles over multiple processors within a single process, allowing it to gain the benefit of concurrency.

      The project I'm currently working on is a custom database application, and without threads it would be useless. And there are no users talking to it directly, that's for sure.

      reducing the amount of input required from the user will always pay off better than any optimizations.

      I find this perplexing. Nobody cares about optimizing a user dialog. Reducing user input or optimization of user input code would serve little purpose in most multithreaded applications I'm aware of. Generally, interactive multithreaded programs use threads so they can interact with users while simultaneously performing some other task that shouldn't be stalled by waiting for user input. For example, a network monitor might have three threads: one for watching network traffic, one for resolving IP addresses to hostnames, and one for taking user input. It doesn't matter how long the user input thread sits around waiting for the user to type/click something. There are two other threads working away in the meantime, watching traffic and displaying it for the user, oblivious to whether or not the user is doing anything. In such a case as this, profiling the watcher/resolver threads might be very useful indeed, since they need to be more or less realtime.

      This gprof problem is a serious issue, and minimizing it by saying that threaded programs generally wouldn't benefit from profiling is naive.
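The three-thread network monitor described above might be laid out as follows. This is a toy sketch under loud assumptions: there is no real sniffing or DNS here. `FAKE_DNS`, `watcher`, `resolver`, `user_input` and `run_monitor` are hypothetical names, and the "user" is simulated by an event that nobody sets until the other threads have finished.

```python
import queue
import threading

FAKE_DNS = {"10.0.0.1": "alpha", "10.0.0.2": "beta"}   # hypothetical lookup table


def watcher(packets, out_q):
    """Stand-in for a traffic sniffer: forwards source addresses."""
    for ip in packets:
        out_q.put(ip)
    out_q.put(None)                      # sentinel: no more traffic


def resolver(in_q, results):
    """Turns addresses into hostnames without waiting on the user."""
    while True:
        ip = in_q.get()
        if ip is None:
            break
        results.append(FAKE_DNS.get(ip, ip))


def user_input(quit_event):
    """Stand-in for a UI thread blocked on the keyboard."""
    quit_event.wait()


def run_monitor(packets):
    q, results, quit_event = queue.Queue(), [], threading.Event()
    ui = threading.Thread(target=user_input, args=(quit_event,))
    w = threading.Thread(target=watcher, args=(packets, q))
    r = threading.Thread(target=resolver, args=(q, results))
    for t in (ui, w, r):
        t.start()
    w.join()
    r.join()
    ui_was_still_waiting = ui.is_alive()   # the user never typed anything
    quit_event.set()                       # simulate the user quitting
    ui.join()
    return results, ui_was_still_waiting


if __name__ == "__main__":
    print(run_monitor(["10.0.0.1", "10.0.0.9", "10.0.0.2"]))
```

The watcher and resolver finish their work while the input thread is still blocked, which is why those two threads, not the input thread, are the ones worth profiling.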

    7. Re:gprof far from useless by be-fan · · Score: 2

      Nobody cares about optimizing a user dialog.
      >>>>>>>>>
      Wouldn't this sentence be fun taken out of context? Seriously, though, I think what the original poster was getting at was the fact that a lot of powerful interactive programs (3D modelers for example) can really make the user cry if they run computations and UI code in the same thread. In those cases, splitting off the calculation code to a separate thread and giving it a lower priority than the UI thread ensures that the user-interface stays responsive, no matter what's going on in the background.

      --
      A deep unwavering belief is a sure sign you're missing something...
    8. Re:gprof far from useless by Anonymous Coward · · Score: 0

      Your comment, "... multithreaded programs are almost always interactive, and thus are primarily limited by user response times...", is simply incorrect.

      The essential reasons for multi-threaded programming are:
      1. A simplified programming model for providing concurrent use of computation and IO resources, and
      2. Making optimal use of computation and IO resources.

      Consequently, the real value of multi-threaded programming is for systems and servers where computation and IO resources are much more expensive and where achieving the optimal balance is much more critical than on interactive applications running on ridiculously powerful client machines servicing one or few users doing minimal useful work.

    9. Re:gprof far from useless by Lozzer · · Score: 1

      It's marginally better than Windows 9x, although to be fair they claim nothing.

      --
      Special Relativity: The person in the other queue thinks yours is moving faster.
    10. Re:gprof far from useless by Anonymous Coward · · Score: 0

      They're just jumping right into the boat today, aren't they?

    11. Re:gprof far from useless by rocksh · · Score: 0

      Mmmm... how long did it take you to do "analysis with a commercial profiler" to find that user input takes most of the time?! Isn't "application response time" the important parameter here, not overall time? Please tell me the name of your company so I can short the stock...

      --
      >
  13. Dead on linux? What about windows? by orz · · Score: 2

    I can't get any useful profiling information out of Microsoft Visual C++. When I compile in profiling mode, my program runs at less than 1% of normal speed, producing completely useless data. Am I doing something wrong? Should I be using 3rd party tools?

  14. Profiling will always be useful by Wesley+Everest · · Score: 5, Informative
    I work as a game developer, and we have to make sure that everything that is done for each frame takes less than 33ms. So we're always profiling our code to cram more functionality into a limited amount of time.

    But even if you aren't doing something that is speed intensive like games, you always have tradeoffs when you choose your data structures and algorithms. Generally you first code up the easiest algorithm that you think will use an acceptable amount of memory and CPU time. Then, later, if something is too slow, you have to identify where the problem is. It could be that you chose an O(N^2) algorithm not realizing that N might be 1,000 instead of the max of 100 you were counting on, forcing you to switch to an O(NlogN) algorithm that is more complex.

    Now, if it is a small application, you might have enough familiarity with the code to be able to guess where the problem is -- then you fix it and see if it is still slow. If that works, then you're set and profiling isn't necessary. But if the fix doesn't speed it up enough, then you're stuck. You have to profile it somehow.

    You might try simple tricks like changing the code to loop on a suspected bit of code 100 times and see how much longer it takes. Or maybe throw in some printf's that spit out the current time at different points. Or maybe create your own profiling code that you manually call in functions you want to time. Or, you might use an actual profiler without modifications to the code. But lacking a profiler doesn't mean you can't or won't profile your code.
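The do-it-yourself tricks listed above (repeat the suspect code, or wrap phases in your own timing calls) can be sketched with a small helper. The names `timed`, `timings` and `suspected_slow_part` are hypothetical, and this is just one of many ways to hand-roll such a timer.

```python
import time
from contextlib import contextmanager

timings = {}


@contextmanager
def timed(label):
    """Accumulate wall-clock seconds spent inside the block under label."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = timings.get(label, 0.0) + time.perf_counter() - start


def suspected_slow_part():
    return sorted(range(50_000), key=lambda x: -x)


if __name__ == "__main__":
    # Trick 1: repeat the suspect code so its cost stands out.
    with timed("suspect x100"):
        for _ in range(100):
            suspected_slow_part()
    # Trick 2: hand-placed timers around individual phases.
    with timed("single call"):
        suspected_slow_part()
    for label, secs in timings.items():
        print(f"{label:14s} {secs:.4f}s")
```

This measures wall-clock time, so other load on the machine affects the numbers; repeating the measurement a few times helps.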

    And even with CPU speed doubling every couple of years or so, that doesn't mean speed is no longer an issue. You can easily choose the wrong algorithm and have something take 1000s of times longer to run than the proper algorithm.
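To put rough numbers on the O(N^2) versus O(NlogN) point, here is a small sketch that counts comparisons instead of measuring time. Duplicate detection is a hypothetical example problem chosen for illustration; it does not come from the post.

```python
def has_duplicate_quadratic(values):
    """Compare every pair: N*(N-1)/2 comparisons when there is no duplicate."""
    steps = 0
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            steps += 1
            if values[i] == values[j]:
                return True, steps
    return False, steps


def has_duplicate_sorted(values):
    """Sort first (O(N log N)), then compare neighbours.

    steps counts only the neighbour scan (at most N-1 comparisons);
    the sort itself costs on the order of N*log2(N) comparisons on top.
    """
    ordered = sorted(values)
    steps = 0
    for a, b in zip(ordered, ordered[1:]):
        steps += 1
        if a == b:
            return True, steps
    return False, steps


if __name__ == "__main__":
    for n in (100, 1000):
        data = list(range(n))            # worst case: no duplicates
        _, slow = has_duplicate_quadratic(data)
        _, fast = has_duplicate_sorted(data)
        print(f"N={n}: pairwise {slow} comparisons, scan after sorting {fast}")
```

Going from N=100 to N=1,000 multiplies the pairwise work by roughly 100, while the sort-based version grows only a little faster than N.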

  15. I used gprof by Zo0ok · · Score: 3, Informative

    I used gprof quite a lot during my Master's thesis work this spring. gprof tells you which functions consume the most CPU time, and those functions can then be optimised. Usually very small parts of the code consume most of the CPU time.

    This program was parallelised at the network level; all clients were single-threaded. If someone has multithreaded for performance (to utilize more than one CPU), I suppose gprof will still work well on a single-CPU machine with just one thread.

    For programs that consume lots of CPU time for well-defined computations, it should not be hard to profile a single-threaded version (a single-threaded version is needed for debugging anyway).

    More complex applications (for example a web browser) are, I imagine, more dependent on multi-threading, and should pose a larger problem.

    gprof is probably not dead: if you need it, you can adapt the program...

  16. Programmers, not tools by sane? · · Score: 4, Insightful
    The problem is not that certain tools have issues; but rather that today's programmers have no interest in creating efficient code.

    Those of us that started programming in 1k and sub-megahertz can really feel the time taken by badly coded applications. We know that forgetting what is happening on the silicon can kill how well our code will run.

    However, those who started coding after ~1987 don't really have a gut feeling for it. To them the latest processor will make up for their bad coding. To a certain extent they are right. Today's advances STILL keep up with Moore's law, still make up for their lack of skill. However, when one looks at what is actually performed with all that power, one tends to question why we are paying so much, for so little.

    Can you actually say that MS WordXP is much better than the non-WYSIWYG wordprocessor of yesteryear (itself a blast from the past) ?

    We don't need profilers, we need coders who have that tacit knowledge of what really counts, where they should put real effort.

    Unfortunately that doesn't come in a software box.

    1. Re:Programmers, not tools by Malc · · Score: 3, Insightful

      You talk such twaddle. Why waste your time trying to write efficient code from the start? It's much better to write easily understandable, easily maintained, quickly written and minimal bug code. Unless you have a real need, such as with an embedded system. If the code doesn't perform well enough, then come back and optimise it later. It's a matter of where you want your efficiencies: memory and CPU utilisation, or development process. The latter is cheaper for a business, and so long as the product works on hardware that the users have, then the former is a waste of time and money.

      I used WordPerfect 5.0 (or whatever it was) on a dual 360K 5.25" floppy disk drive machine. Plain blue text screen only. I have to say, I *much* prefer Word XP. If given the choice, I would not go back to those crappy DOS days.

      By all means, be sentimental and reminisce about the old days. But things have changed - accept it.

    2. Re:Programmers, not tools by gid-foo · · Score: 1, Insightful

      "We don't need profilers, we need coders who have that tacit knowledge of what really counts, where they should put real effort."
      This is a load of crap. If you're optimizing before even knowing where the problem is you can't be writing serious apps. The only programmers I know who do this are the same assholes who use globals because they're faster. Or manually unroll their loops in every case because they "know" it's going to help. Or inline every method they write in C++. Or decide that they need to create all their member variables as real instances as opposed to pointers because the pointer traversal is too expensive.
      If you exhibit any of these symptoms you should quit programming. Please. You end up making more work for those of us who actually know what we're doing.
      gid-foo

    3. Re:Programmers, not tools by Tim+Browse · · Score: 2
      Can you actually say that MS WordXP is much better than the non-WYSIWYG wordprocessor of yesteryear

      Hell, yes. WYSIWYG is very useful.

      However, if you ask if WordXP is much better than Word for Windows 2.0, well, that's a much harder question to answer in the affirmative.

      For me, anyway.

      Tim

    4. Re:Programmers, not tools by Anonymous Coward · · Score: 0

      I definitely prefer LaTeX2e to TeX. It has changed the way I work. Also, Emacs is better than Ed for most things.

    5. Re:Programmers, not tools by sheldon · · Score: 2

      Can you actually say that MS WordXP is much better than the non-WYSIWYG wordprocessor of yesteryear (itself a blast from the past) ?

      Considering my experience dates back to Wordstar, I can answer yes to that question. Of course, if you can show me a better way to do tables in Wordstar 1.0 for CP/M besides using | and - characters, it'd be greatly appreciated.

    6. Re:Programmers, not tools by gorilla · · Score: 2

      1k? Luxury! I had 512 bytes, for both program and data, come home and Dad would beat us around the head and neck with a broken bottle, if we were LUCKY!

      </Monty Python>
    7. Re:Programmers, not tools by elindauer · · Score: 1

      We don't need profilers, we need coders who have that tacit knowledge of what really counts, where they should put real effort.

      The problem with this statement is that most of the time a programmer's gut feelings for where bottlenecks occur turn out to be way off. I've seen this happen so many times that I just take it as fact now: trying to optimize before profiling is a waste of time.

      There are two problems with optimizing too early. The first is, of course, optimizing code that isn't a bottleneck is a waste of time. Most software projects have very tight deadlines, and the time spent optimizing non-impact code is time that could have been spent developing another feature. The second problem is perhaps even worse: optimizing code usually makes it less readable, which can actually obscure the true problems and make implementing real optimizations more difficult in the future.

      The best approach is to start with a clean, intuitive design. After that, don't guess at the bottlenecks. Prove it to yourself (there are many techniques, a profiler is one option). You will be surprised how often your guesses are wrong, and that doing the steps in this order can produce code that is efficient and maintainable.
      -Eric

    8. Re:Programmers, not tools by Broccolist · · Score: 1

      Bah. TeX dates back to the early eighties and I still prefer it to Word for most common tasks. (Yes, it does tables.) Admittedly, TeX was programmed by Knuth so the ordinary laws of nature probably don't apply to it :).

    9. Re:Programmers, not tools by Anonymous Coward · · Score: 0

      Sure, profilers are good. What we really need is programmers that actually &$#!%!@ using them in projects where speed counts.

    10. Re:Programmers, not tools by gillbates · · Score: 2
      Why not learn to write efficient code in the first place? Then you'll need neither a debugger nor a profiler.

      Not to be a troll, but I see a lot of programmers with this kind of attitude - "let the compiler catch mistakes", or "code it fast and use a debugger," etc... What invariably happens is that these programmers who learn to code this way spend their careers writing code which is neither efficient nor easy to maintain. Worse, they waste a lot of time using a debugger that they could have avoided had they thoroughly planned their code.

      I used to just blitz through the code, without really planning what I was going to do. While this worked well for small projects, when I got into the professional world, my debugging time went up by an order of magnitude. I've found that I actually get code done faster if I think it through and plan it out before I start writing. I've learned that if I want to fly through coding and debugging, I have to take some time first and plan what I'm going to do. Otherwise, if I scramble off to code writing without planning, I end up using the debugger quite a bit. But then again, YMMV.

      --
      The society for a thought-free internet welcomes you.
    11. Re:Programmers, not tools by Fjord · · Score: 1

      As funny as that is, I used to work on a system that has 128 bytes of RAM (yes bytes, including stack). However, it had 128K ROM for code and lookup tables in the initial spec and was changed to 256K when it was realized that 128K wasn't enough. This was in 1994.

      The processor didn't have mul or div either. I had to write those routines.

      --
      -no broken link
    12. Re:Programmers, not tools by Planesdragon · · Score: 2

      Can you actually say that MS WordXP is much better than the non-WYSIWYG wordprocessor of yesteryear (itself a blast from the past) ?

      Yes.

      I've had firsthand experience with two non-WYSIWYG word processors, Wordstar 2000 and Wordperfect 5.1. The first one was clunky, klutzy, and there was no way to tell what the darn thing was going to look like without printing it out. The second suffered from the "external blind manual" syndrome, to the point where it was necessary to memorize commands just in case the "secret cheat sheet" for the F* keys was missing.

      'course, I'd love it if the @!$@!#%ing thing actually worked faster... but at least it gets em-dashes right.

    13. Re:Programmers, not tools by Planesdragon · · Score: 1

      There's a lot to be said for on-the-fly spellchecking. Having little red wavy lines is a much better interface for the "second set of eyes" that a spellchecker is.

      Is WordXP much better than 95, where they introduced the darn feature? It depends on what you want to do with it. It *still* can't be told to use an access database as the default for all merge documents, so it can't be THAT good. (And the HTML export is getting worse again...)

    14. Re:Programmers, not tools by sane? · · Score: 2
      quickly written and minimal bug code

      Thanks for making my point for me.

      Quality is designed IN, taking bugs OUT is an admission that you really didn't pay enough attention at the beginning. Sure you get the odd typo, but the real bugs are the ones in the logic of what you're writing - and you often don't catch all of those.

      If you are thinking about what is actually happening, rather than just pasting in a bit of gash code, you are much more likely to create something with quality engineered IN. Trust me, it's the only way it's going to get there.

      As for the 'speed it up after the event' crowd - did you ever think that if you used the right approaches, the right concepts, from the start, you wouldn't have to spend the time tweaking some supposed critical element at the end? It should be second nature IF you really understand what you are doing. Sure there are always the games, device drivers etc., but I'm talking about the day to day code that gets executed every day by millions of people around the world. It generally takes no more effort to use the right technique than the wrong one - if you only knew the difference.

      Have a little pride in your work man! You might find that your 'good enough, let's stuff it out the door' mentality is why you don't go forward and your company goes to the wall as a result of a buggy product.

    15. Re:Programmers, not tools by sane? · · Score: 2
      I think this is getting to the heart of what I was saying.

      I seem to spend time every day helping out someone who is trying to fight Word into doing what THEY want, rather than what it wants to do. This is not you or me, the people who can just pick it up and use it; it's the non-expert, the majority. They simply find today's wordprocessors no great advance over those textmode wordprocessors of yesteryear.

      The reason? Back then they KNEW what it was doing, they could SEE the control codes, and delete them if they were wrong. Sure it couldn't tell you your grammar was wrong, but it never really fought against you either.

      If you look back to Word 2.0 and compare it against today you can see certain elements that you can think of as advances. But you can't really see much, and it's certainly not an order of magnitude better. But we do have a whole load of attendant junk. Basically, we're going backward again.

      If we are going to go in the direction of a 'smart' wordprocessor then I want a truly smart one. Something that means I do less work and produce a much better result. I don't want something with a level of complexity that means I'm forever fighting it in doing the actual job - the one of transferring knowledge from my head to someone else's with the minimum of time and effort.

    16. Re:Programmers, not tools by Anonymous+Brave+Guy · · Score: 2
      If we are going to go in the direction of a 'smart' wordprocessor then I want a truly smart one. Something that means I do less work and produce a much better result.

      But Word is smart; it has IntelliSense Technology(TM)(R)(C). That's how it knows that when I type "6 July 2002" at the top of my letter, I really mean "6 July 2002-07-06". Come on, it's obvious... ;-)

      --
      If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
  17. Re:Dead on linux? What about windows? by Malc · · Score: 1

    NuMega DevPartner.

    Rational also used to produce something. I can't remember what it's called though. It was a companion to Purify.

  18. VTune by Anonymous Coward · · Score: 1, Interesting

    Use VTune (http://www.intel.com/software/products/vtune/vtune60/), Intel's profiler.

    It does a pretty good job, and it uses performance counters.

  19. Here's how I profile my code.... by shayne321 · · Score: 3, Funny
    User: This program is slow
    Me: Really? Which part?
    User: When I click the "report" icon
    Me: Oh (tinkers with report code). Try it now.
    User: It's still slow
    Me: (shakes BOFH excuse 8-ball) Hrmm, must be interference from sunspots, try it again tomorrow

    :)

    --
    Today I didn't even have to use my AK; I got to say it was a good day -- Icecube
    1. Re:Here's how I profile my code.... by Anonymous Coward · · Score: 0

      Hilarious!

      You're fired.

  20. Not useless by pthisis · · Score: 5, Insightful

    Profiling in general certainly isn't useless. I'll usually write new code primarily in a high-level, high-productivity language (e.g. Python), and if it's too slow I'll profile it and rewrite applicable parts in C. Some projects require a lower level (C) approach from the start, though those are pretty rare. Without profiling you'll spend a lot of time optimizing code that isn't a bottleneck.

    Remember the words of Knuth: "Premature optimization is the root of all evil." Without profiling, you don't know what optimization is really needed and what isn't.
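
    As a rough illustration of that workflow, here is a sketch that uses the cProfile and pstats modules from the Python standard library to find where the time goes before deciding what to rewrite. The functions slow_part, fast_part and work are hypothetical placeholders for real application code:

```python
import cProfile
import io
import pstats


def slow_part(n):
    """Hypothetical hot spot: a plain Python loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def fast_part(n):
    """Hypothetical cheap function that is not worth optimizing."""
    return n * 2


def work():
    return slow_part(200_000) + fast_part(10)


def profile_report(func, top=5):
    """Profile one call of func and return the report as a string."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(top)
    return out.getvalue()


if __name__ == "__main__":
    print(profile_report(work))
```

    In a report like this, the functions near the top of the cumulative-time listing are the candidates for a rewrite in C; the rest can stay in the high-level language.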

    That said...
    BEGIN RANT
    I've used gprof successfully with plenty of recent code. It works perfectly fine in non-threaded code, which _should_ be the majority (99%+) of code out there. Yes, that includes big network servers (the last one I wrote just recently passed the 6 billion requests served mark without blinking). Threads are a really nasty programming rathole that should be applied in a limited way; they take much of the time and effort spent developing protected memory OSes and toss it out the window. They also tend to encourage highly synchronized executions instead of decoupled execution, which often makes things both slower and more bug-prone (locking issues are _tough_ to get right when they become more than 1-level) and slower to implement than a well-designed multiprocess solution with an appropriate I/O paradigm. Just because two popular platforms (Windows and Java) make good non-threaded programming difficult doesn't mean you should cave in.
    END RANT

    --
    rage, rage against the dying of the light
    1. Re:Not useless by TWR · · Score: 2
      Just because two popular platforms (Windows and Java) make good non-threaded programming difficult doesn't mean you should cave in.

      WTF? How does Java make it hard to write non-threaded programs? If anything Java makes it easy to START writing threaded programs. When all the details start hitting you, you realize that it's trickier than it looks.

      -jon

      --

      Remember Amalek.

    2. Re:Not useless by dvdeug · · Score: 2

      It works perfectly fine in non-threaded code, which _should_ be the majority (99%+) of code out there.

      When I'm running a graphical program, the UI must not lock up, no matter what processing is going on in the background. I don't care how you solve that problem, but a simple use of threads is one of the simplest methods.

    3. Re:Not useless by pthisis · · Score: 2

      When I'm running a graphical program, the UI must not lock up, no matter what processing is going on in the background. I don't care how you solve that problem, but a simple use of threads is one of the simplest methods.

      Not really. It seems simple until you get into the details. Yes, for some things multi-threaded is the way to go. But a multi-process solution is usually easier to implement and more stable, and a straight asynchronous single-threaded state machine solution is often the best (in terms of ease of implementation and performance). Remember, the difference between threads and process is that processes have protected memory and threads have all shared memory. The number of cases where you really don't want most of your memory protected is very small, especially when you remember that processes can easily get shared memory segments for select pieces of memory. Most people choose threads because they think threads are better/faster/smaller than processes (which is true on some broken OSes but not meaningfully true on Linux) rather than based on whether or not they want most memory shared.
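
      For readers who have not seen the single-threaded approach, here is a minimal sketch in Python using the standard selectors module (the moral equivalent of select()). It is only an illustration under simplifying assumptions: the "clients" are local socket pairs created in the same process, each request is one small message, and the function name is made up:

```python
import selectors
import socket


def echo_upper_once(messages):
    """Answer several connections from one thread using readiness events."""
    sel = selectors.DefaultSelector()  # picks epoll/kqueue/select as available
    clients = []
    for msg in messages:
        client, server = socket.socketpair()
        server.setblocking(False)
        sel.register(server, selectors.EVENT_READ)
        client.sendall(msg)  # the request is now waiting on the server side
        clients.append(client)

    pending = len(messages)
    while pending:
        events = sel.select(timeout=1.0)
        if not events:
            raise RuntimeError("no connection became readable")
        for key, _mask in events:
            conn = key.fileobj
            data = conn.recv(4096)      # will not block: the socket is readable
            conn.sendall(data.upper())  # small reply, fits in the socket buffer
            sel.unregister(conn)
            conn.close()
            pending -= 1

    replies = []
    for client in clients:
        replies.append(client.recv(4096))
        client.close()
    sel.close()
    return replies
```

      No locks are needed because only one thread ever touches the connection state; a real server would also have to handle partial reads and writes.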

      Sumner

      --
      rage, rage against the dying of the light
    4. Re:Not useless by pthisis · · Score: 2

      WTF? How does Java make it hard to write non-threaded programs?

      No fork(). No multiplexed I/O. Try writing a good scalable network server in a single thread without the moral equivalent of select()

      Java 1.4 recognized that, and added I/O multiplexing. Still no good multiprocess (but not multithreaded) framework, though, and I/O multiplexing only solves a limited subset of cases.

      Sumner

      --
      rage, rage against the dying of the light
    5. Re:Not useless by be-fan · · Score: 2

      (which is true on some broken OSes but not meaningfully true on Linux)
      >>>>>>
      Not so. Have you ever compared the time between a thread-switch and a process-switch? The only difference on Linux is changing the MMU context. Yet, on x86, changing the MMU context is the slowest operation you can perform.

      --
      A deep unwavering belief is a sure sign you're missing something...
    6. Re:Not useless by pthisis · · Score: 2

      Not so. Have you ever compared the time between a thread-switch and a process-switch?

      Yep. And for most applications, it's not meaningful. If you spend all your time context switching, you're definitely not efficiently designed whether you use threads or processes--you can definitely measure the overhead in that case, but when you go to a situation where you're synchronizing on anything (mutexes, sockets, whatever) the difference essentially disappears. And even in the measurable situation, the difference isn't huge--about 2 usecs on my home machine on a total overhead of 4 usecs vs. 6 usecs (threads vs. procs). Sure, it's 33% SLOWER!!!! Horror!!! In the real world, it generally doesn't matter and it's small enough that if context switch overhead is hurting your multiprocess app then switching to multithreading won't really help.

      Both are so fast that if you thought about your design at all they won't even be a blip on the radar, unlike on some OSes where switching process can take 100s of usecs vs. 10 usecs for a thread switch.

      There are exceptions, which is why I didn't say that threads are always bad. But the performance argument here is almost always specious, brought up by people who learned about threaded programming on other platforms where it is a huge win and used to defend a poor design choice (look, I can measure the difference in contrived situation X even though it has no effect on system performance).

      Sumner

      --
      rage, rage against the dying of the light
    7. Re:Not useless by Lozzer · · Score: 1

      I'd think you'd appreciate the quote on threads attributed to Alan Cox on Larry McVoy's page.

      --
      Special Relativity: The person in the other queue thinks yours is moving faster.
    8. Re:Not useless by pthisis · · Score: 3, Interesting

      I'd think you'd appreciate the quote on threads attributed to Alan Cox on Larry McVoy's page.

      (The quote in question is:
      "A computer is a state machine. Threads are for people who can't program state machines." Alan Cox)

      Except I'd assert that threads are far harder to program correctly than state machines. Easier concept at first, and easy to come up with a design for the 90% solution, but the devil's in the details and threads have a ton of details. Not to say that state machines don't, but they seem to cause less problems in practice.

      Sumner

      --
      rage, rage against the dying of the light
    9. Re:Not useless by Lozzer · · Score: 1

      I think I read that quote slightly differently to you. Substitute the assertion in the first part of the quote into the second part and you get...

      Threads are for people who can't program computers.

      What it is not saying is "Threads enable people who cannot program computers to program computers"

      Basically I think Alan shares your point of view. That's the way I was thinking when I remembered the quote. I certainly agree with you, and try to avoid them wherever possible.

      The only place I regularly use them (thus proving that I can't program, damn) is for Java user interfaces. Then it's only if there is a moderately long running action, e.g. five seconds for a button. To stop the window becoming frozen with respect to the windowing system during this time a second thread appears to be necessary. I'd certainly be interested to hear of another way of doing it. In *nix style you would separate out the long running part into a separate process and get the second process to signal the first when it is done (effectively the completion becomes part of the same event loop that is processing the window events). I don't think Java gives the necessary infrastructure to do this, though I stand to be corrected.

      --
      Special Relativity: The person in the other queue thinks yours is moving faster.
    10. Re:Not useless by tc · · Score: 1
      I'd strongly disagree with your threading rant. Threads are most certainly a useful concept. A process is necessarily a more heavyweight object than a thread, and multiple threads can be implemented more efficiently than multiple processes. The fact that this isn't true on some Unix and Unix-like implementations is a historical quirk, not an intrinsic problem.

      Frankly, the whole fork() methodology where you clone the entire state of a process seems pretty warped to me. Sure, it's sometimes a useful thing to do, but it makes a lot less sense if you have a threading implementation that works more efficiently than multiple processes (which should be the case if it's properly designed).

      If profiling threaded apps on Linux sucks, the proper response is not to say "but threads are lame anyway". The proper response is to fix threaded profiling, because threading is a perfectly reasonable design choice in many situations and it should be possible to profile those apps.

    11. Re:Not useless by Chainsaw · · Score: 2

      fork()-ing in Java would be an incredibly expensive operation. Mostly because you have to start an entire new VM. I'm not sure why you would prefer a new process instead of a simple thread. After all, separate processes have a hard time sharing data in an effective manner, making a global object cache overly difficult to implement.

      --
      War is one of the most horrible things a human can be exposed to. And one of the worlds largest industries.
    12. Re:Not useless by Sandmann · · Score: 1

      I generally agree with you, but there are cases where threads can improve performance, even with a single CPU.

      Suppose a server runs a poll()/select() loop and then gets hit by something that causes some rarely used code to be paged in (maybe an error handler). The server will block and do nothing useful until the disk i/o is finished.

      In cases like that, where there is a possibility of blocking behaviour, distributing the work across a few threads is an advantage, but don't make the mistake of associating one thread with one job.
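
      One way to read "a few threads, not one thread per job" is a small fixed-size worker pool. The sketch below uses Python's concurrent.futures for brevity; the handle function and the pool size of 4 are assumptions chosen for illustration:

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def handle(job):
    """Hypothetical job handler; returns the result and the worker's name."""
    return job * job, threading.current_thread().name


def run_jobs(jobs, workers=4):
    """Spread many jobs over a small, fixed number of threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() keeps results in the same order as the submitted jobs.
        return list(pool.map(handle, jobs))


if __name__ == "__main__":
    results = run_jobs(range(20), workers=4)
    print(sorted({name for _result, name in results}))
```

      If one worker stalls on a page fault or other blocking call, the remaining workers keep draining the job queue, yet the thread count stays bounded no matter how many jobs arrive.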

    13. Re:Not useless by Dwonis · · Score: 2

      Can someone explain to me why other OSes take so long to do process context switches? I mean, what are they doing?

    14. Re:Not useless by pthisis · · Score: 2

      This is tough, all OSes are different. Here are a couple of examples:

      Many OSes save/restore all state when you switch contexts. Linux is lazy about saving state; it'll only save FPU state if the FPU is used by the new process. So for the 99% of process switches that aren't between FPU-using procs, you don't incur that overhead. Ditto TLB invalidation (e.g. when entering kernel mode but not switching processes); Linux is lazy about doing that. Both FPU state saving and TLB invalidation (especially) are heavyweight operations.

      OS internals also vary wildly; for instance, Linux tracks processes by keeping a pointer to the current task_struct. This pointer is updated when task switching. Threads are just processes which share VM (and a few other things) and are scheduled the same way as processes. NT (4.x; I'm not sure about 2000/XP) keeps context information on the kernel stack and requires a more heavyweight stack operation when context switching. Thread switching takes a completely different path through the scheduler as threads aren't really considered similar to processes at all.

      Sumner

      --
      rage, rage against the dying of the light
  21. There is no question that profiling is necessary by march · · Score: 1

    Profiling, in one form or another, is ABSOLUTELY necessary. There is no other way to find out why (and where!) your code is running slowly.

    Does gprof do everything we need? No. Are there better tools? Yes.

    But, the bottom line is that if you don't profile your code (and unit test it, and integration test it, and...), you are not writing good code.

    It's like debating if "breathing" is necessary or not.

  22. VTune and Quantify by Codex+The+Sloth · · Score: 4, Informative

    If you want tree profiling (i.e. information about function and child performance) then Rational Quantify is a reasonable alternative to the crap profiler that comes with MSDev.

    If you want a flat profiler or need to analyze the cost of specific low level operations then you MUST get Intel VTune.

    --
    I am not a number! I am a man! And don't you ... oh wait, I'm #93427. Ha ha! In your face #93428!
  23. Linux pthreads breaks lots of things by GGardner · · Score: 2

    It isn't just gprof that's broken by pthreads, other Linux tools fall victim as well. Core dump? Almost useless with pthreads running. Gdb? Getting better, but still a little wonky. Certain aspects of signal handling don't work as expected with pthreads.

    1. Re:Linux pthreads breaks lots of things by Anonymous Coward · · Score: 1, Informative
      Core dump? Almost useless with pthreads running
      I believe there is something in the -ac tree whereby multiple core files are dumped for threaded applications.

      Also, recently merged in 2.4.x is "next-generation threads". I'm not sure if glibc linuxthreads will utilize them, or if they're any better, but just letting everyone know the implementation might change.
  24. ACE has the answer by Ricdude · · Score: 3, Informative
    There is a simple profiling capability in the ACE toolkit, the ACE_Profile_Timer. Easy to wrap in a class with basic Start, Stop, and Elapsed methods. If you can guess what function or two the bulk of your program's time is being spent in, this can help pinpoint the worst offenders within that section of code. If not, create several timers, and time each function in your main loop, and print the information after the loop is finished. Drill down into subfunctions as needed. See where the milliseconds tick away. You might be surprised.
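
    The wrapper idea can be sketched independently of ACE. The class below is a hypothetical Python stand-in with Start, Stop and Elapsed methods; it does not use or reproduce the ACE_Profile_Timer API, and the step names in the usage function are invented:

```python
import time


class SectionTimer:
    """Accumulates wall-clock time across repeated start()/stop() pairs."""

    def __init__(self):
        self._started_at = None
        self._elapsed = 0.0

    def start(self):
        self._started_at = time.perf_counter()

    def stop(self):
        if self._started_at is not None:
            self._elapsed += time.perf_counter() - self._started_at
            self._started_at = None

    def elapsed(self):
        """Total seconds measured so far."""
        return self._elapsed


def run_main_loop(steps, iterations=100):
    """Time each named step of a main loop; returns {name: seconds}."""
    timers = {name: SectionTimer() for name in steps}
    for _ in range(iterations):
        for name, func in steps.items():
            timers[name].start()
            func()
            timers[name].stop()
    return {name: timer.elapsed() for name, timer in timers.items()}


if __name__ == "__main__":
    report = run_main_loop({
        "parse": lambda: sum(range(1_000)),
        "render": lambda: sorted(range(5_000), reverse=True),
    })
    for name, seconds in report.items():
        print(f"{name}: {seconds * 1000:.3f} ms")
```

    Once one step stands out, the same timers can be moved down into its subfunctions.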

    And remember, in the immortal words of Michael Abrash, "Assume Nothing. Measure the improvements. If you don't measure, you're just guessing."

    --
    How's my programming? Call 1-800-DEV-NULL
  25. Profiling *IS* useful by exa · · Score: 1

    Drepper is smoking some strange shit. Every serious programmer uses profiling.

    Just because kernel and glibc wackos don't find it useful doesn't mean it isn't useful.

    I regularly use profiling for any code that demands performance.

    That was a very unfortunate remark of Ulrich.

    --
    --exa--
    1. Re:Profiling *IS* useful by Anonymous Coward · · Score: 0

      Note that he specifically referred to gprof on Linux. Which simply does not work for anything more complex than hello world. He did not say that profiling in general is a bad thing (which would be stupid).

    2. Re:Profiling *IS* useful by Anonymous Coward · · Score: 1, Informative

      If Ulrich had written that profiling was useless, your response might make some sense. He didn't. He wrote that gprof was useless. See the difference?

    3. Re:Profiling *IS* useful by gkatsi · · Score: 1

      There's always the possibility that he meant that gprof in particular is useless. I know that for programs that are dependent on good cache behavior to get the performance they need, gprof totally messes up the results.

      So, maybe he said that profiling should not be done with gprof. It's still a weird statement, however.

  26. Threading may be the wrong model by tshoppa · · Score: 1
    Lots of debugging techniques don't work well with threaded programs. I think that blame here lies not with gprof, but with the threaded-programming paradigm or its current implementations.

    The problems that threading solves (multiple outstanding I/O's, multiple CPU utilization) can be solved using other methods. Those other methods have their evils, too, but trading off for the lesser net evil is what design and analysis is all about.

    Lack of profiling tools is pretty far down on the list of tradeoffs, in my opinion; much higher up are issues of maintainability and portability, areas where threading does badly anyway.

  27. HW Profiling is the best way to go... by Nathan+Mates · · Score: 1

    Out in the console games development market, there's one real serious tool: a hardware profiler. Basically, it's a heavily modified PSX with bus analyzers tacked on so that it can snoop and tell *exactly* where the slowdowns are. Is it a cache miss? Is it the GPU hammering on things? There's none of the "this function is slow" -- it points out *why*.

    You should not rely on profilers from the beginning of writing code, but they're no cure-all either. A profiler can't tell you to use quicksort over a bubblesort. It just says what is slow, and it's up to the programmers to find a faster way to do things.

    The most recent x86 profiler I've used was Intel's VTune (AMD's free tool at http://www.amd.com/us-en/Processors/DevelopWithAMD/0,,30_2252_3604,00.html was so-so at best for my use). Those apps don't do any of the fancy bus analysis, etc. Still, I'd suspect they're better than nothing.

    I know this is going to sound like flamebait, but C++ *does* make it very easy to shoot yourself in the foot with regards to performance. If you don't set up all your operators to properly take consts, if you forget to set things up, it can kill performance. If you rely on a *lot* of small functions, you can either (1) blow out the cache with a larger executable (more likely on consoles), or (2) forget to inline a few, and kill your performance with lots of *tiny* calls that probably won't show up under VTune. The slowness of various compilers makes people afraid of putting a lot of small functions in headers where they belong, as any change would force a slow, full rebuild.

    I've seen C++ compilers decide to inline a 4x4 matrix copy by unrolling a loop to read/write the first 12 elements, then call the Vector4 copy constructor. Worst of all worlds. Replacing that with a memcpy was a huge win. But, the only way one would know *how* to fix that is to be able to look at the disassembly.

    Nathan Mates

    1. Re:HW Profiling is the best way to go... by Anonymous Coward · · Score: 1, Interesting

      The Intel tool probably is very decent if it's using performance counters. That's one part of a microprocessor design that turns out to be very useful for programmers, but many people don't even know they exist because they're buried in some special purpose register access somewhere. High end POWER boxes have had such things for years.

  28. Performance is not what really counts. by Tom7 · · Score: 2

    These days instruction-level efficiency simply isn't important outside of a few niche areas (embedded systems, games, multimedia, certain kinds of low-level systems work). To imply that knowing what's happening on the silicon is "what really counts" is nonsense. Using appropriate data structures and algorithms counts, and making correct software counts even more, but worrying about how many cycles one instruction takes versus another is a serious misdirection of effort on modern machines!

    It's folks like you who are the reason people still write their SSH daemons in C, and why we live in a mixed up world where we have neither stability NOR speed!

    1. Re:Performance is not what really counts. by Anonymous Coward · · Score: 0

      It's folks like you who are the reason we have Microsoft.

  29. CPU Intensive tasks by Anonymous Coward · · Score: 1, Informative
    When you have to deal with CPU intensive programs, you will find that a profiler is quite useful.

    For instance, in the last year I've been developing an automatic nesting server using Linux and gprof was very important to spot the functions that were consuming more cpu time.

    With gprof it was easy to notice two small functions that were responsible for 95% of the cpu usage.

    As a result, I replaced those two small functions with 180 lines of optimized assembly code and I got a very good performance increase, since I was using a lot of inter-word bit shifts that the C compiler didn't handle well.
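
    The "find the few hot functions first" step described above can be sketched in Python with the standard cProfile and pstats modules. This is only an analogue of a gprof flat profile, not gprof itself, and the function names hot, cold and workload are made up for illustration.

```python
import cProfile
import pstats

def hot(n):
    # Hypothetical stand-in for a small function that dominates CPU time.
    total = 0
    for i in range(n):
        total += i * i
    return total

def cold():
    # Hypothetical stand-in for cheap bookkeeping code.
    return sum(range(10))

def workload():
    cold()
    return hot(200_000)

def hottest_function(func):
    """Profile func() and return the name of the function with the most self time."""
    prof = cProfile.Profile()
    prof.runcall(func)
    stats = pstats.Stats(prof)
    # stats.stats maps (file, line, name) -> (calls, primitive calls,
    # self time, cumulative time, callers); index 2 is the self time.
    (_file, _line, name), _timings = max(stats.stats.items(),
                                         key=lambda kv: kv[1][2])
    return name

if __name__ == "__main__":
    print(hottest_function(workload))
```

    Sorting by self time rather than cumulative time is what singles out the leaf functions worth rewriting, which is the kind of candidate the post describes.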

    Regarding multi-threads, I have come to the conclusion that 9 out of 10 times you don't really need to use threads, even in interactive programs, since there are alternative ways of achieving the same effects.

    For instance, all the X11 toolkits like Xt/Motif, Gtk+ and Qt, have the concept of work-procedures and timeout-functions.

    If you put all of your time-consuming operations inside work-procedures, you can get the same results as you would get with multi-threads, because you have an effective way of executing several tasks at the same time without blocking the user interface.
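
    A toy sketch of the work-procedure idea, in plain Python rather than any real toolkit API: the loop handles an event whenever one has arrived and otherwise runs one small slice of background work. The names run_main_loop and sliced, and the tick-based event arrival, are assumptions made purely for illustration.

```python
from collections import deque

def sliced(name, steps):
    """Hypothetical long task, cut into small slices (one per next() call)."""
    for i in range(steps):
        yield "%s:%d" % (name, i)

def run_main_loop(arrivals, work_procs):
    """Toy main loop. arrivals maps a tick number to an event name.

    On each tick a pending event wins; otherwise one slice of one work
    procedure runs, so the 'UI' is never blocked for longer than a slice.
    """
    log = []
    work = deque(work_procs)
    last_event = max(arrivals, default=-1)
    tick = 0
    while work or tick <= last_event:
        if tick in arrivals:
            log.append("event:" + arrivals[tick])
        elif work:
            proc = work.popleft()
            try:
                log.append(next(proc))
                work.append(proc)      # unfinished: requeue behind the others
            except StopIteration:
                pass                   # finished: drop it from the queue
        tick += 1
    return log

if __name__ == "__main__":
    print(run_main_loop({1: "click"}, [sliced("a", 2), sliced("b", 1)]))
```

    The catch is the same as with real work procedures: each slice has to be short, because nothing preempts it.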

    Fernando Pereira

    1. Re:CPU Intensive tasks by Anonymous Coward · · Score: 0

      Why do people use threads where a work procedure could be used? Is it just easier to program/keep state that way?

  30. Re:There is no question that profiling is necessar by pthisis · · Score: 5, Insightful

    But, the bottom line is that if you don't profile your code (and unit test it, and integration test it, and...), you are not writing good code.

    That's hardly true. Certainly you shouldn't waste time optimizing code until you know where the bottlenecks are. But in a lot of cases--I'd even venture to say most cases--code gets written and is fast enough. In such cases, profiling is a waste of time. Profiling is only indicated if there's a legitimate performance problem.

    To a lesser extent, the same is true of unit testing and integration testing. If you're writing some code to convert one image to a GIF and you run it successfully to get the GIF, there's no reason to unit test. Even if the code has horrible bugs on some inputs, the job is done. One-off code isn't (unfortunately) uncommon. Prototype code is also very common and often you don't need to do extensive testing on it, either. Any code where the total cost of code failure is lower than the cost of QA probably doesn't need to be QA'd (which is not to say that you should spend an amount on QA equal to the failure cost; if spending $1000 on QA reduces the chance of failure by 99.999% and spending $1000000 reduces the chance of failure by 99.9999%, the $1000 expenditure suffices in all but the most demanding applications)
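
    The arithmetic behind that comparison, as a small sketch. The $10,000,000 failure cost and the 50% chance of failure without QA are invented numbers, used only to make the two QA budgets from the post comparable.

```python
def total_cost(qa_spend, failure_cost, p_fail, reduction):
    """QA spend plus the expected cost of the failures that still slip through."""
    return qa_spend + failure_cost * p_fail * (1.0 - reduction)

# Assumed figures (not from the post): a failure costs $10M and happens
# half the time if nothing is tested.
FAILURE_COST = 10_000_000
P_FAIL = 0.5

no_qa = total_cost(0, FAILURE_COST, P_FAIL, 0.0)
cheap_qa = total_cost(1_000, FAILURE_COST, P_FAIL, 0.99999)
heavy_qa = total_cost(1_000_000, FAILURE_COST, P_FAIL, 0.999999)

if __name__ == "__main__":
    print(round(no_qa), round(cheap_qa), round(heavy_qa))
```

    Under those assumptions the $1000 budget leaves about $50 of expected failure cost, so the extra $999,000 buys only about $45 of further risk reduction.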

    Sumner

    --
    rage, rage against the dying of the light
  31. Not useless, just different by Dasein · · Score: 2, Insightful

    There are very few applications that don't reach out across a network for information. The bottleneck is usually this network communication. Check out Performant for tools that work on the network level.

    There's also a continuing trend of software developers spending user's computing power to make their jobs easier. Java, J2EE, C#, .NET, C++, C all can theoretically produce software that is just as speedy as assembly but it rarely is. People still write assembly where performance really counts (games, realtime, etc.)

    Some people think that the wasted processing power is a crime. Me, I think it's just economics. It's much cheaper to pay for processing power than it is to pay for the developers to squeeze every last bit of performance out of an app.

    However, there are some applications where profiling is absolutely required. Database engines, games, simulations, anything that is CPU-bound has the potential of benefiting from profiling.

    --
    You are not a beautiful or unique snowflake -- but you could be if you got off your ass.
  32. Quantify! by ptomblin · · Score: 3, Insightful

    I've solved some important real-world problems using Quantify and Purify, especially when dealing with a huge system with a lot of developers' fingers in the pie. One of the programs was handling 100,000+ transactions a day, and Quantify helped shave enough off so we didn't have to force all of our customers to upgrade their hardware.

    Faced with a similar problem in Linux, I'd probably port the program to Solaris, Quantify it there, and hope the results are similar under Linux.

    --
    The next Cmdr Taco duplicate will be ready soon, but subscribers can beat the rush and see it early!
  33. Re:There is no question that profiling is necessar by Anonymous Coward · · Score: 0

    What if I run my software and it's plenty fast? Is it still bad if I don't profile?

  34. So threads are evil -- now what? by johnfoobar · · Score: 2, Insightful
    Okay, so let's say threads are evil.

    But processes as provided by current operating systems are too expensive to use. If I have a network server (e.g. a httpd) that has to create a process for each network request, it will never scale. In theory all that has to happen is inetd (or equivalent) fork/execs and does the necessary plumbing so that the ends of the socket are STDIN and STDOUT. Then the process just reads and writes as necessary to fulfil the request. In practice, this just doesn't work.

    That's why you can't use cgi for high-volume transactions. So let's make the server a single multithreaded daemon process instead, where each request is handled by a thread. Now you can handle each request much faster, but you lose the protected address space the OS gives you in a process.
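
    For reference, the inetd-style plumbing mentioned above (fork, attach the socket to STDIN and STDOUT, exec the handler) looks roughly like this. It is a sketch only: a socketpair stands in for an accepted network connection, cat stands in for the request handler, and spawn_handler and demo are made-up names.

```python
import os
import socket

def spawn_handler(conn, argv):
    """Fork a child whose stdin and stdout are the connection, then exec argv."""
    pid = os.fork()
    if pid == 0:
        try:
            os.dup2(conn.fileno(), 0)   # stdin  reads from the socket
            os.dup2(conn.fileno(), 1)   # stdout writes to the socket
            os.execvp(argv[0], argv)    # does not return on success
        finally:
            os._exit(127)               # reached only if dup2/exec failed
    conn.close()                        # parent keeps no copy of the connection
    return pid

def demo():
    # The socketpair is an assumption standing in for accept() on a listener.
    ours, theirs = socket.socketpair()
    pid = spawn_handler(theirs, ["cat"])
    ours.sendall(b"GET /\n")
    ours.shutdown(socket.SHUT_WR)       # tell the handler the request is complete
    reply = b""
    while True:
        chunk = ours.recv(4096)
        if not chunk:
            break
        reply += chunk
    os.waitpid(pid, 0)
    ours.close()
    return reply

if __name__ == "__main__":
    print(demo())
```

    Every request here pays for one fork() and one exec(), which is the per-request cost the post is objecting to.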

    Obviously, the OS needs to change, and give us something (maybe a hybrid between processes and threads) that more closely meets applications' needs. I don't see anybody making suggestions as to ways to move forward. Anybody know of research in this area?

    1. Re:So threads are evil -- now what? by Anonymous Coward · · Score: 0

      I hope you're actually aware that there are other approaches than threads or multiple processes. See select(), etc.

    2. Re:So threads are evil -- now what? by pthisis · · Score: 5, Insightful

      Okay, so let's say threads are evil.

      Okay.

      But processes as provided by current operating systems are too expensive to use.

      No, they aren't. Have you measured fork() speeds under Linux vs. pthread_create() speeds? Sure, Windows and Solaris blow at process creation (and Windows doesn't have a reasonable fork() alternative--it conflates fork() and exec() into CreateProcess*()), but that doesn't make all OSes brain-dead.
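
      A rough way to try that measurement from Python. Treat it as a sanity check only: interpreter overhead is included in both numbers, and threading.Thread is just a wrapper around the platform's thread creation, so a C program calling fork() and pthread_create() directly would be the real test. The function names are made up.

```python
import os
import threading
import time

def avg_fork_time(n=200):
    """Average seconds to fork() a child that exits at once, including the wait."""
    start = time.perf_counter()
    for _ in range(n):
        pid = os.fork()
        if pid == 0:
            os._exit(0)                 # child: leave immediately
        os.waitpid(pid, 0)
    return (time.perf_counter() - start) / n

def avg_thread_time(n=200):
    """Average seconds to start a do-nothing thread and join it."""
    start = time.perf_counter()
    for _ in range(n):
        t = threading.Thread(target=lambda: None)
        t.start()
        t.join()
    return (time.perf_counter() - start) / n

if __name__ == "__main__":
    print("fork:   %.1f us" % (avg_fork_time() * 1e6))
    print("thread: %.1f us" % (avg_thread_time() * 1e6))
```

      No expected numbers are given on purpose; they depend heavily on the kernel, the hardware and the size of the forking process.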

      If I have a network server (e.g. a httpd) that has to create a process for each network request, it will never scale.

      Right. And if you create a new thread for each network request, you'll never scale--give it a try some time. Good servers that use a thread/process for every connection do so with pre-fork()'d/pre-pthread_create()'d/whatever pools. Apache, for instance, uses multiple processes (but no multithreading, except in some builds of 2.x) but pre-forks a pool of them. This is really basic stuff, even an introductory threading book will talk about pooling and other server designs.
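
      A minimal sketch of the pre-forked pool idea, assuming a Unix system. The workers are forked once, before any request arrives, and then pull requests off a shared socket, loosely the way pre-forked servers share a listening socket. The one-byte R/Q message tags, the upper-casing "work" and the name run_prefork_pool are all invented for the example; this is not how Apache does it.

```python
import os
import socket

def run_prefork_pool(requests, workers=3):
    """Fork the workers up front, feed them requests, return the sorted replies.

    Meant for a handful of small, non-empty byte strings; a real pool would
    need flow control so that full socket buffers cannot stall both sides.
    """
    # Datagram socketpairs keep message boundaries, so several workers can
    # safely recv() from the same end.
    task_tx, task_rx = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    reply_rx, reply_tx = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)

    pids = []
    for _ in range(workers):
        pid = os.fork()
        if pid == 0:                          # child: serve until told to quit
            try:
                while True:
                    msg = task_rx.recv(4096)
                    if msg[:1] == b"Q":
                        break
                    reply_tx.send(msg[1:].upper())   # stand-in for real work
            finally:
                os._exit(0)                   # never fall back into parent code
        pids.append(pid)

    for req in requests:                      # no fork() on the request path
        task_tx.send(b"R" + req)
    replies = sorted(reply_rx.recv(4096) for _ in requests)

    for _ in pids:                            # one quit message per worker
        task_tx.send(b"Q")
    for pid in pids:
        os.waitpid(pid, 0)
    for sock in (task_tx, task_rx, reply_rx, reply_tx):
        sock.close()
    return replies

if __name__ == "__main__":
    print(run_prefork_pool([b"a", b"b", b"c"]))
```

      The point of the design is that process creation cost is paid at startup, not per connection.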

      Really scalable/fast implementations don't even do that. They use just one process (or one per CPU) and multiplex the I/O with something like select, poll, queued realtime signals (Linux), I/O completion ports (NT), /dev/poll (Solaris), /dev/epoll, signal-per-fd, kqueues (FreeBSD), etc. (select and poll don't scale well to 10s of thousands of connections when most are idle, but some of the others are highly scalable). See e.g. Dan Kegel's c10k page for specifics.
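
      A small sketch of the single-process multiplexing approach, using Python's selectors module (its DefaultSelector uses epoll, kqueue, /dev/poll, poll or select, depending on what the platform offers). The function name echo_until_closed is made up, and already-connected sockets are passed in so the example does not need a listening port.

```python
import selectors
import socket

def echo_until_closed(conns):
    """Echo data on every connection from one thread; return total bytes echoed."""
    sel = selectors.DefaultSelector()
    for conn in conns:
        conn.setblocking(False)
        sel.register(conn, selectors.EVENT_READ)

    echoed = 0
    still_open = len(conns)
    while still_open:
        # One call waits on all connections at once; only ready ones come back.
        for key, _events in sel.select():
            conn = key.fileobj
            data = conn.recv(4096)
            if data:
                conn.sendall(data)   # fine for tiny messages; a real server
                echoed += len(data)  # would buffer and wait for writability
            else:                    # peer finished sending
                sel.unregister(conn)
                conn.close()
                still_open -= 1
    sel.close()
    return echoed

if __name__ == "__main__":
    pairs = [socket.socketpair() for _ in range(3)]
    for i, (_server_side, client) in enumerate(pairs):
        client.sendall(b"ping%d" % i)
        client.shutdown(socket.SHUT_WR)
    print(echo_until_closed([server_side for server_side, _ in pairs]))
    print([client.recv(64) for _, client in pairs])
```

      No thread or process exists per connection; the per-connection state is just the registered socket.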

      Obviously, the OS needs to change, and give us something (maybe a hybrid between processes and threads) that more closely meets applications' needs

      http://www-124.ibm.com/pthreads/ proposes an M:N threading model and offers an implementation, but it still has the shared memory problems of threads. Multiprocessing may not be sexy but it's really a lot cleaner for most problems and can be more efficient in a lot of domains.

      Sumner

      --
      rage, rage against the dying of the light
    3. Re:So threads are evil -- now what? by johnfoobar · · Score: 1
      Yes I am aware of select(). select() doesn't really give you concurrency, it just lets you avoid blocking (unless I'm mistaken, which is always possible...). Thanks for the suggestion anyway.

      I don't know whether select() would scale; apparently it scales linearly, which is fine when n=20 but not when n=10000. I think if you are going to go with select() you need an implementation that scales logarithmically.

      Anyway, to answer my own initial question, there is a paper which I found with a google query and seems interesting.

    4. Re:So threads are evil -- now what? by johnfoobar · · Score: 1
      I don't claim to know all about this topic; that's why I'm asking! :)

      That said, thanks for the information, it has certainly helped to clear some things up.

    5. Re:So threads are evil -- now what? by Anonymous Coward · · Score: 0
      Maybe select() and poll() scale as poorly(?) as linearly, but what makes you think that threads don't? An event still has to be delivered to the right thread. The OS probably does it in linear time, but even threads are heavier than a struct pollfd[]. If it's the connecting up that's slowing your program down, that is, if a large subset of your threads is being called on, then the paging is going to be worse than the poll() call.

      Don't take my word for it, though; look at something like the Tux server, and try comparing it to the fastest threaded solution you can find.

      (incidentally, the memory protection is a bit of a red herring. A state machine is one process with access to all of the data streams, and doesn't seem inherently superior to several processes with access to all the data, at least in this context. It makes a difference when a leak in the UI code (very common) can scribble on actual important data).

    6. Re:So threads are evil -- now what? by pthisis · · Score: 3, Interesting

      That said, thanks for the information, it has certainly helped to clear some things up.
      No problem.

      I guess the key point I want people to remember (if I only clear one thing up...) is that a decision about whether to use threads or processes should be based on whether they want all (or mostly) shared memory, in which case threads are in order, or some protected memory (and possibly some shared) in which case processes are the way to go.

      Windows has hoodwinked people into thinking threads are fast and processes are slow (and that processes have to start new executables), when that's really not the interesting detail and isn't really very true under well-designed operating systems. And you lose a lot by giving up protected memory (even only giving it up wrt other threads in your memory space).

      Sumner

      --
      rage, rage against the dying of the light
    7. Re:So threads are evil -- now what? by be-fan · · Score: 2

      fork() alternative--it conflates fork() and exec() into CreateProcess*())
      >>>>
      Umm, fork() is the one that's braindead. Who the hell dreamed up a system where creating a new process would copy the entire state of an existing one only to have it wiped out when the other process did an exec()? fork() requires all sorts of nasty stuff (like copy-on write in the VM) that is ditched if the OS follows a process/thread model. Windows might be braindead, but CreateProcess() makes a hell of a lot more sense than fork().

      --
      A deep unwavering belief is a sure sign you're missing something...
    8. Re:So threads are evil -- now what? by pthisis · · Score: 3, Insightful

      Umm, fork() is the one that's braindead. Who the hell dreamed up a system where creating a new process would copy the entire state of an existing one only to have it wiped out when the other process did an exec()? fork() requires all sorts of nasty stuff (like copy-on write in the VM) that is ditched if the OS follows a process/thread model.

      Uh, COW isn't ditched in a process/thread model. Shared libraries would suck without it. Demand paging of executables wouldn't work without it. It's a fundamentally good thing used by Unix, MacOS X, Windows, and almost all other modern OSes which support protected memory. Definitely not "nasty stuff", and by itself it eliminates 99% of the fork() overhead vs. threads.

      You really want to be able to create a new process with the same state as the existing one, and fork/exec allows that. There's system() if you want an entirely new executable (which might call fork()/exec() or might call spawn(), vfork()/exec(), or whatever...). I don't feel like arguing over whether a spawn()/CreateProcess*()-style syscall is good, but not having a fork()-style syscall is simply braindead. There are things you can do with fork()/exec() that you can't do with spawn() or CreateProcess*(); the reverse isn't true.

      Sumner

      --
      rage, rage against the dying of the light
    9. Re:So threads are evil -- now what? by roca · · Score: 2

      fork() gives you a chance to do all sorts of useful things in the child process before you exec() and start the new code running that you have no control over. You can redirect files, set resource limits, acquire or drop capabilities, open or close pipes, etc etc. Since you can't do these things directly inside the child process in Windows, Windows gives you a way to do some of them by passing parameters to CreateProcess, others using special purpose APIs such as DuplicateHandle and the Job APIs, and others you can't do at all without a world of pain. This is one reason why the Win32 APIs are so complex and bloated.

      So at the expense of a little complexity in the VM system for copy-on-write, you get a much simpler, cleaner interface for the programmer building stuff on top of the OS. (Not to mention the fact that you can fork() without exec'ing, and in many situations that turns out to be a much more convenient and safe approach to concurrency than using threads.) Sounds like a good tradeoff to me. We're not talking about much complexity either; students routinely do this stuff in OS course project assignments.

      I think understanding why fork()/exec() is better than CreateProcess() is an excellent lesson on how to design a good interface.
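
      To make the "do useful things in the child before you exec()" point concrete, here is a hedged sketch: the child redirects its stdout to a file and turns off core dumps, then execs a program that knows nothing about either change. The helper names run_redirected and demo, the choice of RLIMIT_CORE as the resource limit, and the use of a second Python interpreter as the exec'd program are all assumptions for illustration.

```python
import os
import resource
import sys
import tempfile

def run_redirected(argv, out_path):
    """fork(), adjust the child's environment, exec argv; return its exit code."""
    pid = os.fork()
    if pid == 0:
        try:
            # Everything here runs in the child only, before the new program.
            fd = os.open(out_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
            os.dup2(fd, 1)                       # redirect stdout to the file
            os.close(fd)
            _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
            resource.setrlimit(resource.RLIMIT_CORE, (0, hard))  # no core dumps
            os.execvp(argv[0], argv)             # does not return on success
        finally:
            os._exit(127)                        # setup or exec failed
    _pid, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1

def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "out.txt")
        code = run_redirected([sys.executable, "-c", "print('hello')"], path)
        with open(path) as f:
            return code, f.read()

if __name__ == "__main__":
    print(demo())
```

      The setup steps are ordinary calls made in the child, so anything the OS lets a process do to itself is available between fork() and exec() without a special parameter for it.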

    10. Re:So threads are evil -- now what? by dmelomed · · Score: 1

      If fork() expense is not an issue, then process context switch and process state memory requirements are definitely issues. Create a few ten thousand processes on your machine, and you'll bring your machine down to its knees. Your memory will be wasted, your CPU will spend most of the time (if not all) context switching.

      Of course the solution is a real language with built-in concurrency, message passing, distribution and fault tolerance like Erlang.

      Why Erlang

      Not some perversion like pthreads under Java or pthreads in C. Erlang processes have very lightweight message passing and process management overhead. Lighter than your OS (because all Erlang processes run inside the Erlang VM), several orders of magnitude lighter than Java, and no mutexes, semaphores, and other such bullshit to worry about. In addition you get the ability to distribute your processes seamlessly over networked machines. This is a language built-in feature.

      The only library in C which lets you do something similar is state threads from SGI.

      State Threads

    11. Re:So threads are evil -- now what? by dmelomed · · Score: 1

      Many, if not most select() and poll() implementations scale very poorly for an order of a few thousand file descriptors.

    12. Re:So threads are evil -- now what? by Anonymous Coward · · Score: 0

      Excellent post, and almost 100% factually correct, but you missed one point in a thread/state machine design's favor.

      One thread per CPU event driven state machine servers will always beat One process per CPU event driven state machine servers.

      The simple reason is that the OS can optimize context switches to avoid switching page tables, and the resulting cache and TLB flushes.

      Sure, it's not likely to be more than a 5-10% speed up on linux, but when you're groping for those last few TPS, it matters.

      And on OSes like Solaris where the difference between a thread switch and a full context switch can be 500%+, it matters a lot more.

    13. Re:So threads are evil -- now what? by pthisis · · Score: 2

      One thread per CPU event driven state machine servers will always beat One process per CPU event driven state machine servers.

      The simple reason is that the OS can optimize context switches to avoid switching page tables, and the resulting cache and TLB flushes.

      Sure, it's not likely to be more than a 5-10% speed up on linux, but when you're groping for those last few TPS, it matters.


      Can you name a single real-world application where using threads instead of processes on Linux speeds it up even 1%, let alone 5%?

      Certainly if it does exist it's not an efficient application.

      (Sure, other OSes can't context switch to save their lives and force you to use incorrect abstractions because of it--I don't much care)

      I'm not saying to never use threads, but the decision to use threads vs. processes should be based on whether you want/need your memory to be shared (with all the problems that introduces as well as the convenience) or not, not on any perceived performance problems. Or, as Alan puts it, "threads are processes that share more". That's the way to think of them. And good modular programming remembers to share only what is absolutely necessary--keep your data hidden when possible.

      Sumner

      --
      rage, rage against the dying of the light
    14. Re:So threads are evil -- now what? by Anonymous Coward · · Score: 0
      You can redirect files, set resource limits, acquire or drop capabilities, open or close pipes, etc etc.

      There is no reason why this cannot be implemented and made a POSIX standard for spawn() or newproc() or whatever is a good name for creating a new process.

      I think understanding why fork()/exec() is better than CreateProcess() is an excellent lesson on how to design a good interface.

      fork()/exec() is invaluable. COW is a very good optimization for a VM system. However, this does not eliminate the need for spawn() or vfork()/exec() or CreateProcess(). There ARE several instances where you don't care about the parents address space and you need to run a NEW, FRESH, process. No matter how optimized fork()/exec() gets, it will always be cheaper to create a process from scratch rather than replicating/copying structures etc. only to replace them immediately.

      Then, there are problems such as memory overcommit. If you have a system enforcing non-memory overcommit and you use the fork()/exec() model, you may end up in a situation where that huge daemon cannot run much smaller tasks in other processes because doing so would require that the system provide the resources to support a replica of the huge daemon's address space. Why? Because what the child does after fork() cannot be guaranteed. The child may well continue and overwrite all of the COW pages it got from its parent and the system MUST guarantee to provide resources to accommodate this or deny the fork() request to prevent overcommit.

      Current systems don't care about this problem and make no effort to prevent overcommit (Linux, FreeBSD, or NetBSD): you can crash all of these by simply having a process allocate a big block of memory, fork() a number of times, then have each child try to overwrite the big block of memory it allocated. It's a neat way to kill gettys and even init which I have done several times myself just to remind myself of these systems' shortcomings. The only way to explicitly tell the system that a new process is needed is to create a system call to do so. fork() is ambiguous in its intent.

      In summary, I think vfork(), spawn(), or something similar should COMPLEMENT fork()/exec() rather than replace it, which is all simple-minded people seem able to comprehend or conceive. Some problems (like memory overcommit) cannot be effectively tackled without this facility.

    15. Re:So threads are evil -- now what? by Anonymous Coward · · Score: 0
      No, they aren't. Have you measured fork() speeds under Linux vs. pthread_create() speeds()?

      In Linux, threads are processes. There is no difference, except "threads" share more of their address space than normal. FreeBSD has rfork() to do this and IRIX has something similar. Since fork() would only mark these pages COW as an additional step, it is not surprising that on Linux pthread_create() is not much faster than fork(). This hack has prevented, and will continue to prevent, Linux from having POSIX compliant pthread support.

      However, in OSes that implement real threads, there will be a big difference:

      Sure, Windows and Solaris blow at process creation (and Windows doesn't have a reasonable fork() alternative--it conflates fork() and exec() into CreateProcess*()), but that doesn't make all OSes brain-dead.
  35. Plenty of options for Java by grungeman · · Score: 2, Informative

    For Java we have a really nice choice of profilers. There are basically three great products available, all of them have proved to be absolutely useful. There is JProbe, OptimizeIt and JProfiler (the 2.0 beta of JProfiler looks cool). I don't know what the problems on Linux are, but when programming Java, profiling is quite an enjoyable task.

    --

    Signature deleted by lameness filter.
  36. I use it on threaded code ... by taniwha · · Score: 2

    mind you I have my own threads package - you need to if you want 1,000,000+ really small threads running together, with totally minimal stack space (4 bytes not the 1Mb that pthreads gives you). The only hard part was making gprof use SIGALTSTACK (which was broken in the kernel when I started).
    Of course this worked because from gprof's point of view I was running in one kernel thread - apart from that oprofile rocks :-)

  37. Every terrorist was a profiler . . by Gatesninny.net · · Score: 1

    . . but not every profiler is a terrorist?

  38. [YHBT] Re:gprof far from useless by ethereal · · Score: 1

    Something is very confusing here - if it's already multithreaded, then the whole program isn't blocked waiting for input. And the thread that's waiting for input should be using a blocking interrupt-driven call to get the password, rather than polling for whether the user typed it or not. There's no reason for a process or thread that's blocked waiting for input to use any CPU time, so there should be no "execution time" impact. Reducing time waiting for user input would improve the overall time to complete the job, but it's not really applicable to questions of processor loading for a computationally intensive task, which is what profiling's really for.

    I think this is a clever troll, and some moderator fell for it. Looking at your diary pretty much proves this point. But you had me going there for a minute, so good job in that respect.

    Oh yeah, while I'm thinking about it:

    3000th post! Yay me!

    --

    Your right to not believe: Americans United for Separation of Church and

  39. write event driven programs; threads for CPU work by Splork · · Score: 4, Informative

    minimize the use of threads whenever possible. write your code in an event driven fashion as your friendly AC suggested. the poll() system call [superior to select(), though select() works well within its fixed size filedescriptor array limits] makes this possible.

    the basic mentality to switch from threads to event programming is this: anytime you're using a thread solely so that it can sit around and block on high latency events (network or disk I/O) most of its lifetime, it should not be a thread.

    it's acceptable to have worker threads/processes that you hand computational tasks to and they trigger an event in your event loop when they hand a result back, but don't use threads of execution to manage your state. you'll pull your hair out and still have a nonfunctional program.
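
    a rough sketch of that split, assuming a Unix-style pipe: worker threads do the computation, and each one wakes the event loop by writing a byte to a pipe that the loop watches like any other descriptor. the name run_event_loop and the one-byte wakeup convention are made up for the example.

```python
import os
import queue
import selectors
import threading

def run_event_loop(jobs):
    """Run computational jobs (callables) on worker threads; gather the
    results from a single event loop, in completion order."""
    wake_r, wake_w = os.pipe()
    results = queue.Queue()

    def worker(job):
        results.put(job())         # compute off the event loop...
        os.write(wake_w, b"x")     # ...then post an ordinary fd event to it

    sel = selectors.DefaultSelector()
    sel.register(wake_r, selectors.EVENT_READ)   # sockets would go here too
    for job in jobs:
        threading.Thread(target=worker, args=(job,), daemon=True).start()

    done = []
    while len(done) < len(jobs):
        for _key, _events in sel.select():
            # One byte was written per finished job, after its result was queued.
            for _byte in os.read(wake_r, 1024):
                done.append(results.get_nowait())

    sel.close()
    os.close(wake_r)
    os.close(wake_w)
    return done

if __name__ == "__main__":
    print(sorted(run_event_loop([lambda: 1 + 1, lambda: sum(range(10))])))
```

    all program state lives in the loop; the workers only take an input and hand back an output.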

  40. working code, not pipe dreams by Splork · · Score: 3, Interesting

    i'll always choose a program that exists and works with a good user interface over one that is never released because the author(s) thought it could be faster.

    listen to your profiler. everything else lies.

  41. Re:There is no question that profiling is necessar by march · · Score: 2, Insightful

    Ok, you got me. Now, let's apply a common sense filter to my original post.

    Of course "one off, disposable code" doesn't need the same degree of "analness" applied to it as does mission critical code.

    However, "fast enough" is a really bad metric to use. Yes, utility "X" is fast enough. But oh, I didn't realize it was going to be used in conjunction with utility "Y" and "Z". Now, everything is really slow. Hey, can you say Microsoft?

    Fortune telling is not part of any programming job description I've ever seen.

  42. Re:There is no question that profiling is necessar by pthisis · · Score: 3, Interesting

    However, "fast enough" is a really bad metric to use. Yes, utility "X" is fast enough. But oh, I didn't realize it was going to be used in conjunction with utility "Y" and "Z". Now, everything is really slow. Hey, can you say Microsoft?

    Hey, I need this report on my desk every morning. It takes 3 hours to run. Let's kick it off every night at midnight.

    Fast enough, even though a well-coded, well-designed implementation might take seconds to run. And mission critical. No point wasting programmer time speeding it up when we can do another project with big upside instead.

    This sort of thing is not uncommon at all.

    Sumner

    --
    rage, rage against the dying of the light
  43. Re:There is no question that profiling is necessar by Nyarly · · Score: 2
    If you're writing some code to convert one image to a GIF and you run it successfully to get the GIF, there's no reason to unit test. Even if the code has horrible bugs on some inputs, the job is done.

    Um, but...I think there's a confusion of context occurring. The situation you describe happens when you're writing little chunks of one-off code to perform one task and be done with. Usually it'll be used once, or is part of a stopgap "until there's a real solution." If you're producing a product - if an entity external to your workplace is paying money for what you're producing, then your code isn't good without testing; and if you've got some spare cycles going on, profiling isn't too bad either. Something for a Malicious Coder to do when he's bored of adding bugs.

    I'd even argue you have the same moral obligation to produce the same level of quality (in terms of well tested and possibly profiled code) if entities outside your workplace will use your software. Just because it was free doesn't mean it should suck.

    --
    IP is just rude.
    Is there any torture so subl
  44. Oh no... by be-fan · · Score: 2

    First, the idea was to write in ASM to squeeze every drop of performance from the hardware.
    Then, the idea was to write in a high-level language, but always be careful about performance.
    Then, the idea was to develop apps quickly, then profile to optimize the important parts.
    Now, screw optimization, let the user buy more hardware!
    I think this attitude sucks. Even my 1.5Ghz Athlon-XP is slower running KDE 3.x (or any version of gnome for that matter) than my old 300Mhz PII was running Win98. And it doesn't do a hell of a lot of stuff that my old machine couldn't. I switched to Linux and took the performance hit because I hated Microsoft. I keep upgrading KDE (and my hardware) because the latest apps only work on the latest version. I don't expect more complex software to get faster, but I'd expect that as I upgrade my hardware, software should stay relatively the same speed. Yet, it seems as if software is getting slower more quickly than system bottlenecks (specifically RAM and hard-drive speed) can keep up. That means that the end-user experience is deteriorating, even as users pump more money into their hardware to get usable performance.

    --
    A deep unwavering belief is a sure sign you're missing something...
    1. Re:Oh no... by GigsVT · · Score: 1

      First, the idea was to write in ASM to squeeze every drop of performance from the hardware.
      Then, the idea was to write in a high-level language, but always be careful about performance.
      Then, the idea was to develop apps quickly, then profile to optimize the important parts.
      Now, screw optimization, let the user buy more hardware!


      First, the hardware was very, very expensive.
      Then, the hardware was less expensive, but still neck and neck with labor.
      Then, hardware became an order of magnitude cheaper than the labor.

      What you're seeing is just the result of the ratio between programmer time cost and hardware cost. People and companies are just doing things in the most cost efficient way possible.

      I should call it Gigs's law:

      Code bloat = labor cost / hardware cost.

      Think I'll be as famous as Moore? I'm sure someone has proposed this before.

      --
      I've had enough abrasive sigs. Kittens are cute and fuzzy.
  45. Good point there. by AltGrendel · · Score: 2
    gprof, is probably not dead - if you need it you can adapt the program...

    Isn't that what Open Source is all about?

    --
    The simple truth is that interstellar distances will not fit into the human imagination

    - Douglas Adams

  46. Re:There is no question that profiling is necessar by pthisis · · Score: 3

    Um, but...I think there's a confusion of context occurring. The situation you describe happens when you're writing little chunks of one-off code to perform one task and be done with. Usually it'll be used once, or is part of a stopgap "until there's a real solution."

    With testing, that's generally right. If something's going to run often, it can potentially fail a lot of times and so even a small cost of failure will be compounded to the point where QA is worthwhile.

    With performance, that's often not true. There are a lot of jobs that don't need anything approaching "good" performance. Batch reports are one extremely common example (as is other batch processing): I need a web usage report every morning on my desk or in my inbox, so the quick-and-dirty multipass solution that takes 3 hours to run can be scheduled at midnight, and the programmer can then do another project with big ROI instead of spending time writing a faster solution that takes only seconds to run. Many applications fall into that domain, many of them absolutely mission critical and responsible for millions in revenue but also not worth spending time optimizing when it could be better spent testing, adding features, or working on another project entirely.

    And many (I'd say most) interactive applications are fast enough from the get-go and never need optimization. Sure, there are some apps that either do a lot of computation (mp3 players, games, compilers, etc), or are run many times at once (web servers), or are too slow when first run for unknown reasons. But a lot of programs are fine from the start and profiling them is a waste.

    Sumner

    --
    rage, rage against the dying of the light
  47. OProfile + Prospect by irix · · Score: 4, Informative

    And for getting even more useful information out, try Prospect. It works with OProfile - there was a talk about it at this year's Ottawa Linux Symposium, which you can find in the conference proceedings (gzipped PDF).

    --

    Do you even know anything about perl? -- AC Replying to Tom Christiansen post.
  48. Re:There is no question that profiling is necessar by be-fan · · Score: 2

    But what happens when the program fails overnight, and the poor user comes in in the morning to find that he doesn't have enough time to run the program again before the deadline? I bet at that point, he'd appreciate the well-coded, well-designed version...

    --
    A deep unwavering belief is a sure sign you're missing something...
  49. Re:There is no question that profiling is necessar by pthisis · · Score: 2

    But what happens when the program fails overnight, and the poor user comes in in the morning to find that he doesn't have enough time to run the program again before the deadline?

    Then you profile and optimize, because it's not "fast enough" any more.

    Is that hard to understand?

    Sumner

    --
    rage, rage against the dying of the light
  50. Re:There is no question that profiling is necessar by Nyarly · · Score: 2
    But a lot of programs are fine from the start and profiling them is a waste.

    I agree that profiling isn't always necessary, and that sometimes profiling and optimization won't reap any advantage, but I think the range between not necessary and useless is wide, and the advantage from profiling in that range is subtle but existent.

    Additionally, profiling can serve other purposes. It's been suggested that, under a unit testing regime, a coder new to a project can serve as a "Malicious Coder," whose job it is to add bugs to code to catch out situations the unit tests miss. The advantage is that this can improve the testing as well as bringing up a new team member quickly. Profiling/optimization tasks can serve a similar purpose. By giving a direction to code investigation, it speeds the acquisition of familiarity with the code.

    --
    IP is just rude.
    Is there any torture so subl
  51. Re:There is no question that profiling is necessar by RollingThunder · · Score: 2

    If you don't take a cursory run with a profiler on it, you'll never know the real cost of speeding it up.

    It's worth a quick overview of the profile, to determine how long it would take to optimize said report.

    I talk from painful experience - a job I once worked at ran overnight DB jobs on their Oracle database. Nobody bothered checking for efficacy of their SQL until the jobs that had accrued grew to take more than 8 hours in total, and were still running when users came in for the morning.

    Then, with a scant four days of programmer time, the jobs got pared back to three hours, AND some bugs got fixed. If they'd done that a few months earlier, we would have avoided 4 months of pain and anguish from users coming in, trying to use the system, and screaming bloody murder because it still wasn't available for them at 7:30 AM.

  52. I always find it funny... by be-fan · · Score: 2, Insightful

    How *NIX grognards always complain about multi-threading, but don't find signals (and their nasty interrupt-driven nature) to be the least bit unsettling!

    --
    A deep unwavering belief is a sure sign you're missing something...
    1. Re:I always find it funny... by ++good-duckspeak · · Score: 1
      How *NIX grognards always complain about multi-threading, but don't find signals (and their nasty interrupt-driven nature) to be the least bit unsettling!
      [TMBAT]

      Signals and their associated damage (EINTR) are one of the blights of Unix. I don't know of many new programs that use signals for anything other than a primitive event or control mechanism.

      --
      Why is Triangle Man so MEAN?
  53. what's the problem? by g4dget · · Score: 4, Interesting
    You say that there is a problem with profiling multithreaded code with gprof. But the issue you point to seems to apply to both single and multithreaded code: Linux gprof doesn't seem to count time spent in system code.

    Now, compute intensive code tends not to spend a lot of time in system calls, so it isn't clear that it matters whether a profiler counts time spent in system calls. I kind of prefer if it doesn't because it doesn't clutter up the profile with I/O delays (which are usually unavoidable).

    If you want to find out where your code is spending time in system calls, you can use "strace -c".

    There are also gcov-like tools that can be used for profiling via code insertion (as opposed to statistical profiling like gprof), although I'm not sure whether PC hardware has the necessary timer support.

    Overall, the answer is: yes, profiling still matters for programs that push the limits of the machine. But fewer programs do. I think most people would be a lot better off not programming in C or C++ at all and not worrying about performance. Too much worry about "efficiency" often results in code that is not only buggy but also quite inefficient: tricks that are fine for optimizing a few inner loops wreak havoc with performance when applied throughout a program. Too much tuning of low-level stuff also causes people to miss opportunities for better data structure and program logic. This is actually an endemic problem in the industry that affects almost all big C/C++ software systems. Desktop software, major servers, and even major parts of the kernel should simply not be written in C/C++ anymore.

    The thing with profiling and optimization is to know when to stop, and few people know that. So, maybe the best thing to say is: "no, profiling doesn't matter anymore". That will keep most people out of trouble, and the few that still need to profile will figure it out themselves.

    1. Re:what's the problem? by kmo · · Score: 1

      You say that there is a problem with profiling multithreaded code with gprof. But the issue you point to seems to apply to both single and multithreaded code: Linux gprof doesn't seem to count time spent in system code.

      I've completely given up trying to use gprof under Linux. When I need to profile our code, I still use gprof, but under Solaris. I can then fix the bottlenecks, which are rarely system specific, and the Linux version runs faster as well.

      Now, compute intensive code tends not to spend a lot of time in system calls, so it isn't clear that it matters whether a profiler counts time spent in system calls. I kind of prefer if it doesn't because it doesn't clutter up the profile with I/O delays (which are usually unavoidable).

      Maybe you've never used a working gprof. In a working profiler, time spent waiting for I/O doesn't show up because it doesn't take CPU cycles to wait. If the app is waiting for I/O, it gives the CPU to another process. System calls can take a significant amount of time in a multithreaded app. Thread synchronization is expensive. I had a multithreaded server app that spent 15% of its time just in the posix mutex functions.
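
      To make the CPU-time versus wall-clock distinction concrete, here is a minimal Python sketch. It is not gprof; `measure`, `wait_for_io` and `spin` are made-up names, and `time.sleep` merely stands in for a blocking read. A profiler that charges CPU time sees almost nothing for the wait, while one that charges wall-clock time sees all of it.

```python
import time

def measure(fn):
    """Return (wall_seconds, cpu_seconds) spent while running fn()."""
    wall_start = time.perf_counter()   # wall-clock timer
    cpu_start = time.process_time()    # CPU time charged to this process
    fn()
    return time.perf_counter() - wall_start, time.process_time() - cpu_start

def wait_for_io():
    # Stand-in for a blocking read: the process sleeps and burns no CPU.
    time.sleep(0.2)

def spin():
    # Busy loop that consumes roughly 0.1 s of CPU time.
    deadline = time.process_time() + 0.1
    while time.process_time() < deadline:
        pass

if __name__ == "__main__":
    print("waiting: wall=%.3f cpu=%.3f" % measure(wait_for_io))
    print("spinning: wall=%.3f cpu=%.3f" % measure(spin))
```

      For the wait, the wall figure should come out near 0.2 s with the CPU figure close to zero; for the spin, both should be at least about 0.1 s. Which of the two numbers you want to see is exactly what is being argued about here.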

    2. Re:what's the problem? by gkatsi · · Score: 1

      I think you've got it backwards. gprof uses code instrumentation, not sampling. Other profilers, such as jprof and (I think) oprofile use sampling.

      (that's why you need to compile your code with -pg, not -g, when you want to generate profiling info: the compiler has to instrument the code)

    3. Re:what's the problem? by g4dget · · Score: 2
      Maybe you've never used a working gprof.

      I used to use gprof on Suns, which by your definition is "working".

      In a working profiler, time spent waiting for I/O doesn't show up because it doesn't take CPU cycles to wait.

      Oh? Could have fooled me. I have had plenty of CPU intensive I/O. But in any case, when I do look at system calls, of course, I want to know time waiting. It is of no interest to me whether the process is slow because the CPU is spinning or because it's waiting for a disk block.

      Thread synchronization is expensive. I had a multithreaded server app that spent 15% of its time just in the posix mutex functions.

      I don't get it: do you or don't you want to see time spent waiting? Waiting for a mutex may well be just--waiting.

    4. Re:what's the problem? by g4dget · · Score: 2

      Come on: read and think. gprof wasn't working multithreaded because it was getting its signal only in the main thread. And read the documentation.
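
      For anyone who hasn't seen signal-driven sampling, here is a rough Python sketch of the mechanism (Unix only). It is an illustration, not gprof's code: `sample_profile` and `hot_loop` are hypothetical names. A CPU-time interval timer raises SIGPROF, and the handler notes which function was running. Python, like the setup described above, delivers the signal to the main thread only, so work done in other threads would never be sampled.

```python
import signal
import time
from collections import Counter

def sample_profile(fn, interval=0.005):
    """Run fn() and count which function is executing at each SIGPROF tick."""
    samples = Counter()

    def on_tick(signum, frame):
        # frame is whatever the main thread was executing when the tick fired.
        if frame is not None:
            samples[frame.f_code.co_name] += 1

    previous = signal.signal(signal.SIGPROF, on_tick)
    # ITIMER_PROF counts CPU time used by the process, not wall-clock time.
    signal.setitimer(signal.ITIMER_PROF, interval, interval)
    try:
        fn()
    finally:
        signal.setitimer(signal.ITIMER_PROF, 0, 0)   # stop the timer first
        signal.signal(signal.SIGPROF, previous)      # then restore the handler
    return samples

def hot_loop():
    # Burn about 0.3 s of CPU so the sampler has something to observe.
    end = time.process_time() + 0.3
    while time.process_time() < end:
        pass

if __name__ == "__main__":
    for name, hits in sample_profile(hot_loop).most_common():
        print(name, hits)
```

      The counts are statistical: they vary from run to run, and anything that runs for less than one tick may not show up at all.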

  54. Re:There is no question that profiling is necessar by pthisis · · Score: 3, Interesting

    If you don't take a cursory run with a profiler on it, you'll never know the real cost of speeding it up.

    Right. It's obviously a cost/benefit tradeoff. If you start the report at midnight and need it at 8:00 in the morning, then if it takes 15 minutes to run you probably don't even want to think about profiling. If it takes 7 hours, it's still fast enough for now but you may want to concern yourself with whether it'll always be fast enough. What's the cutoff? 1 hour? 4 hours? Depends on how crucial the report is and what other projects are on your plate at the moment.

    Obviously "performance problem" is tough to quantify in general, but I still contend that you should normally only profile if there is a potential performance problem (or if you have idle resources, etc). Otherwise, go do some QA. Work on a new project. Clean up the nasty hack you wrote late at night to get it going. Write some documentation. Whatever.

    Sumner

    --
    rage, rage against the dying of the light
  55. It is not profiling useless... it's multithreading by WetCat · · Score: 1

    _Native_ _x86_ multithreading is useless and harmful.
    1) It heavily reduces the number of processes available - a very tight resource in Linux
    2) It makes programs cumbersome and hard to debug
    3) On x86 Linux it was not a good idea to implement thread switches as context switches - on x86 that's a very costly operation.
    But this is only a rant. Like MIME, this flawed technology is already widely used and there's no way to turn back.
    (Why MIME? MIME is a stupid thing - a dirty hack created from not wanting to rewrite an old 7-bit protocol from scratch.)

  56. Better yet: Optimization from profiler feedback by yerricde · · Score: 2

    What could be more useful is if the compiler implementor would spend as much time on the profiler as on the compiler: you would then be able to easily see faulty parts in your software and be able to determine what needs to be optimized.

    Better yet, if an architecture has a static branch predictor that encodes "mostly taken" or "mostly not taken", the compiler could emit profile code that measures how fast a particular variant runs and then take that into account for the next optimization pass.

    --
    Will I retire or break 10K?
    1. Re:Better yet: Optimization from profiler feedback by alecthomas · · Score: 1
      Better yet, if an architecture has a static branch predictor that encodes "mostly taken" or "mostly not taken", the compiler could emit profile code that measures how fast a particular variant runs and then take that into account for the next optimization pass.
      You mean like -fprofile-arcs and -fbranch-probabilities in gcc 3.x?
  57. inlining makes profiling c++ code difficult by PuntaConejo · · Score: 1

    I program a lot in c++, and I particularly like to use the STL. Thus, my programs often have a lot of inlined functions in them. I have found gprof to be much less useful when profiling such programs.
    When a function is inlined, gprof does not account for that function's time. Nor should it be expected to, since optimizations may reorder the code so much that it is not feasible to attribute a particular assembly instruction to a particular function. I have tried recompiling my programs with -fno-inline to expose the names of the inlined functions, but this changes the program performance so much in some cases that I am hesitant to draw any conclusions about a program from such a profile. Short of abandoning inlining (and interprocedural optimizations, which pose the same sort of problem), does anyone have suggestions on how to profile such programs?

    1. Re:inlining makes profiling c++ code difficult by Anonymous Coward · · Score: 0

      My experience is that with the current generation of systems, inlining hurts more than it helps. I've run extensive tests on various optimizations on x86 hardware. What I've discovered is that keeping as much code in the processor cache is the most important optimization that you can make. This precludes old (and out of date!) standbys such as loop unrolling, and function inlining. Branch prediction is so good on modern processors that loop unrolling does little more than fill up the cache.

    2. Re:inlining makes profiling c++ code difficult by PuntaConejo · · Score: 1

      While I would agree that overzealous inlining can be harmful, there are many cases where it is essential for good performance. Code using the STL containers can run much more slowly when inlining is turned off. Moderation is the key.

      To characterize loop unrolling and function inlining as out-of-date is disingenuous. While you are right that they can increase code size, each has its rightful place.

  58. Re:It is not profiling useless... it's multithread by Anonymous Coward · · Score: 0

    From a QA perspective, multi-threading has the same problems as global variables -- too much coupling, too much exposed data. If you are running a multiprocessor system and the program can be parallelized, then multithreading might be worth the trouble. But for most applications it causes more trouble than it's worth.

  59. No, take a look at FunctionCheck by Pornosonic · · Score: 2, Informative
    Take a look at FunctionCheck

    Five bucks says that this server is slashdot'ed within the hour, so you may have more success with the less descriptive SourceForge project page, which indicates that the project is not dead, as the homepage says.

    I discovered this program when I was optimizing some code I wrote to multiply sparse matrices. By the time I had gotten it 100x faster than the initial code, gprof had lost all semblance of granularity and was giving me obviously bogus results. The problem is that such things as cache performance (i.e. optimizing for cache hits) were now heavily affecting the profile and gprof could not figure such things out. FunctionCheck works much better than gprof and actually generates accurate profile information under high-stress situations.

    From the homepage (all grammatical errors theirs):

    "I created FunctionCheck because the well known profiler gprof have some limitations:

    • it is not possible to change the profile data file name
    • multi-threads / multi-processes is not supported
    • time spend in non-profiled functions is discarded
    • you can't control the way profile is made
    • memory profile is not managed
    For all these limitations, and by the fact that I discovered a new gcc feature called -finstrument-functions, I decided to write my own profiler.

    My approach is simple: I add (small) treatments at each enter and exit of all the functions of the profiled program. It allows me to compute many information:

    • the current call-stack
    • the time at each action, to compute elapsed times in functions
    • process PID / thread ID, to manage multi-threads / multi-processes
    • number of calls to functions
    • ...
    With these information, I can generate profile data files (for each thread / process), which describes all the statistics (at function level) for the program execution."
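
    The enter/exit approach described in that quote can be sketched in a few lines of Python. To be clear, this is not FunctionCheck's code and not how -finstrument-functions is wired up in gcc; it is only the same idea, using the interpreter's `sys.setprofile` hook to get a callback on every function entry and exit. `profile_calls`, `inner` and `outer` are hypothetical names.

```python
import sys
import time
from collections import defaultdict

def profile_calls(fn, *args):
    """Run fn(*args) with enter/exit hooks; return {name: [calls, seconds]}."""
    stats = defaultdict(lambda: [0, 0.0])
    stack = []   # (function name, time of entry) for each active call

    def hook(frame, event, arg):
        if event == "call":                      # function entry
            stack.append((frame.f_code.co_name, time.perf_counter()))
        elif event == "return" and stack:        # function exit
            name, entered = stack.pop()
            stats[name][0] += 1
            stats[name][1] += time.perf_counter() - entered

    sys.setprofile(hook)
    try:
        fn(*args)
    finally:
        sys.setprofile(None)
    return dict(stats)

def inner():
    time.sleep(0.01)

def outer():
    for _ in range(3):
        inner()

if __name__ == "__main__":
    for name, (calls, seconds) in profile_calls(outer).items():
        print(f"{name}: {calls} call(s), {seconds:.3f} s")
```

    The times are inclusive (a caller's total contains its callees), and the hook itself adds overhead on every call, which is the usual price of instrumentation compared with sampling.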

    Try it out and please contribute some source code.

  60. the 'worst' algorithm by Kenard · · Score: 1

    and thus are primarily limited by user response times, which are many orders of magnitude longer than even the worst algorithm.
    User response time is many times faster than the time it takes for a function to return that's stuck in an infinite loop.

    --
    (appended to the end of comments you post)
  61. Here's my profiler. by Anonymous Coward · · Score: 0

    Here's a call-graph profiler I wrote a while ago. It's rough and ready, but it does the trick, and works with both multiple threads and shared libraries.

    x86 Linux only. README.txt included, if it breaks keep all the pieces.

    http://homepages.ihug.co.nz/~suckfish/scgprofile/

    The output is a series of records vaguely like the gprof call graph:

    calling_functions
    function_name
    called_functions

    The numbers are numbers of samples.

  62. Re:write event driven programs; threads for CPU wo by sesquiped · · Score: 1

    Ok, I agree that non-threaded code can be easier to understand than threaded code, and can usually run faster.

    However, I'd like to hear what you'd do in a situation like this: You have a network server that has to respond to incoming packets within something like 50 ms (it's for a multiplayer action game). The server also needs to keep track of player information in a database, so it uses something like the mysql C client library. But then it has to block on a mysql_query or mysql_insert, because that library doesn't provide any way to do things asynchronously (this may have changed with newer versions; I looked at mysql a few years ago). Or what about DNS resolution? Or any other blocking event other than plain IO through an fd?

    One solution might be to fork off a process doing the mysql, and have it communicate with the parent through pipes/sockets. But you have to invent a small, one-time protocol for each of these you do. And did I mention this has to run on Windows too, whose IPC sucks?

  63. gcj and JVMPI by KIngo · · Score: 1

    Well, if gcj's JVMPI becomes fully usable, maybe we could use a tool like JProfiler for natively compiled Java code. That would be great.

  64. Re:There is no question that profiling is necessar by Anonymous Coward · · Score: 0

    You rock.

  65. Yes it does! by Anonymous+Brave+Guy · · Score: 2
    Using appropriate data structures and algorithms counts, and making correct software counts even more, but worrying about how many cycles one instruction takes versus another is a serious misdirection of effort on modern machines!

    But he didn't say that... He said that programmers should know where to invest the effort, and take an interest in creating efficient code. That means, first and foremost, exactly what you just said: you have to be smart about your DS&As, aware of what you're writing and not pointlessly lazy when coding. It doesn't mean, and wasn't claimed to mean, that you have to micro-optimise everything at the assembler level.

    --
    If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
  66. You're so right by Anonymous+Brave+Guy · · Score: 2
    The problem is not that certain tools have issues; but rather that today's programmers have no interest in creating efficient code. [...] Today's advances STILL keep up with Moore's law, still make up for their lack of skill. However, when one looks at what is actually performed with all that power, one tends to question why we are paying so much, for so little. [...] We don't need profilers, we need coders who have that tacit knowledge of what really counts, where they should put real effort.

    I couldn't agree more. Sadly, the fact that almost everyone replying to your post thinks it is advocating premature optimisation at the level of assembly-level tweaks makes your point all too well.

    --
    If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
  67. This will cost me karma... by tlambert · · Score: 2, Insightful

    Don't use threads.

    The problem you are complaining about profiling having is that it can't profile threaded programs. Don't write threaded programs, and the problem is solved.

    Frankly, I've always considered threading useful for only a few situations:

    o When you have an SMP system, and you need to scale your application to multiple CPUs so that you can throw hardware at the problem instead of solving it the right way

    o When you have programmers who can't write finite state automata, because they don't understand computer science, and should really be asking "Would you like fries with that?" somewhere, instead of cranking out code

    o When your OS doesn't support async I/O, and you need to interleave your I/O in order to achieve better virtual concurrency

    Other than those situations, threads don't make a lot of sense: you have all this extra context switching overhead, and you have all sorts of other problems -- like an inability to reasonably profile the code with a statistical profiler.

    OK... Whew! Boy do I feel better! 8-).

    Statistically examining the PC, unless it's done on a per thread basis, is just a waste of time in threaded programs.

    If you want to solve the profiling problem for threaded programs, then you need to go to non-statistical profiling. This requires compiler support. The compiler needs to call a profile_enter and profile_exit for each function, with the thread ID as one of the arguments. This lets you create an arc-list per thread ID, and separately deal with the profiling, as if you had written the threads as separate programs. It also catches out inter-thread stalls.
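
    A rough sketch of the per-thread arc-list idea, in Python rather than with compiler support: `sys.setprofile` installs an entry/exit hook for the calling thread only, so each thread can feed its own arc list. All names here (`profile_enter`, `traced`, `parse`, `handle`, `run`) are hypothetical, a label stands in for the thread ID, and the profile_exit half (timing, stall detection) is left out to keep it short.

```python
import sys
import threading
from collections import Counter

def profile_enter(arc_list, caller, callee):
    # Record one caller -> callee arc in the list owned by a single thread.
    arc_list[(caller, callee)] += 1

def traced(arc_list, fn, *args):
    # The hook is per thread, so each thread reports into its own arc list.
    def hook(frame, event, arg):
        if event == "call" and frame.f_back is not None:
            profile_enter(arc_list, frame.f_back.f_code.co_name,
                          frame.f_code.co_name)
    sys.setprofile(hook)
    try:
        fn(*args)
    finally:
        sys.setprofile(None)

def parse(i):
    return i * 2

def handle(n):
    total = 0
    for i in range(n):
        total += parse(i)
    return total

def run(jobs):
    # jobs maps a thread label to the argument handed to handle().
    arc_lists = {label: Counter() for label in jobs}
    threads = [threading.Thread(target=traced,
                                args=(arc_lists[label], handle, n))
               for label, n in jobs.items()]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return arc_lists

if __name__ == "__main__":
    for label, arcs in run({"A": 3, "B": 5}).items():
        print(label, dict(arcs))
```

    Each thread's call graph comes out separately, as if the threads had been profiled as independent programs.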

    -- Terry

    1. Re:This will cost me karma... by Anonymous Coward · · Score: 0
      I find it interesting that you think that "programmers" who should be asking "Do you want fries with that?" should be playing around with MT-code.

      For me, if a programmer can't walk, I'm not about to ask him to walk, chew gum, and breathe all at the same time.

    2. Re:This will cost me karma... by Anonymous Coward · · Score: 0
      Don't use threads.

      The problem you are complaining about profiling having is that it can't profile threaded programs. Don't write threaded programs, and the problem is solved.
      Equivalent to saying: don't use SMP with cpus > 4. Your favorite free OS (such as FreeBSD or linux) cannot use more than 4 cpus efficiently. The solution to this is to use less than 4 cpus at a time. This is ideal.
      When you have an SMP system, and you need to scale your application to multiple CPUs so that you can throw hardware at the problem instead of solving it the right way
      Does the right way involve throwing out all those extra 'evil' cpus and doing it with one cpu? If you have the hardware, the software should use it.
      When you have programmers who can't write finite state automata, because they don't understand computer science, and should really be asking "Would you like fries with that?" somewhere, instead of cranking out code
      Finite state machines...I suppose you are thinking one finite state machine = one thread, so anyone who uses multiple threads doesn't understand the 'holy' solution of automata? Finite state machines can have concurrent actions, can't they? E.g., VHDL works in this way normally. Concurrency = multiple threads.
      When your OS doesn't support async I/O, and you need to interleave your I/O in order to achieve better virtual concurrency
      Or when your OS supports async I/O and SMP and you want multiple I/O to complete in parallel. Same discussion on freebsd-arch: multiple interrupt threads are 'not useful' except that once FreeBSD moves to big iron that does lots of parallel I/O, the performance will suck without it.
      Other than those situations, threads don't make a lot of sense: you have all this extra context switching overhead, and you have all sorts of other problems -- like an inability to reasonably profile the code with a statistical profiler.

      OK... Whew! Boy do I feel better! 8-).

      Statistically examining the PC, unless it's done on a per thread basis, is just a waste of time in threaded programs.
      Obviously, there won't be much speedup on a 4-way SMP machine if your program uses 8,000 threads. Moderation is important. You can say that fork() has a similar problem of overhead, so I guess we should all be running machines that do everything in one big address space, right? Sounds like MS-DOS. Statistical profilers can work; all they need to do is work out the statistics for each thread. Seems logical and straightforward to me, since gprof was basically built with the assumption that there was only one thread per process.
      If you want to solve the profiling problem for threaded programs, then you need to go to non-statistical profiling. This requires compiler support. The compiler needs to call a profile_enter and profile_exit for each function, with the thread ID as one of the arguments. This lets you create an arc-list per thread ID, and separately deal with the profiling, as if you had written the threads as separate programs. It also catches out inter-thread stalls.
      This is because, from a profiling point of view, threads ARE different programs. They are separate and independent sequences of control. Just because some outdated profiler doesn't understand this doesn't make them useless.
    3. Re:This will cost me karma... by dvdeug · · Score: 2

      you need to scale your application to multiple CPUs so that you can throw hardware at the problem instead of solving it the right way

      "Solving it the right way"? If you know how to solve the travelling salesman problem, or chess, or simulate the world's weather without throwing hardware at the problem, you really ought to publish it for the good of mankind.

      threads don't make a lot of sense

      Some problems are conceptually parallel; it's almost always easiest to write a procedure in a way that mirrors the way it's conceptualized.

      you have all this extra context switching overhead,

      So your multitasking system does 1001 context switches a millisecond rather than 1000. Woo hoo.

    4. Re:This will cost me karma... by Znork · · Score: 3, Insightful

      "Some problems are conceptually parallel; it's almost always easiest to write a procedure in a way that mirrors the way it's conceptualized."

      In that case... fork and use IPC. It's not substantially more expensive and you won't have to ensure your parallel code is thread safe.

    5. Re:This will cost me karma... by dvdeug · · Score: 2

      In that case... fork and use IPC. It's not substantially more expensive and you won't have to ensure your parallel code is thread safe.

      But then you're forced to serialize and deserialize all the data you need to share.
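
      Both sides of this trade-off show up in a small fork-plus-pipe sketch (Python, Unix only; `run_in_child` and `append_item` are hypothetical names, and pickle is just one possible wire format). The child gets its own copy of the data, so nothing needs to be thread safe, but the result has to be serialized to cross the pipe.

```python
import os
import pickle

def run_in_child(fn, arg):
    """Fork, run fn(arg) in the child, and send the result back over a pipe."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:                          # child process
        os.close(read_fd)
        try:
            with os.fdopen(write_fd, "wb") as out:
                pickle.dump(fn(arg), out)  # serialize the result
        finally:
            os._exit(0)                    # exit without running parent cleanup
    os.close(write_fd)                    # parent process
    with os.fdopen(read_fd, "rb") as inp:
        result = pickle.load(inp)          # deserialize the result
    os.waitpid(pid, 0)                     # reap the child
    return result

def append_item(items):
    # Mutates its argument; in the child this touches only the child's copy.
    items.append(4)
    return items

if __name__ == "__main__":
    data = [1, 2, 3]
    print(run_in_child(append_item, data))   # the child's modified copy
    print(data)                              # the parent's list, untouched
```

      Whether the copying and pickling cost matters depends on how much data has to cross the pipe and how often.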

    6. Re:This will cost me karma... by Anonymous+Brave+Guy · · Score: 2
      o When you have programmers who can't write finite state automata, because they don't understand computer science, and should really be asking "Would you like fries with that?" somewhere, instead of cranking out code

      I have a formal background in CS, I'm well aware of how to use FSAs, and I'm a professional software developer, and yet I disagree with this argument. One thing I've learned is that if a tool is available and its purpose matches your need, it's generally a better solution to use that tool than to reinvent the wheel.

      I've worked on several multithreaded systems, some small scale, some enormous. While it would theoretically have been possible to rewrite the multithreaded code as a FSA, it would surely have led to a maintenance handicap and an increased bug count, in exchange for -- possibly -- a tiny increase in performance, and even that is not guaranteed by any means. Why spend hours writing a multithreading system of my own when there's a tried and tested one already there for me to use?

      --
      If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
  68. Ah, hokey by Anonymous+Brave+Guy · · Score: 2
    Admittedly, TeX was programmed by Knuth so the ordinary laws of nature probably don't apply to it :).

    Bah yourself. Who's this Knuth guy, and what the hell does he know about efficient programming, anyway?

    --
    If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
  69. Re:write event driven programs; threads for CPU wo by Anonymous Coward · · Score: 0


    Read again what he said: _minimize_ thread use doesn't mean _don't_ use threads. Obviously there are cases where you can't avoid using threads (like the one you described), but most of the time, it's possible to poll on resources (poll() on UNIX, select() on BSD, WaitForMultipleObjects() on Win32), and it should be done so to save on system overhead.
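
    As an illustration of polling instead of dedicating a thread to each connection, here is a minimal Python sketch using the standard `selectors` module, which wraps whatever poll/select-style call the platform provides. `drain_ready` is a hypothetical helper, and the socket pairs in the demo stand in for real client connections.

```python
import selectors
import socket

def drain_ready(sources, timeout=1.0):
    """Wait on several sockets in one thread; return {name: bytes} for those readable."""
    sel = selectors.DefaultSelector()
    for name, sock in sources.items():
        sock.setblocking(False)
        sel.register(sock, selectors.EVENT_READ, data=name)
    received = {}
    # One call waits on every registered socket at once.
    for key, _events in sel.select(timeout):
        received[key.data] = key.fileobj.recv(4096)
    sel.close()
    return received

if __name__ == "__main__":
    a_read, a_write = socket.socketpair()
    b_read, b_write = socket.socketpair()
    a_write.sendall(b"ping")              # only "a" has data waiting
    print(drain_ready({"a": a_read, "b": b_read}))
```

    A real server would loop around `select()` and keep per-connection state. The blocking-library case raised in the parent (a database client with no async API) is exactly what this does not cover.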

  70. No, he's right by Anonymous+Brave+Guy · · Score: 4, Insightful
    Why waste your time trying to write efficient code from the start? It's much better to write easily understandable, easily maintained, quickly written and minimal bug code.

    Why are these mutually exclusive? There's efficient and there's optimised, and one is a much easier subset of the other.

    He's not claiming that everyone should hand-optimise from the word go. He's saying programmers should have a basic knowledge of their craft. It doesn't take much extra effort to use an efficient sorting algorithm or store data in a fast look-up structure, rather than writing a naff, hand-crafted shuffle sort and using arrays for everything whether they're appropriate or not. And yet, through ignorance or plain laziness, most programmers in most languages take the latter approach. (If you've never seen any of the source code for big name applications/OSes, trust me, it's scary.)
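
    A tiny illustration of the "appropriate data structure" point, as a Python sketch with made-up function names: both versions report duplicates in the order they first repeat, but the first scans a list on every membership test (quadratic overall) while the second uses a hash set (linear on average). Neither is harder to read or maintain than the other.

```python
def duplicates_with_list(items):
    """Quadratic: every membership test scans a list."""
    seen, dupes = [], []
    for item in items:
        if item in seen:
            if item not in dupes:
                dupes.append(item)
        else:
            seen.append(item)
    return dupes

def duplicates_with_set(items):
    """Linear on average: membership tests hit a hash table."""
    seen, reported, dupes = set(), set(), []
    for item in items:
        if item in seen and item not in reported:
            dupes.append(item)
            reported.add(item)
        seen.add(item)
    return dupes

if __name__ == "__main__":
    data = [3, 1, 3, 2, 1, 3]
    print(duplicates_with_list(data), duplicates_with_set(data))
```

    On a handful of items the difference is invisible; on a few hundred thousand, the list version is the kind of thing a profiler ends up pointing at.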

    Similarly, it is just careless to pass large structures by value unnecessarily in a language that has reference semantics. You have to know the basics of what is efficient use of your tools of choice if you want to write good code, and the old Moore's Law excuse is just a cover for laziness and failure to do the requisite amount of homework.

    Note that, very importantly, none of these things requires more than a small effort. They certainly don't compromise maintainability, bug count or any other relevant metrics, and a competent programmer (if you can find one) will take these things in his stride, and still be faster than the others.

    I used WordPerfect 5.0 (or whatever it was) on a dual 360K 5.25" floppy disk drive machine. Plain blue text screen only. I have to say, I *much* prefer Word XP.

    Interesting... We have just acquired a new P4/2.2GHz with 512MB RAM and running WinXP as a development machine at work. You know what? It's way, way slower than the 1.4GHz P4 running 2000 we already had. And that in turn is way slower than the 1GHz P3 running NT4. This is not subjective, it is based on obvious, objective measures. For example, my new machine (the fastest of the above) sometimes takes 3-4 minutes to act on an OK'd dialog in Control Panel. The NT4 box reacts instantly when you configure the equivalent options. Something is wrong at this point, and I'm betting it's a combination of code bloat and feature creep.

    --
    If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
    1. Re:No, he's right by Anonymous Coward · · Score: 0

      I've heard people say that Windows XP boots the fastest of all the Windows. It sure isn't as fast as Win3.1 or Win2.0, even on hardware two orders of magnitude faster than the old hardware.

    2. Re:No, he's right by Anonymous+Brave+Guy · · Score: 2
      I've heard people say that Windows XP boots the fastest of all the Windows.

      It does boot pretty quickly, and there are lots of things to like about its startup process compared to previous versions. But it's spoiled by a few very big screw-ups, one of which is the absurd delays of several minutes that seem to turn up at random while configuring your system. (Another is the bugs in setting up multiple user accounts, which cost me nearly a day of set-up time as I discovered that I'd used the wrong name on the very first screen, and couldn't properly rename that user account or copy its settings to another one later -- if you're ever setting up WinXP, watch out for this one.)

      It sure isn't as fast as Win3.1 or Win2.0, even on hardware two orders of magnitude faster than the old hardware.

      That's not really a fair test, since the versions you mention weren't actually operating systems, and had far less to do while booting.

      --
      If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
  71. Practical use of GProf by oezi · · Score: 1

    gprof is maybe not the most impressive tool to use, but it's quite useful. In an IA64 course at university [German, sorry] we used gprof to identify the bottlenecks in the C code of the xvid codec. Then we assembler-optimized like mad and got quite a nice speed-up.

    Results can be found in our wiki:

    Pre-Optimization
    Post-Optimization
    Without gprof we would have been lost... our IA64 wiki
  72. Linux Threads and Java by wwi · · Score: 1
    This discussion has included some comments about the poor implementation of threads in Linux. Other writers suggest avoiding threads, if possible. Note that Java is nothing but threads. Any Java program is running 4-6 threads (depending on the JRE) right out of the box.

    Where I work, we have had severe problems getting Java programs to work correctly on Linux. The IBM Java support team has shared our frustration. Maybe IBM's new thread implementation is needed, just to get Linux, Java (and thread users in general) working correctly in an enterprise environment. After that is working, then we can see about improving other areas like performance.

  73. tsprof: process profiling on Linux/x86 by jreiser · · Score: 2, Informative

    See http://www.BitWagon.com/tsprof/tsprof.html for info on a process profiler that uses hardware performance counters (with no recompile and no relink) and gives both interactive and text output in tree and flat modes.

  74. Re:write event driven programs; threads for CPU wo by sesquiped · · Score: 1

    I suppose I'm already following that advice: I'm using a combined thread/main-loop design in the current server. There is a small, fixed number of threads for things that are time-critical or might block, and also a loop running in the main thread that calls a bunch of registered functions every once in a while, for low-priority tasks.

    The problem is this: once you start using threads, even a few, you have to start protecting your data structures with mutexes and probably use other synchronization methods as well. If your code already has mutexes around shared structures, then there's little harm in adding a few extra threads. The big benefit I can see from minimizing threading is to get it down to a single thread, so that you can eliminate all of the synchronization overhead.
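A minimal sketch of what that protection looks like, in Python with the standard `threading` module. The `Counter` class and `run` helper are invented for this illustration and are not from the server described above. Every access to the shared dictionary goes through the lock, and that per-access cost is what a single-threaded design avoids.

```python
import threading


class Counter:
    """A shared structure whose every access is guarded by one mutex."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counts = {}

    def bump(self, key):
        # The read-modify-write must be atomic with respect to other threads.
        with self._lock:
            self._counts[key] = self._counts.get(key, 0) + 1

    def snapshot(self):
        # Copy under the lock so callers never see a half-updated dict.
        with self._lock:
            return dict(self._counts)


def run(workers=4, bumps=1000):
    """Start a few threads that all update the same counter, then return the totals."""
    counter = Counter()

    def work():
        for _ in range(bumps):
            counter.bump("hits")

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.snapshot()
```

Once the lock is in place, going from two threads to four changes nothing in `Counter`, which matches the observation that a few extra threads do little harm when the mutexes already exist.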

  75. Multithreaded programs by Anonymous Coward · · Score: 0
    multithreaded programs are almost always interactive

    Huh? You'd better hope your boss ain't reading this.

    And your attempt at keeping folks from running commands you think they shouldn't is pretty damn feeble. How do you decide someone has typed in your 'eteled' command? Maybe a strncasecmp() or similar? So if I do a "strings -a" on your executable I'd see a weird string pop out like "eteled"? And maybe I'd wonder why it was stuck in with all the strings called "select" and "update"?

    OOOH, you so smart.

  76. Linux threads suck by Anonymous Coward · · Score: 0
    Until Linux fixes its multithread problems, it will remain a step below Solaris and even Windows server platforms for some uses.

    Has anyone ever tried to run xxgdb and gdb against a threaded program?

    Has anyone tried calling fork1() from within one thread of a multithreaded process? One third of your threads shouldn't disappear, one third shouldn't wind up hung, and one third shouldn't get brain-damaging memory corruption.

    I've actually reached the point of telling folks that come to me with MT requirements for Linux processes to pay the money and run Solaris. It's cheaper than paying me to get Linux threads working correctly.

  77. multiply-add instruction in C99 standard by Anonymous Coward · · Score: 0

    Here you go: fma()

  78. Profiling will always be useful by freddoh · · Score: 1

    Saying profiling is useless is equivalent to saying that algorithmic complexity doesn't have to be studied. This is absurd.
    If, for example, your profiler says your function foo() is executed 100 times with data of size 10 and 10000 times with data of size 20, you have a serious algorithmic complexity problem. While some of these problems can (and should) be handled beforehand, a profiler is very useful for handling those that weren't.
    Algorithmic complexity is independent of computer language and of CPU speed, so profilers will be useful for as long as programs use algorithms, which is to say for quite some time still ;)
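To put a number on that example, here is a small sketch in Python, assuming the call count grows roughly like n^k, which is an assumption of the sketch rather than something a profiler tells you. The helper names `growth_exponent` and `count_calls` are made up. Two measurements are enough to estimate k, since k = log(calls2 / calls1) / log(n2 / n1).

```python
import math


def growth_exponent(n1, calls1, n2, calls2):
    """Estimate k in calls ~ n**k from two (input size, call count) measurements."""
    return math.log(calls2 / calls1) / math.log(n2 / n1)


def count_calls(n):
    """Count how often foo() runs inside a deliberately quadratic pairwise loop."""
    calls = 0

    def foo(a, b):
        nonlocal calls
        calls += 1

    for i in range(n):
        for j in range(n):
            foo(i, j)
    return calls
```

For the figures in the comment, `growth_exponent(10, 100, 20, 10000)` comes out a bit above 6.6, far worse than the exponent of about 2 that the plain pairwise loop in `count_calls` gives (100 calls at size 10, 400 at size 20). That gap is what should send you back to the algorithm.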

  79. Threading... by Znork · · Score: 2

    I wouldn't say that gprof is useless... threading, however, comes very close to it.

    Threading is useful in the instance where you have an application that needs to scale with SMP and which you cannot, for whatever reason, fork. But the accompanying pain of being forced to pay extremely close attention and mutex lock the code all over makes it not worth it for most situations.

    Use fork. Use other IPC methods if necessary. But don't thread or you'll spend an order of magnitude more time debugging.

    1. Re:Threading... by Anonymous Coward · · Score: 0
      Threading is useful in the instance where you have an application that needs to scale with SMP and which you cannot, for whatever reason, fork. But the accompanying pain of being forced to pay extremely close attention and mutex lock the code all over makes it not worth it for most situations.

      Multitasking is useful in the instance where you have several applications that you can execute concurrently by using times where one is waiting for I/O to give another CPU time. But the accompanying pain in having to do memory protection, paging, and virtual memory makes it just not worth it for most situations. Similar argument, but most people will say it's bullshit. Get with the times.

      Use fork. Use other IPC methods if necessary. But don't thread or you'll spend an order of magnitude more time debugging.

      News flash: threads are lower overhead than fork. When you create a new thread, all you basically do is create a new stack, a new structure to store context and a few other things. When you create a new process, you do all of this plus create a new memory protection domain. In other words, fork() always costs more, no matter how optimized it is.

    2. Re:Threading... by Znork · · Score: 2

      News flash: You hit right on without knowing it. In your comparison, threading equals multitasking without memory protection. It Just Doesn't Work Very Well.

      The overhead of a fork on an OS that does copy on write for the forking is minimal. And it outweighs the cost of dealing with threads by far. Fork costs more, but mutexes cost as well, usually to the extent that you lose the advantage of having multiple concurrent (or not so concurrent, after all) threads of execution.
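The memory protection difference is easy to see in a few lines. This is a Unix-only sketch in Python, assuming `os.fork` is available; the function names are hypothetical. A thread mutates the caller's list directly, while a forked child only changes its own copy-on-write copy, so nothing leaks back to the parent unless you send it over explicit IPC.

```python
import os
import threading


def mutate_in_thread(shared):
    """A thread shares the address space, so its append is visible to the caller."""
    t = threading.Thread(target=shared.append, args=("thread",))
    t.start()
    t.join()
    return list(shared)


def mutate_in_child(shared):
    """A forked child appends to its own copy; the parent's list is untouched."""
    pid = os.fork()
    if pid == 0:
        # Child process: the page holding the list is copied on first write.
        shared.append("child")
        os._exit(0)  # leave immediately without running the parent's cleanup code
    os.waitpid(pid, 0)  # parent: reap the child before looking at the list
    return list(shared)
```

This shows the isolation only. It says nothing about which approach is cheaper, which depends on how much locking the threaded version ends up needing.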

  80. "Better" UI depends on your perspective by Anonymous+Brave+Guy · · Score: 2
    There's a lot to be said for on-the-fly spellchecking. Having little red wavy lines is a much better interface for the "second set of eyes" that a spellchecker is.

    That depends on your point of view. Personally, I write lots of technical documents, where every other word (ish) isn't in the dictionary. That "better interface" makes my screen unreadable, since it's littered with red. On top of that, I usually spell correctly in the first place, and look words up in a dictionary as I go along if I'm not sure. Spell checkers rarely have to correct genuine mistakes in my documents. So personally, I'd much rather see that feature done away with and have the performance back, rather than waiting for Word to catch up as I type, as I had to ten years ago. If it's useful for others, by all means have it as an option, but don't call it "better" in a blanket statement.

    --
    If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
    1. Re:"Better" UI depends on your perspective by Planesdragon · · Score: 1

      Personally, I write lots of technical documents, where every other word (ish) isn't in the dictionary. That "better interface" makes my screen unreadable, since it's littered with red.

      Then *ADD THEM.* Check the spelling, then right click the word and select "add." You can even have Word autocorrect the new word when you type it wrong.

      Besides which, you're not complaining about the UI; you're complaining about the spellchecker itself. The UI of the spellchecker has improved, from a box that takes over the entire program to a little red wavy line that shows up in the background.

      And if you want to turn off the spellchecker in Word, pull up the options menu (It's on the "Tools" menu), select the "Spelling & Grammar" tab, and then untick the "Check Spelling as you Type" and/or the "Check Grammar as you type" options.

      If, by chance, you want to run the spellchecker afterwards, you can still use the toolbar, menu command, or F7 key.

      The UI for the spellchecker in Word XP *is* better than Word 2.0, by any measure you care to use with it. Sure, it's still a spellchecker that messes up a lot and doesn't know your jargon out of the box, but it *does* work better.

      (And, since it's been copied so much, if I had an alternate word processor that correctly marked "em-dashes" [OpenOffice and Abiword don't] and used the "menu" key based on the cursor and not the mouse [Wordperfect, last time I checked], I'd gladly switch to tick MS off.)

    2. Re:"Better" UI depends on your perspective by Anonymous+Brave+Guy · · Score: 2

      I'm well aware of how to switch off all the wavy-line UI, thanks; it's one of the first steps I take on any new installation of Office that I use (right after disabling the "irritate me by changing my menus every now and then" option).

      However, I still dispute the claim that the spelling UI is better. It sucks for my uses, it gets in my way, and it provides no more power than the traditional version. For some people, it may be a nicer UI to use, and I have no problem with that statement, but it's not "better" in any meaningful, general sense.

      --
      If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
    3. Re:"Better" UI depends on your perspective by Planesdragon · · Score: 1

      and it provides no more power than the traditional version.

      Sure it does. The Word 2.0 spellcheck freezes the entire program until you "cancel" out of it--and then you have to start all over again.

      Post 95, the spellcheck allows you to stop, re-write a sentence that you messed up, and continue reviewing. Either with the squiggly lines or the "traditional" menu.

      Now, please define an objective "better" measure if you're going to refute this. My measure: "is it easier to use for its intended purpose?"

      And since being able to stop, make an edit, and then continue "spellchecking" makes reviewing easier, and on-the-fly spellchecking makes writing easier, my answer is "yes."

      Got a good objective measure of how the different interface isn't "better" that's something other than "It's different from the old"? There IS a difference between "better" and "worth re-learning", y'know.

    4. Re:"Better" UI depends on your perspective by Anonymous+Brave+Guy · · Score: 2
      Now, please define an objective "better" measure if you're going to refute this. My measure: "is it easier to use for its intended purpose?"

      Sorry, but objective definition is your job if you're going to make the claim. Now you need to define "intended purpose". Personally, I could quite happily use the spell checker as it was, because I ran it once at the end of my work before saving the document. None of your squiggly lines, continuing edits and such have any value to me. Am I not using it for its intended purpose?

      To get back on-topic... All those extras -- particularly the on-the-fly checking -- take processor time. The more cavalier the attitude of those who add such features to writing efficient code, the more bloated it becomes.

      --
      If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
    5. Re:"Better" UI depends on your perspective by Planesdragon · · Score: 2

      Sorry, but objective definition is your job if you're going to make the claim

      Bollocks. This isn't a friggin courtroom, it's /. An argument by "not my job" ain't going to get anything done unless you get the moderators (people of authority) to come over and say what's what. Come up with a measurement of your own, or use mine, but don't bitch about it being "my job" to come up with one. (Sheesh)

      Now you need to define "intended purpose". Personally, I could quite happily use the spell checker as it was, because I ran it once at the end of my work before saving the document. None of your squiggly lines, continuing edits and such have any value to me. Am I not using it for its intended purpose?

      The intended purpose of the program--Microsoft Word--is to create documents. Be they business letters, homework assignments, or full length books. It's not a layout system, like Quark. It's the sort of super-feature-rich typewriter that got computers on everyone's desks in corporate america.

      You, by your own admission, have poor writing habits. (Run spellcheck *once?* No revisions? WHAT THE HELL DO YOU NEED WORD FOR?!) But the individual habits, poor or worse (my writing habits are pretty bad sometimes, myself), are why we need an objective measurement.

      Ok, writing habits aside, the "intended purpose" is defined. And it is "easier to use" for that purpose--or maybe "more useful" is a better way to say it.

      And as for "extras"... I just spent two hours writing, with the latest version of Word, with Winamp in the background, and not *once* did I have to wait for the computer to catch up to me. When I was using Word 2.0 years ago, I had to wait for it sometimes--and this was on a 486/33, not exactly the sort of luxurious computer that Word 2.0 assumed it would have. My current computer has some 21 times the raw clock speed of that 486--I would certainly HOPE that some of that could be used in helping me write, rather than just running idle.

    6. Re:"Better" UI depends on your perspective by Anonymous+Brave+Guy · · Score: 2
      Come up with a measurement of your own, or use mine, but don't bitch about it being "my job" to come up with one.

      Would you please read the title of this subthread, and then go back and read my comments in it? My argument throughout has been that there is no absolute definition for "better UI", as it will always depend somewhat on a specific user's needs. Smart-alec UI can be helpful for some, and I have nothing against it, but it needs to be done carefully and always made optional. (And as for your two hours in Word with Winamp on, I'm afraid that's hardly a heavy-duty system. Try running four simultaneous versions of Visual Studio for a living, and see how much the similar Intellisense bells and whistles slow down even today's fast machines.)

      The intended purpose of the program--Microsoft Word--is to create documents. Be they business letters, homework assignments, or full length books.

      Yes, and that's what I use it for.

      It's not a layout system, like Quark.

      Maybe not, but MS sure as hell have marketed it on such features in the past. BTW, re your full length books, we used Word at a place I used to work to write formal acceptance test schedules for large software. Nothing clever in the formatting, a few headings and automatic numbering, that's all. Word spewed at about the 300 page mark. You know what Microsoft support told us, when we contacted them on our super-expensive corporate support line? "Word isn't meant for such long documents; you need a DTP package for that."

      You, by your own admission, have poor writing habits. (Run spellcheck *once?* No revisions? WHAT THE HELL DO YOU NEED WORD FOR?!)

      I have excellent writing habits. Amongst other things, I learned to spell at school, and don't rely on the services of a spelling checker to tell me whether I'm right or wrong. Spelling checkers rarely find a problem with my work (though the grammar checker is constantly telling me off because I write in English and it doesn't understand). As a result, I quite happily run my documents through the spelling checker once at the end of each editing session to pick up the odd mistake I've missed, and I feel no need to do any more. Hell, I write all my serious documents in LaTeX anyway, where I don't even have a spelling checker available. And you know what? They don't have spelling errors in them either, because I also proofread my work using good old eyes and brain power.

      I think this subthread has pretty much run its course at this point. You argued that it was worthwhile to have all the bells and whistles. I said that was subjective, and they cost too much for some people. Let's agree to disagree.

      --
      If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
    7. Re:"Better" UI depends on your perspective by Planesdragon · · Score: 1

      My argument throughout has been that there is no absolute definition for "better UI", as it will always depend somewhat on a specific user's needs.

      "Better" is always subjective, and there are few absolutes. Hence why my argument was for an *objective* measurement.

      And as for your two hours in Word with Winamp on, I'm afraid that's hardly a heavy-duty system. Try running four simultaneous versions of Visual Studio for a living, and see how much the similar Intellisense bells and whistles slow down even today's fast machines.

      Who brought up Visual Studio? We were discussing MS Word's spellcheck.

      The bloat from running several of MS's "heavy" apps is undeniable. But I suspect you'd get better results by switching to multiple PCs, or (*gasp*) running as few instances of the same program as possible.

      Maybe not, but MS sure as hell have marketed it on such features in the past. BTW, re your full length books, we used Word at a place I used to work to write formal acceptance test schedules for large software. Nothing clever in the formatting, a few headings and automatic numbering, that's all. Word spewed at about the 300 page mark. You know what Microsoft support told us, when we contacted them on our super-expensive corporate support line? "Word isn't meant for such long documents; you need a DTP package for that."

      How large (size or word count) was the document? "pages" is about the worst measure out there for file size.

      As for your anecdote--I just took said book, pasted it thrice into a file (so it's about 748 pages, or 395,083 words) and it was able to handle the entire thing rather easily. Seems like the file size limit has been raised in the newest version.

      I have excellent writing habits. Amongst other things, I learned to spell at school, and don't rely on the services of a spelling checker to tell me whether I'm right or wrong. Spelling checkers rarely find a problem with my work (though the grammar checker is constantly telling me off because I write in English and it doesn't understand). As a result, I quite happily run my documents through the spelling checker once at the end of each editing session to pick up the odd mistake I've missed, and I feel no need to do any more. Hell, I write all my serious documents in LaTeX anyway, where I don't even have a spelling checker available. And you know what? They don't have spelling errors in them either, because I also proofread my work using good old eyes and brain power.

      "one spellcheck through" and "no revisions" are not what I'd call good writing habits. But as long as your work gets done, have a blast.

      I think this subthread has pretty much run its course at this point. You argued that it was worthwhile to have all the bells and whistles. I said that was subjective, and they cost too much for some people. Let's agree to disagree.

      Actually, we're not technically arguing.

      I have not, once, said that it's always better to have the bells and whistles; for some people, it's not worth the effort to learn a new interface. But if we remove inertia and take an objective look at the interface, then I think the only conclusion is "yes, the UI is better in the latest version of Word than it was in 2.0."

      It's like DVORAK being objectively a better layout; I know it's more efficient, and I know that my typing speed would probably improve if I switched. But it's not enticing enough for me to do so, so I follow the inertia and use the QWERTY keyboard I'm used to.

  81. KDE vs. Win98 performance by Anonymous+Brave+Guy · · Score: 2
    Even my 1.5GHz Athlon-XP is slower running KDE 3.x (or any version of gnome for that matter) than my old 300MHz PII was running Win98.

    Thanks for that information. I'm about to upgrade my trusty PII/350 running Win98 to a nice, new top-of-the-range custom-built beastie. Well, it's been four years, and it was my birthday last week. :-)

    I'd been considering installing Linux as an alternative to MS stuff, since I now object enough to the nature of Microsoft's attitudes to make the effort to switch. In the light of your information, I think I'll just install Win2K instead.

    --
    If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
    1. Re:KDE vs. Win98 performance by dodobh · · Score: 2

      Nah, just use a lighter WM like Windowmaker, or fluxbox, or.....
      You can use whatever you like, not just the latest KDE or whatever.
      (Oh, and don't bother to upgrade your hardware, I am writing this on a Celeron 266MHz, 64 megs of RAM, and this is quite fast with KDE, Mozilla and Netscape running.).

      --
      I can throw myself at the ground, and miss.
  82. Re: premature optimization quote by alangmead · · Score: 1
    I usually see the premature optimization quote attributed to C.A.R. Hoare, not Knuth. According to this web page of quotes it is often misattributed because Knuth quoted Hoare on this issue.

    I agree with the rant, though. I have developed and maintained some sizable threaded applications, and I am loath to do it again unless necessary.

  83. Oracle not multithreaded Re:gprof far from useless by Anonymous Coward · · Score: 0


    > What about databases like Oracle, MS SQL Server,
    > and so on? They're internally multithreaded,

    Oracle, at least the last time I played with it,
    was NOT multithreaded by default. There was a
    multithreaded config option, but Oracle recommended
    that you did not use it.

  84. Inefficient code and brownouts by slackware-dude · · Score: 1

    I remember reading somewhere recently that computers use about 17 percent of power from the electricity supply grid. The latest PCs need 300W supplies, which helps neither the environment nor availability and cost during times of peak load. It's even worse because of the extra demand for AC, not to mention fan noise. Code that lots of people run should avoid excess bloat and inefficiency not just for improving end-user experience as measured by things such as response times.

  85. Re:There is no question that profiling is necessar by chris_7d0h · · Score: 1
    There are a lot of jobs that don't need anything approaching "good" performance (batch reports--I need a web usage report every morning on my desk/in my inbox--where the quick-and-dirty multipass solution that takes 3 hours to run can be scheduled at midnight, and the programmer can then do another project with big ROI instead of spending time writing a faster solution that takes only seconds to run) are one extremely common example of this (as is other batch processing).

    Hmm, it depends on how you pay for hardware (CPU cycles, for example), doesn't it?
    Take IBM's scheme for mainframes, where you pay for what you use. Here I can certainly see a long-term reduction in costs for a company that optimizes a 3-hour batch job to run in a few seconds. Often these bottlenecks don't cost astronomical sums, but "letting a ksh script loop in OMVS" certainly will.

    --
    In a society that believes in nothing, fear becomes the only agenda ~ Bill Durodié
  86. C has multiply+add by Anonymous Coward · · Score: 0

    C is rather common, and it has the "floating multiply-add" functions fma, fmaf and fmal.