Slashdot Mirror


User: windwalkr

windwalkr's activity in the archive.

Stories
0
Comments
105
First seen
Last seen
Profile
(view on slashdot.org)

Comments · 105

  1. Re:It all depends on Project Aims For 5x Increase In Python Performance · · Score: 1

    I accept most of what you're saying, but this has got to be wrong. The only way this works is if the standard specifies that vector elements must be stored in a single, contiguous block of memory. Is this really the case?

    http://groups.google.com/group/comp.lang.c++/msg/408d058c256d7699

    Over and above that, what if the elements are objects? Sounds evil to me.

    Again, the choice of std::vector or C array has no bearing on whether memset() is going to work. If the elements are not POD, then you're going to have repercussions in either case. If they are POD, it's going to work fine in either case. If you're looking at non-POD objects, or just feel like doing thing "the C++ way", then STL offers reserve(), clear(), fill() and so on which can be used to do the above cleanly in all cases.

    Well, judge for yourself if I'm any good, but I do program in C++, even though these days I try to avoid it.

    I'm not judging your capability. Rather, I'm noting that familiarity with a particular language or usage pattern has a direct bearing on how well you're going to be able to write optimal code in that language.

    My sense is that C++ is an order of magnitude more complex, and I've never personally met anyone that I felt had mastery of the language.

    I know one or two who have mastered the whole language, and honestly, these guys aren't the best programmers i know. I know many who have mastered the areas that I personally feel are relevant in day-to-day programming including lower-level optimization, and these guys are good. I know many who have a light working knowledge of the language, can write and debug most application code, and who have probably never considered the type of stuff we're discussing here. All of these guys are good in their way, but if you don't do something regularly, you don't become proficient at it. Certainly, mastering the entire language is not necessary for the kind of optimization we're talking about here.

    I'm sure that such people exist, but the fact that I don't encounter them suggests to me that they are relatively rare--and that using a lot of C++ on a project isn't a good idea.

    I'd say this is industry specific. Certainly in my industry, anyone who can't do this kind of thing would be considered a junior. Again, this may be completely different in other industries, based on what is important in that specific industry.

    For these cases, though, I'm probably just using Python instead. Where I'm using C++, it pretty much has to be as fast as C.

    You may have missed my point. I'm not suggesting that C++ can't be as fast as C, but rather am referring to the usual 90%/10% rule when optimizing. There's little point in worrying about the 90% of the code/structure which takes 10% of the time, at least until you've dealt with the 10% of the code/structure which takes 90% of the time. If you're finding that STL is introducing overhead into your code's critical path then yes, you'll want to look at alternatives - either avoiding STL for this part of the code, or consider restructuring your structures so that this doesn't occur.

    When I started my project, I thought that there wouldn't be any difficultly getting the C++ to be as fast as C, but after living with it for a while, I'm seriously considering going back to straight C (albeit as much for its relative simplicity as for the speed).

    And that's fair enough. I'm not telling you how to program, and only you can determine what's the best option for you. My point is that just because the STL didn't work out for you personally, doesn't reflect on STL's capabilities in a general sense.

  2. Re:It all depends on Project Aims For 5x Increase In Python Performance · · Score: 1

    Hmm--yes, that sounds right. Still, though, if I have O(n) arrays all the same size, C arrays can remember that in O(1) space, whereas STL would apparently need O(n) space (storing the size within each vector).

    Quite so. I would contend that this is not a problem for a normal workload (you're probably looking at 8 bytes overhead per vector as compared to a C array, plus maybe a small amount of overhead from the malloc library) but a pathological case could hit this (ie. a scenario where you have truly massive numbers of near-empty vectors.) My point here is not that a vector has identical performance characteristics to a C array, but that in any reasonable usage the real-world performance will be equivalent.

    I was thinking of the time it takes to malloc and free the storage, which would (I would think) be much larger than the time involved in allocating on the stack, or perhaps out of static storage, in cases where this is possible.

    Again, heavily dependent on usage rather than choice of array types. If allocation and deallocation incurs a large cost in your code, then obviously you're going to want to optimize for this specific concern. The first step would be to preallocate wherever possible, regardless of whether you are using a vector or a C array. Also of note - if you're really able to keep your arrays on the stack, then you don't have enough of them around that the 8-bytes overhead above will be noticeable, unless you're talking about a heavily multi-dimensional structure. If you're working with something multi-dimensional, then fair enough - vector may not be the best choice for you.

    Yes, okay. You're saying that if you forgo all of the vector-ish features of a vector, just allocate it initially and then index into it, there should be no additional relative cost. That does sound true.

    I'm saying that you have options, and that some options lean toward high performance, and some options lean towards flexibility or other benefits at the detriment of performance. The choice of using a vector does not inherently force you to sacrifice performance- you can choose what techniques are useful to you and which would be too expensive for your particular case.

    Although maybe there's still a bit for translating between "pointer to vector" and "pointer to vector's actual array".

    This is heavily dependent on your code, your compiler, and your data structures. It's feasible that a vector could be marginally slower, but I would not make that assumption until I'd looked at the assembly for your specific case since the optimizer can probably get rid of it in most cases. "Marginally" here probably means a single machine instruction, and if you're really concerned with that level of performance then you should probably be working in assembly anyway.

    Plus it seems like I can't really run memset on a vector's storage, which may cost me.

    Of course you can.

    One of the problems with C++/STL is that it's much more difficult to see when you are implicitly asking more work to be done.

    I've heard that remark for the last 10 years and it only rings true if you don't program in C++. I think it's fair to say that if you don't understand the finer details of a particular language, you won't be able to write optimal code in that language. I'm not denigrating yourself or anyone else for not being able to write optimal code in a language which they don't use heavily, but to say that it's hard for an experienced user of the language is misleading at best.

    On the other hand, if you're involving vectors in an operation that must be run billions of times, any overhead at all can be a showstopper for STL use.

    Any per-operation overhead which is comparable to the cost of the operation could certainly be considered a show-stopper. If your operations are large, or if the

  3. Re:It all depends on Project Aims For 5x Increase In Python Performance · · Score: 1

    While I won't comment on the performance of any particular implementation of the STL, or any particular C++ optimizing compiler, many of your points don't ring true:

    That was my impression, too, but careful timing and profiling suggested otherwise.

    In addition, we can by simple reasoning determine that there's gotta be some overhead involved with vector implementations. First, vectors know their size; in particular, they know it in constant time. This means that they essentially must include a size field and update it whenever size changes.

    This is just as true of a C array; if you're resizing it, then you'll need to track its size manually. If you're never resizing it, then it's unfair to compare against a vector that you're resizing - just resize the vector (once) to the size you want, and you'll never again pay a penalty for updating the size member.

    Also, I can have a pointer to a vector, and that vector can grow arbitrarily without invalidating the pointer. That means that there pretty much has to be an indirect pointer to the vector's storage.

    Correct, it's more similar to a pointer (which in C is syntactically similar to an array) rather than an actual array.

    It also means that the vector's storage must more or less be coming from a heap, which definitely slows things down.

    I'm uncertain how you think that this might be the case. Being on the heap is not a performance penalty in any real sense that I can think of.

    Suppose I have a function I'm going to call a million times and it needs a temporary array of ints, of a size I can bound (maybe even small enough to be cache-beneficial). I can allocate that array in the parent function and pass in a pointer each time. Overhead to create and destroy the array in the inner function each time: zero. If you do this with a vector, the implementation has to zero the length, which costs time. Or you can delete and recreate it by letting it go out of scope, but that also costs time.

    I think you're confusing the data type for the algorithm. If you're not going to zero your C-style array, then one would have to assume that you don't need to zero the vector either. In which case, you're not going to see a real performance difference between the two approaches.

    I can certainly agree that it's possible to write inefficient code with the STL, however that comes down to what you're asking the STL to do. Certainly if you ask it to do more work than you were asking of your C-style array solution, the STL will come out slower - if you construct more often, clear more often, copy more often, introduces more counters, etc. - yes, it has the potential to be slower. However, on the exact same workload, you're going to find that the overheads in STL are exceedingly small.

    Oh, and one other thing - don't profile in debug mode. Many STL implementations include extensive debugging support which bumps up the cost of even simple operations in a debug build. Some even do this in release mode unless you explicitly disable it.

  4. Re:Adapt on Windows and Linux Not Well Prepared For Multicore Chips · · Score: 1

    This is true, but perhaps an overly simplistic view of the problem. If you can lower the overhead, then certain problems which were too small to be worth threading could suddenly benefit from threading. Almost any code could be threaded, if the overhead is low enough. This is what out-of-order execution is all about, really, but allowing explicit threading of inner loops would be a nice trick if the threading overhead could be made sufficiently low.

  5. Re:Adapt on Windows and Linux Not Well Prepared For Multicore Chips · · Score: 1

    It sounds like you're asking for threading (say, OpenMP style) which is supported at the hardware level rather than the OS level. The advantage of such a system would be that you could bring up additional threads with marginal start-up/shut-down/message-passing costs, allowing them to be used to accelerate small jobs for which the current OS-hosted threading models are too heavy. To accomplish this, the hardware would need to be able to bring up new threads at the request of the application programmer, with no per-thread OS interaction, and with little to no latency (the cpu effectively forks the current thread without missing a beat.)