Why? If you use C arrays with the Harpertown, you can use std::vectors. They are interchangeable. Having access to the first element of the array, you can take a pointer to it and use it as a C array.
You're saying that I can do this (at no cost)?
std::vector<int> a = ...;
int *p = &a[0];
I doubt this is legal, and even if it is I'd say the cure is worse than the disease...
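For what it's worth, this is legal: since C++03 the standard has guaranteed that a vector's elements are stored contiguously, so `&a[0]` on a non-empty vector can be handed to code that expects a C array. A minimal sketch, assuming int elements; `sum_ints` and `sum_vector` are hypothetical names used only for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A C-style interface: raw pointer plus length.
// (Hypothetical function, only here to stand in for "code that wants a C array".)
long sum_ints(const int* p, std::size_t n) {
    assert(p != nullptr || n == 0);
    long total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += p[i];
    }
    return total;
}

// Hand the vector's contiguous storage to the C-style function.
long sum_vector(const std::vector<int>& a) {
    if (a.empty()) {
        return 0;  // &a[0] is only valid for a non-empty vector
    }
    const int* p = &a[0];  // points at the vector's own storage, no copy is made
    return sum_ints(p, a.size());
}
```

The pointer is only good until the vector reallocates (for example on a push_back that exceeds capacity), which is the main thing to watch for.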
If you see the vux984 acting as if s/he's in terrible pain, you'll act because:
S/He's a human being, and we know that humans can feel pain
Of course (or at least, I hope so). The question is not why I would help him/her if they seemed to be in terrible pain, but rather why I wouldn't feel at least a little like that if I saw a crab in terrible pain.
I think there's a difference between eating animals and torturing them before you do it. Personally the latter concerns me a lot more than the former.
The idea that animals "don't feel pain" sounds about as convincing as the idea that "slaves don't have souls/can't read/don't love their children/whatever". Totally self-serving.
First, vectors know their size; in particular, they know it in constant time. This means that they essentially must include a size field and update it whenever size changes.
This is just as true of a C array; if you're resizing it, then you'll need to track its size manually. If you're never resizing it, then it's unfair to compare against a vector that you're resizing - just resize the vector (once) to the size you want, and you'll never again pay a penalty for updating the size member.
Hmm--yes, that sounds right. Still, though, if I have O(n) arrays all the same size, C arrays can remember that in O(1) space, whereas STL would apparently need O(n) space (storing the size within each vector).
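One way to get the O(1) bookkeeping back while still using a vector is to keep all the equally sized arrays in a single flat vector and store the row length once. This is only a sketch of that idea; `FlatRows` and its members are made-up names, not anything from the code under discussion:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: `rows` arrays of `cols` ints each, stored back to back
// in one vector, so the row length is remembered once rather than per array.
class FlatRows {
public:
    FlatRows(std::size_t rows, std::size_t cols)
        : cols_(cols), data_(rows * cols, 0) {}

    // Pointer to the start of row r; usable like a plain C array of cols() ints.
    int* row(std::size_t r) {
        assert(r * cols_ < data_.size());
        return &data_[r * cols_];
    }

    std::size_t cols() const { return cols_; }
    std::size_t rows() const { return cols_ ? data_.size() / cols_ : 0; }

private:
    std::size_t cols_;       // the single shared row length
    std::vector<int> data_;  // one allocation for every row
};
```

A vector of vectors would instead carry a separate size and capacity for every row, which is the O(n) cost described above.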
It also means that the vector's storage must more or less be coming from a heap, which definitely slows things down.
Being on the heap is not a performance penalty in any real sense that I can think of.
I was thinking of the time it takes to malloc and free the storage, which would (I would think) be much larger than the time involved in allocating on the stack, or perhaps out of static storage, in cases where this is possible.
Suppose I have a function I'm going to call a million times and it needs a temporary array of ints, of a size I can bound...
I think you're confusing the data type for the algorithm. If you're not going to zero your C-style array, then one would have to assume that you don't need to zero the vector either. In which case, you're not going to see a real performance difference between the two approaches.
Yes, okay. You're saying that if you forgo all of the vector-ish features of a vector, just allocate it initially and then index into it, there should be no additional relative cost. That does sound true. Although maybe there's still a bit for translating between "pointer to vector" and "pointer to vector's actual array". Plus it seems like I can't really run memset on a vector's storage, which may cost me.
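On the memset point: for a trivially copyable element type such as int, the vector's storage can be cleared in place, either with std::fill or with memset on the address of the first element. A small sketch under that assumption (the function names are invented for the example):

```cpp
#include <algorithm>
#include <cassert>
#include <cstring>
#include <vector>

// Zero every element without touching the vector's size or capacity.
void zero_with_fill(std::vector<int>& v) {
    std::fill(v.begin(), v.end(), 0);
}

// Same effect via memset on the underlying storage. This is only appropriate
// for trivially copyable types like int, where all-zero bytes mean the value 0.
void zero_with_memset(std::vector<int>& v) {
    if (!v.empty()) {
        std::memset(&v[0], 0, v.size() * sizeof(int));
    }
    assert(v.empty() || v.front() == 0);  // cheap sanity check, gone with -DNDEBUG
}
```

Whether the two differ in speed depends on the compiler and library, so it would need measuring on the actual target.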
Certainly if you ask it to do more work than you were asking of your C-style array solution, the STL will come out slower - if you construct more often, clear more often, copy more often, introduce more counters, etc. - yes, it has the potential to be slower. However, on the exact same workload, you're going to find that the overheads in STL are exceedingly small.
One of the problems with C++/STL is that it's much more difficult to see when you are implicitly asking more work to be done. It's certainly possible that I was shooting myself in the foot. On the other hand, if you're involving vectors in an operation that must be run billions of times, any overhead at all can be a showstopper for STL use.
If you'd asked me before I got into this project, I would have guessed that STL data structures would usually be more efficient than hand-rolled C. Now I'm much less sanguine about the whole thing.
Oh, and one other thing - don't profile in debug mode. Many STL implementations include extensive debugging support which bumps up the cost of even simple operations in a debug build. Some even do this in release mode unless you explicitly disable it.
Hmm. This would have been g++'s implementation of STL in the default mode (no flags or environment variables). I didn't see any "make it faster" options mentioned in the docs.
Now your developers must be good at both Python and C++...
I don't think I've ever met anyone who was good at C++. Maybe having as little of it as possible to worry about is the best that can be hoped for.
At every point in time, the rewrite will seem like a much bigger task than fixing up the Python. You think things like: "just a little bit of optimization and this is going to be acceptable". It's not easy to commit to the rewrite, even if you know you need it.
I smell bullshit. There is no overhead from using STL containers.
If you used an std::vector, you couldn't have a bottleneck, for the simple reason that the std::vector is an array.
That was my impression, too, but careful timing and profiling suggested otherwise.
In addition, we can by simple reasoning determine that there's gotta be some overhead involved with vector implementations. First, vectors know their size; in particular, they know it in constant time. This means that they essentially must include a size field and update it whenever size changes. Also, I can have a pointer to a vector, and that vector can grow arbitrarily without invalidating the pointer. That means that there pretty much has to be an indirect pointer to the vector's storage. It also means that the vector's storage must more or less be coming from a heap, which definitely slows things down. ("more or less" because one can imagine certain optimizations that might be possible if you somehow knew an upper bound on the vector's lifetime size)
All of this stuff costs you in time and space.
Suppose I have a function I'm going to call a million times and it needs a temporary array of ints, of a size I can bound (maybe even small enough to be cache-beneficial). I can allocate that array in the parent function and pass in a pointer each time. Overhead to create and destroy the array in the inner function each time: zero. If you do this with a vector, the implementation has to zero the length, which costs time. Or you can delete and recreate it by letting it go out of scope, but that also costs time.
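The scenario can be sketched both ways. In the version below the parent owns the scratch space in each case, and the inner functions only index into it, so neither the C array nor the vector is created, cleared, or resized per call. Everything here (`kMaxScratch`, `inner_c`, `inner_vec`, `run_many`, the doubling workload) is invented for illustration and is not the original code:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

const std::size_t kMaxScratch = 64;  // hypothetical upper bound on the temporary's size

// C-style inner function: scratch space is supplied by the caller.
long inner_c(const int* in, std::size_t n, int* scratch) {
    assert(n <= kMaxScratch);
    long total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        scratch[i] = in[i] * 2;
        total += scratch[i];
    }
    return total;
}

// Vector version: passed by reference, never cleared or resized in here.
long inner_vec(const int* in, std::size_t n, std::vector<int>& scratch) {
    assert(n <= scratch.size());
    long total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        scratch[i] = in[i] * 2;
        total += scratch[i];
    }
    return total;
}

// Parent: allocates both scratch buffers once, then calls the inner functions repeatedly.
long run_many(std::size_t calls) {
    const int in[4] = {1, 2, 3, 4};
    int c_scratch[kMaxScratch];               // stack storage
    std::vector<int> v_scratch(kMaxScratch);  // one heap allocation, outside the loop
    long total = 0;
    for (std::size_t k = 0; k < calls; ++k) {
        total += inner_c(in, 4, c_scratch);
        total += inner_vec(in, 4, v_scratch);
    }
    return total;
}
```

Written like this, the vector costs one allocation and one zero-initialisation in the parent; the per-call costs described above only appear if the inner function constructs, clears, or resizes the vector itself.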
Most of the time these minor effects don't matter, but if it's in the innermost loop and is going to run billions of times, it can be quite noticeable.
It could conceivably be that gcc's implementation of STL is a little slow. Doesn't matter why, though, because that's my target, and that's where my program has to run.
It's been a while since I went through this exercise, so I don't have the exact scenario. But the code is GPL'ed and available here. If you can replace any of the arrays with an as-simple, as-fast use of vectors, I'd be happy to have it.
You're not CPU bound until you: add all the features, handle the special cases, add the error checking, scale up beyond trivial test data, etc.
Then what? Rewrite?
Yes. If you didn't know all of that was going to happen, you're prototyping. If you're prototyping, you should be doing it in a prototyping language.
Rewriting from Python to C++ is not particularly difficult. Completely overhauling the design of a project written entirely in C++ is really unpleasant and takes a long time. So much so that many early design decisions on large C++ projects simply cannot be undone.
Model in clay first, then in stone later if you have to.
I find Python is about 20x slower (and about 10x faster to implement) than C, with the number varying quite a bit depending on how CPU-bound the code is. Given the speed of modern processors, this is plenty fast for many tasks.
Beyond that, many Python programmers employ a strategy of writing just the CPU-intensive inner loops in C or C++. This gives you most of the speed of an all-compiled solution but with much of the easier programming (and shorter programs) of the all-Python approach.
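As a rough illustration of the compiled half of that split, the hot loop can be exposed as a plain C-linkage function that takes raw pointers. The function `dot` below is a hypothetical example, not part of the application described here:

```cpp
#include <cassert>
#include <cstddef>

// C linkage keeps the symbol name unmangled, so foreign-function
// interfaces can look it up by name.
extern "C" double dot(const double* x, const double* y, std::size_t n) {
    assert(n == 0 || (x != nullptr && y != nullptr));
    double acc = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        acc += x[i] * y[i];  // the CPU-intensive inner loop lives here
    }
    return acc;
}
```

Built as a shared library, a function of this shape could be loaded from Python with something like the standard ctypes module, leaving the rest of the program in Python.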
My particular scientific application runs on 1500 cores, is about 75% Python/25% C++, is 4-5x smaller than similar all-C/C++ programs, and runs at about 95-99.99% of the speed of an all C++ solution.
(Somewhat ironically, some of the worst performance bottlenecks in this app had to do with the overhead of some of the STL containers, which I ended up having to replace with C-style arrays, etc. to get best performance.)
Not all apps will fall out this way, but you definitely can't assume that just because something's written in Python that it will be slow.
(Going beyond that, we all know that better algorithms usually trump all of this anyway. If writing in Python gives you the time and clarity to be able to use an O(n)-better algorithm, that may pay off in itself.)
Well, let's see. It has to be priced at at least $2,000,000, so the big boss can say "Whatever the price is, we get half off!" and save the company a million dollars. And also, uh, what were we talking about?
Why do I care if I visit a web site and "non-free" JavaScript runs in my browser?
It's possible that you don't care--read his article and see.
I'd never given this issue much thought myself, but I do know from following RMS's writings for two decades that he's often prescient and dead-on. If he thinks there's a problem, it's definitely worth a look.
A "Computer Science student" where? Joe's HyTech Typing Academy and Storm Door Company?
Seriously, a good school will give you exposure to the ideas in a variety of programming languages drawn from across the spectrum. Ideally you'd have some sense of the best ideas in (and worst flaws of) a dozen or more languages drawn from:
Lisp (or Scheme)
Haskell (or Ocaml)
Prolog
Smalltalk
Forth
Pascal (or Modula-2/etc)
Python (or Ruby, and Perl)
Eiffel
Ada
Assembly
a couple of cutting-edge languages I haven't even heard of
and yes, some C, C++, and Java. You may not ever use most of these languages, but it's quite likely you'll benefit from being able to see things from their perspective. University is a time to learn fundamental concepts--it's not a trade school.
Not sure what I would be rationalizing--I've already stipulated that I don't illegally download stuff. I guess I could be rationalizing why I don't send the MPAA roses every week.
The question of whether they can feel pain is [...] an actual biological mystery.
Exactly. Given that, it seems appropriate to assume that they are suffering until there is good scientific evidence to suggest otherwise.
I am not going to do your work. You just simply have to replace your arrays with vectors. It's very simple.
You'll have to forgive me, but for the time being I'm afraid I'll have to continue to believe my own lying eyes.
(I'm sure a lot of us would appreciate a link to your resume, however.)
Yes. This is a good thing, per Knuth.
Kramulous: Where's a good place to learn about this stuff?
Acting 'as if it's in terrible pain' is not the same thing as being in terrible pain.
I'll have to remember that if I ever come across you acting as if you're in terrible pain.
...thinking of. (Not very much, though.)
What, we didn't get screwed hard enough by Thomson the first time? Is it not clear at this point that they're not to be trusted?
An elected government responding to the wishes of the electorate? Inconceivable!!
It's allowed if the net result is to get more innocent people killed.
It's amusing to see how many man-hours of would-be pundit commentary are spent telling us how RMS is "irrelevant"...
...in which case all bets are off.
My favorite quote remains:
:-)
(Just kidding. The other posters had it correct--be a manager. Absolutely anyone can get hired to manage programmers...)
Failures on the part of our politicians do not constitute a justification for evil on the part of AIG.
Sounds like someone needs a waterboard...
...under ext4.
Well, I meant 100ms or worse, and generally much worse. That's just a guess on my part, not a read of any specs.