The point of these machines isn't to be able to install and play games. It's to bring basic computing services to school children in the developing world. Mainly, this means internet access, the ability to view PDFs and the like, the ability to type reports, and even do mathematics (Octave, Maxima). I'm sure the machine will come pre-configured to do all of these things.
A computer is a hell of a lot more useful than a few books. Especially given things like MIT's Open Courseware, which, while not as good as the suite of books one collects over the course of an engineering education, is a whole lot cheaper!
Linux is just as easy to use as OS X (I use both), especially GNOME (which is designed according to many of the principles of classic MacOS). Ease of configuration is irrelevant here --- these will be closed-box systems that come pre-configured. They won't be any different from cell phones that use Linux, in this regard.
The decision to stick with open source is not a matter of ideology. The whole point of this exercise is to come up with a computer that can be provided to developing nations without "strings attached". That's why they're working so hard on the hardware to get the price down to $100. They're not trying to start a charity to give away computers --- if they were, they could easily use second-hand computers, or donated machines. Using OS X means depending on the charity of Apple. What happens if Apple decides to withdraw support for the program? What happens when new versions of the OS come out --- will Apple provide those for free? Using an OS that isn't tied to a corporation is the only way to deliver these machines the way they want to deliver them.
You can't crank a G4 up to 2GHz, because of the short pipeline. You could do like IBM, double the pipeline length (to 16 stages), and crank it up to 2GHz, but then you have a G5. The G5 is a decent chip, but out of the trio of Pentium-M, Opteron, G5, it's the slowest. It also is severely hampered by GCC. IBM's compiler can make a 2.5 GHz G5 perform comparably to a 2.2GHz Opteron in integer, and a 2.8 GHz Opteron in FP. Using GCC, it's more like a 1.8 GHz Opteron in integer and a 2.0 GHz Opteron in FP.
Actually, it does. See, single-payer health insurance can't work in an opt-out system.
Who said anything about opt-out? We're talking about banning private healthcare. There is no need for a single-payer system to ban private healthcare; the only requirement is that everyone pay the taxes that go into the system. It would be like the public school system --- even if you send your kids to a private school, you still have to pay the property taxes that go towards funding public education.
I didn't haul out the word "prominent," and in fact that's my whole point. When some idiot Slashdotter says that conservatives want to make abortion illegal, he gets modded up to +5.
A Pew poll in 2005 asked people: "Which comes closer to your view? Abortion should be generally available to those who want it. Abortion should be available but under stricter limits than it is now. Abortion should be against the law except in cases of rape, incest and to save the woman's life. Abortion should not be permitted at all."
31% of people (presumably conservative) answered that abortion should be against the law except in cases of rape, incest, and to save the woman's life. An additional 9% of people said that abortion should be illegal in all cases. There is no spinning this statistic --- conservatives want to outlaw general abortion, and a significant portion want to outlaw it entirely. Or do you think there is another way to interpret "should be against the law" other than "is illegal"?
When some other idiot Slashdotter says that liberals want to abolish private property, he gets modded down to -1.
I don't think such a poll has been conducted, but let's use our heads. If Pew phrased a poll asking: "Which comes closer to your view? Private property should be allowed in all cases. Private property should be allowed to anyone without a criminal record. Private property should be allowed in cases where the person can demonstrate a legitimate need. Private property should be against the law in all cases." How many liberals do you honestly think would answer "yes" to the latter two?
They are wacko fringers and not representative of the mainstream.
The statistics suggest otherwise.
Wiggle, wiggle, wiggle.
There is no wiggling involved here. To be on the left in the United States generally means that one is a liberal. Not a socialist. To be on the right means that one is a conservative. Not a Nazi. Yes, socialists are on the left, and Nazis are on the right. However, nobody actually uses "left" and "right" in that way in everyday conversation.
"Don't confuse me with facts. My mind is made up."
What facts? Did you read the page I linked to? I quote:
"How would the Human Life Amendment affect legal abortion?
If adopted, it would permit states to enact and enforce laws to prohibit abortion. It would also permit Congress to adopt laws that protect the preborn."
How else can you interpret this statement other than that these people want laws promulgated to ban abortion? I'm not exactly taking a logical leap here!
We decided that it's unfair to characterize whole groups by the fringe views of a few radicals way outside the mainstream, didn't we?
You know what? The people who oppose Roe v. Wade on legalistic grounds seem to be the fringe radicals way outside the mainstream. The mainstream, judging by the polls, just seems to want to stop teenage mothers from killing "unborn" children.
You keep saying that. It's still not true.
Again, the statistics suggest that you're wrong.
Now, we DO know that the majority of Americans believe that abortion is wrong. We know that from polling data. (It consistently comes out to about a 66/33 margin.) But we also know that the majority of Americans believe that abortion should be legal in some cases and illegal in others. So this "they want to criminalize it!" assertion of yours is quite simply wrong.
What happens when you have a 32*32 element array? That's the problem with optimizing to a given data size: you're rarely in a situation (at least in modern programs) where you can count on your data being a given size. It's a far better idea to make your program cache-friendly. Depending on your algorithm, perhaps it already is. If you're just reading data linearly, then you probably won't see a significant performance loss going from a 512-element cache to a 256-element cache, even with 1024 elements of data. If you do random access, it'd be worth it to get your program to operate on the data in tiles (say, in 32-element pieces), so your working set will still usually fit in cache.
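A minimal sketch of the tiling idea, using a matrix transpose as a stand-in for "random access" (the function name `transpose_tiled` and the default tile edge of 32 are illustrative assumptions, not something from the original post):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Transpose an n x n row-major matrix one tile at a time. The tile edge is
// an illustrative guess, not a tuned value: the loop structure stays the
// same whatever the real cache size turns out to be.
std::vector<double> transpose_tiled(const std::vector<double>& src,
                                    std::size_t n,
                                    std::size_t tile = 32) {
    assert(tile > 0 && src.size() == n * n);
    std::vector<double> dst(n * n);
    for (std::size_t i0 = 0; i0 < n; i0 += tile) {
        for (std::size_t j0 = 0; j0 < n; j0 += tile) {
            const std::size_t i1 = std::min(i0 + tile, n);
            const std::size_t j1 = std::min(j0 + tile, n);
            // Everything touched by these two inner loops lives in one
            // tile of src and one tile of dst.
            for (std::size_t i = i0; i < i1; ++i)
                for (std::size_t j = j0; j < j1; ++j)
                    dst[j * n + i] = src[i * n + j];
        }
    }
    return dst;
}
```

The working set of the inner loops is bounded by the tile, not by `n`, so growing the array does not change how much data is live at any one time.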
The thing is, that's not true. In isolation, it might be, but the overhead of passing a data structure that is a few words of memory by value becomes significant if your function is called from inside a loop, and potentially horrible if you do it routinely throughout your program.
If it's in a critical area, then the profiler will point it out. If it's not in a critical area, then it doesn't matter. Plus, do you have any idea what the overhead really is? It's tiny. I just tried a benchmark calling a very simple function (consisting of two floating-point additions) in a loop. One of the parameters of the additions was passed via a struct that took up 32 bytes (8 words). The difference between passing it by value and passing it by reference was less than 10%. On a more complicated function, it would be noise.
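A benchmark of that shape might look something like the sketch below. This is a reconstruction under assumptions, not the code that was actually run: the names `Params`, `add_by_value`, `add_by_ref` and `seconds_for` are made up for illustration, and an optimizing compiler may inline both calls and erase the difference entirely.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical 32-byte parameter block (8 words of 4 bytes each).
struct Params {
    float v[8];
};
static_assert(sizeof(Params) == 32, "sketch assumes 4-byte float");

// Two floating-point additions, struct passed by value.
float add_by_value(Params p, float x) { return x + p.v[0] + p.v[7]; }

// The same two additions, struct passed by const reference.
float add_by_ref(const Params& p, float x) { return x + p.v[0] + p.v[7]; }

// Calls f in a loop and returns the elapsed wall-clock seconds. The final
// accumulator is written to `sink` so the loop has an observable result.
template <typename F>
double seconds_for(F f, const Params& p, std::size_t iterations, float& sink) {
    assert(iterations > 0);
    const auto start = std::chrono::steady_clock::now();
    float acc = 0.0f;
    for (std::size_t i = 0; i < iterations; ++i) acc = f(p, acc);
    sink = acc;
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Calling `seconds_for(add_by_value, p, n, sink)` and `seconds_for(add_by_ref, p, n, sink)` with the same `n` gives the two timings to compare; the numbers will vary by compiler, flags, and CPU.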
The kicker is, profiling won't help you here. Lazily using pass-by-value where pass-by-const-reference would suffice will not cause a huge hit in one function, it will cause an x% hit in all the affected functions, and a profiler won't help you with that.
The profiler will tell you what functions use 80% of your runtime. The programmer is smart enough to check those functions and see if the parameter passing convention could possibly be an issue. The other functions don't matter. Even if it's a constant 20% hit in all the other functions, they account for only 20% of the runtime, so we're talking about a 4% overall performance decrease, which is insignificant.
But the sort of issue I mentioned above isn't a fatal mistake in one place. It's more like death by a thousand cuts. And again, no post-processing with a profiler and hand-tuning will fix a system that is inherently slow because of such lazy coding practices.
Profiling fixes the code in which performance matters. The rest of the code can be slow, because it doesn't matter!
the code from something specialised like Intel's compiler is usually far better at low-level optimisations.
Numerous benchmarks show that Intel's compiler is rarely more than 20% faster than GCC, unless it can take advantage of auto-vectorization. The assembly output doesn't mean much. As far as the CPU is concerned, it's just bytecode --- it gets translated and rescheduled before getting executed anyway.
You realize that up until recently, it was illegal to sell private health insurance in Canada? (The Canadian high court overturned the law making it illegal just a few months ago.)
Are we talking about Canada or the United States?
You realize that the Clinton health-care plan, which thankfully never got past the talking stages, was going to be based on a single-payer plan system modeled after the Canadian system?
The Canadian plan also results in lower healthcare costs and better healthcare, so modeling ours on theirs is probably a good idea. That doesn't mean that we have to outlaw private health insurance along with it! You understand what "modeling" means, do you not?
You realize that one of the most prominent left-leaning advocacy groups is called "Socialist Alternative" and that it calls for the seizure of the 500 biggest American companies and the replacement of their owners and management with committees of citizens?
Prominent by whose definition? You do understand that when one says "left" in the United States, we're talking about leftist relative to the American mainstream, not what would be considered leftist internationally. In the US, the proper term for groups like "socialist alternative" would be "socialist". We aren't talking about socialists here.
You realize that this prominent group calls for making the taking of "excessive profit" a crime, and that their leaders have been running editorials in the nation's opinion pages for years now?
Again, prominent by whose definition?
Name one. Seriously. Name one. I've given you concrete examples. Name one.
The Human Life Amendment. And don't give me any crap about it being intended to merely allow states to criminalize abortion, not force them to. Criminalization of abortion is precisely what they want. See also Operation Rescue/Operation Save America.
Do you know the difference between overturning Roe and criminalizing abortion?
There is a reasonable argument that Roe v. Wade is a bad precedent not on the basis of the abortion debate, but on the merits on which it was argued. With some caveats, I actually buy this argument. That doesn't change the fact that the vast majority of people working towards overturning Roe v. Wade are not working in favor of reason in law, but because they want to see the criminalization of abortion. There is a war in this country between those who support abortion and those who want to see it become illegal. Roe v. Wade is an icon in that war. Those who oppose Roe v. Wade on principle, but not abortion per se, simply find themselves on the wrong side of a debate that only has two sides.
You realize that there is no prominent movement on the left to do any of the things you stated. There are very prominent movements on the right to do the things the parent stated. What legitimate leftist organization wants to criminalize going to church? How many legitimate rightist organizations want to criminalize abortion?
Consider, for example, passing a large bit of data as a parameter to a function. In languages that use pass-by-reference semantics, this will typically be cheap. In languages that use pass-by-value semantics, this will typically be expensive. In C++, you have a choice, but the natural (that is, default) is by value. Would you tell a C++ programmer not to use const-reference parameter types from the start, because it's a premature optimization?
I would tell a C++ programmer that worrying about a bit of extra data copy in the function call is generally useless. It's really not that expensive unless your structs are monstrously large. Generally the question you're interested in is semantics. Do the semantics lend themselves to a pass-by-value, or a pass-by-reference? If, after profiling, you find that this is a problem, use the passing style on those few functions that the profiler points out. Doing it for everything else is useless.
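To make the "semantics first" point concrete, here is a small sketch with two made-up functions (`largest` and `sorted_copy` are illustrative names, not from any code discussed here). The signature is picked by what the function does with its argument, not by copy cost:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Read-only use: the function only inspects the data, so a const
// reference states that intent.
double largest(const std::vector<double>& xs) {
    assert(!xs.empty());
    return *std::max_element(xs.begin(), xs.end());
}

// The function needs its own modifiable copy, so taking the argument by
// value is the natural way to say so; the caller's vector is untouched.
std::vector<double> sorted_copy(std::vector<double> xs) {
    std::sort(xs.begin(), xs.end());
    return xs;
}
```

If profiling later flags one of these calls, changing the passing style on that one function is a local edit.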
In some types of software, you simply have to plan for performance from the start.
Yes, but planning for performance from the start doesn't mean optimizing from the start. It means designing good algorithms and implementing them without any grossly stupid performance mistakes. Optimization can happen after implementation, where profiling shows the need for more hand-tuning.
Obviously algorithmic improvements make more difference than anything else, but even so, there's a scale between large-scale algorithm and data structure changes and assembly-level micro-optimisation, not a switch.
It's a scale, but one very biased towards high-level optimizations. Compilers do an excellent job of the low-level stuff. Even at the data structure level, you get a lot more benefit from considering things like ordering your access patterns for cache-friendliness than you do from saving a byte or two here or there.
I don't see any reason why the CPU can't see the register as both 1 32-bit register and 2 16-bit registers. After all, MMX reused the floating point registers.
There are a number of problems with partial registers. At the CPU level, it comes when trying to figure out instruction dependencies. Supporting half registers makes things a lot more complicated when you see that instruction 1 writes to EAX, while instruction 2 reads from AX. Second, it makes register allocation a lot more complicated.
The problem with writing portable code as things now stand is that it is oblivious to fitting things into cache, as it must remain cache-size independent.
Being cache-size independent doesn't mean being oblivious about fitting things into cache. The Right Thing (TM) to do is to make your memory access patterns predictable for the cache. That means that no matter how big your data set gets, or how small a cache you run on, you won't suffer catastrophic performance decreases.
Since current tools are built with that sort of attitude about portable code, the designers refuse to implement features to allow the coder to code to cache sizes.
You don't want to code to the cache size. That's not what caches are for. All you'll end up doing is screwing your performance when your data gets twice as large, or you want to run on a CPU with less cache. Again, what you want to do is design your algorithms to perform cache-friendly memory accesses. Treat the cache as a cache, not a local memory.
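As a rough illustration of a cache-friendly versus cache-hostile access pattern, with no cache size appearing anywhere in the code (the function names are hypothetical, and the matrix is assumed to be stored row-major in a flat vector):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Walks memory in the order it is laid out (unit stride), which is the
// predictable pattern caches and prefetchers are built for.
double sum_row_major(const std::vector<double>& m,
                     std::size_t rows, std::size_t cols) {
    assert(m.size() == rows * cols);
    double total = 0.0;
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            total += m[r * cols + c];
    return total;
}

// Visits the same elements, but each step jumps `cols` elements ahead,
// so consecutive accesses rarely share a cache line once the matrix is
// large.
double sum_column_major(const std::vector<double>& m,
                        std::size_t rows, std::size_t cols) {
    assert(m.size() == rows * cols);
    double total = 0.0;
    for (std::size_t c = 0; c < cols; ++c)
        for (std::size_t r = 0; r < rows; ++r)
            total += m[r * cols + c];
    return total;
}
```

Both functions visit exactly the same elements; only the order differs, and that order is what decides how well the cache can do its job as the data grows.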
Annotations are not ways to improve your algorithm. I'm talking about improving your algorithm at the theoretical level, to say run with O(log(N)) complexity instead of O(N), or scale as O(N) with number of processors instead of O(sqrt(N)). The annotations you suggested are nothing more than mucking with the compiler's business.
First, using 16-bit components of registers incurs a stall on most modern x86 CPUs. Remember, they are RISC processors underneath, which have no conception of partial GPRs. Second, RAM is dirt cheap, so let's not even consider blowing RAM. Things get interesting when talking about fitting things in cache, but the simple truth is that if your data doesn't fit in cache, the benefit from just halving its size is usually minimal. As soon as your data set grows, you've blown the cache again. You're almost always better-served trying to figure out how to get your code to operate on data in cache-sized chunks, so your performance stays constantly good, instead of being great with one data set and piss-poor with another.
Here's a better one: The instruction is smaller so you can fit more instructions in RAM which means less flushes to disk.
If your instructions don't fit in RAM completely, then you're screwed. Buy more RAM.
Attacking problems from a "every byte counts" perspective can help you decide what you want to do when every byte doesn't count.
I don't see how.
Besides, all things being equal, why not go for the smaller code size?
Because all things are generally not equal. Worrying about this stuff makes sense if your code is already feature-complete, bug-free, and uses the absolute state-of-the-art in algorithms, but who has such code that they can worry about these things?
Generally, the time you spent adding useless annotations to your source code would be better-spent with a pencil and paper trying to figure out a way to improve your algorithm. Compilers, generally, are good enough these days. Especially now that GCC is decent and runs on most of the interesting processors. The gains in performance, and this is something that even the Linux kernel guys have realized, are going to come from good algorithms. This is especially true because of the recent multi-core phenomenon. More and more, "good code" is going to be code that implements good scalable algorithms. Lower complexity beats smaller constant factors any day of the week.
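A small sketch of the complexity point, using membership in a sorted array as the example (the function names are illustrative; `std::find` and `std::binary_search` are the standard library algorithms):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// O(N): looks at elements one by one. No amount of instruction-level
// tuning changes how this scales.
bool contains_linear(const std::vector<int>& sorted, int key) {
    return std::find(sorted.begin(), sorted.end(), key) != sorted.end();
}

// O(log N): halves the search range each step, but requires sorted input.
bool contains_binary(const std::vector<int>& sorted, int key) {
    // Debug-only precondition check; it is itself O(N), so build with
    // NDEBUG defined when measuring.
    assert(std::is_sorted(sorted.begin(), sorted.end()));
    return std::binary_search(sorted.begin(), sorted.end(), key);
}
```

On a million elements the linear version may inspect up to a million of them, while the binary version needs about twenty comparisons, a gap that constant-factor tuning of the linear loop is unlikely to close.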
It's also interesting to point out that more recent processors are designed to not have any particular features that could be supported. Even x86 processors generally only fully-support a RISC-y subset of the ISA, and microcode the weird, complex instructions.
1) Premature optimization is evil. Everybody says this, but so many people do not take it to heart. I'd rather have software that works than software that is fast but crashes. As a programmer, it's nice to work on non-buggy software, even if it's not as fast as it could be.
2) Target-specific optimization is generally evil, unless you're sure your code will not live very long (e.g., a game). The thing is that micro-optimizations generally tune for a particular processor, and actually pessimize the code in the long run. In comparison, if you write good general code, it'll still be fast ten years from now when processors look very different.
3) The bottlenecks that people, especially C/C++ programmers worry about, are usually not the bottlenecks that usually matter. If you worry that your code could be faster/more memory efficient if you use a 16-bit field here or there instead of a 32-bit one, your algorithms better be absolutely perfect. Most code does not use perfect algorithms. That's why so much software is still so slow. Most programmers just don't get the time to use the best algorithms, much less get down to the level of micro-optimizations.
That's why I always find language performance debates entertaining. C/C++ programmers will freak out if you tell them language X is very productive, but is maybe two-thirds as fast as C (something that is true of a number of high-level, but compiled, languages). Meanwhile, they will write code that runs at maybe 1/3 of what the machine is capable of, because they spend so much time writing the code they have little time to optimize it.
On par with what? I'm a Lisp guy/compiler enthusiast. I like processors with out-of-order execution that don't care about code scheduling, have excellent branch prediction, have low memory latency, etc. Basically, my ideal processor is an Opteron. It's all about perspective, hence my criticism of your use of the word "elegant".
My Athlon X2 has 2 64-bit internally-RISC processors, 3 quiet fans that don't need software control because they only spin at 1000 RPM to begin with, and a case that looks like a mini-fridge but has excellent acoustic properties. In comparison, the G5 PowerMac next to it has a slightly-slower dual core processor (even though it's clocked 100MHz higher than the X2), 7 fans with annoying bearing noise, a high-latency memory bus (measured at twice the X2's latency), and an all-aluminum case with lousy acoustic properties (exhibits the characteristic tendency of aluminum to turn vibrations into ringing). The Ferrari-level aerodynamics can't keep the processor below 60C loaded (the X2 runs at 50C loaded), and the whole thing is maybe twice as loud as the X2 (and at an annoyingly higher frequency).
Singularity isn't really a microkernel. It's a "no kernel", in that there is no separation between kernel space and user space. The protection offered by such a design is much more fine-grained than with a microkernel (individual threads are protected from each other), and doesn't require the same performance hit (no expensive context switches between protection domains).
Obviously you're not a security guy or a compiler guy. The estimates show that 50% of security bugs are the result of some sort of memory-related hole. Also, the overhead of array-range checking, with a compiler (native-code, not JIT) that does range-propagation and bounds check elimination, is about 3-5% on a modern superscalar processor.
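For readers who haven't seen bounds check elimination, here is a hand-written C++ sketch of the transformation. It illustrates the idea only; it is not the output of any particular compiler, and the function names are hypothetical. When the compiler can prove every index stays in range, the per-access check collapses into a single check before the loop:

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Naive form: every access is range-checked (at() throws
// std::out_of_range on a bad index).
int sum_checked(const std::vector<int>& v, std::size_t n) {
    int total = 0;
    for (std::size_t i = 0; i < n; ++i) total += v.at(i);
    return total;
}

// What range propagation effectively produces: i < n and n <= v.size()
// together imply i < v.size(), so one check up front covers the whole
// loop and the body can use unchecked indexing.
int sum_hoisted(const std::vector<int>& v, std::size_t n) {
    if (n > v.size()) throw std::out_of_range("n exceeds vector size");
    int total = 0;
    for (std::size_t i = 0; i < n; ++i) total += v[i];
    return total;
}
```

Both versions reject an out-of-range `n`; the second simply pays for the check once instead of on every iteration.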
The fact that the virtual machine is a bit slower isn't the point. The point is that because the virtual machine ensures memory protection, Singularity doesn't need to use hardware memory protection for the kernel. Doing a single system call costs hundreds of clock cycles on a modern CPU, because of the userspace/kernelspace switch. It also necessitates all sorts of complex (and slow) IPC mechanisms that go through the kernel (and invoke the aforementioned switch), all because we're still programming in an antiquated 1970s-era language that lets programs randomly write into memory.
Modern CPUs could be quite a bit faster if they didn't have to support C. Take a look sometime at all the die space an Athlon64 uses for stuff like the TLB, etc. Also look how it needs to increase L1 cache latency by 50% (from 2 cycles to 3), just to support the TLB lookup. All of this stuff would be unnecessary if C programs couldn't overwrite whatever memory they wanted.
What happens when you have a 32*32 element array? That's the problem with optimizing to a given data size, you're rarely in the situation (at least in modern programs), where you can count on your data being a given size. It's a far better idea to make your program cache friendly. Depending on your algorithm, perhaps it already is. If you're just reading data linearly, then you probably won't see a significant performance loss going from a 512 element cache to a 256 element cache, even with 1024 elements of data. If you do random access, it'd be worth it to get your program to operate on the data in tiles (say, in 32-element pieces), so your working set will still usually fit in cache.
The thing is, that's not true. In isolation, it might be, but the overhead of passing a data structure that is a few words of memory by value becomes significant if your function is called from inside a loop, and potentially horrible if you do it routinely throughout your program.
If its in a critical area, then the profiler will point it out. If its not in a critical area, then it doesn't matter. Plus, do you have any idea what the overhead really is? It's tiny. I just tried a benchmark calling a very simple function (consisting of two floating-point additions) in a loop. One of the parameters of the additions was passed via a struct that took up 32-bytes (8 words). The difference between passing it by value and passing it by reference was less than 10%. On a more complicated function, it would be noise.
The kicker is, profiling won't help you here. Lazily using pass-by-value where pass-by-const-reference would suffice will not cause a huge hit in one function, it will cause an x% hit in all the affected functions, and a profiler won't help you with that.
The profiler will tell you what functions use 80% of your runtime. The programmer is smart enough to check those functions and see if the parameter passing convention could possibly be an issue. The other functions don't matter. Even if its a constant 20% hit in all the other functions, we're talking about a 4% overall performance decrease, which is insignificant.
But the sort of issue I mentioned above isn't a fatal mistake in one place. It's more like death by a thousand cuts. And again, no post-processing with a profiler and hand-tuning will fix a system that is inherently slow because of such lazy coding practices.
Profiling fixes the code in which performance matters. The rest of the code can be slow, because it doesn't matter!
the code from something specialised like Intel's compiler is usually far better at low-level optimisations.
Numerous benchmarks show that Intel's compiler is rarely more than 20% faster than GCC, unless it can take advantage of auto-vectorization. The assembly output doesn't mean much. As far as the CPU is concerned, its just bytecode --- it gets translated and reschedule before getting executed anyway.
You realize that up until recently, it was illegal to sell private health insurance in Canada? (The Canadian high court overturned the law making it illegal just a few months ago.)
Are we talking about Canada or the United States?
You realize that the Clinton health-care plan, which thankfully never got past the talking stages, was going to be based on a single-payer plan system modeled after the Canadian system?
The Canadian plan also results in lower healthcare costs and better healthcare, so modeling ours on theirs is probably a good idea. That doesn't mean that we have to outlaw private health insurance along with it! You understand what "modeling" means, do you not?
You realize that one of the most prominent left-leaning advocacy groups is called "Socialist Alternative" and that it calls for the seizure of the 500 biggest American companies and the replacement of their owners and management with committees of citizens?
Prominent by whose definition? You do understand that when one says "left" in the United States, we're talking about leftist relative to the American mainstream, not what would be considered leftist internationally? In the US, the proper term for groups like "Socialist Alternative" would be "socialist". We aren't talking about socialists here.
You realize that this prominent group calls for making the taking of "excessive profit" a crime, and that their leaders have been running editorials in the nation's opinion pages for years now?
Again, prominent by whose definition?
Name one. Seriously. Name one. I've given you concrete examples. Name one.
The Human Life Amendment. And don't give me any crap about it being intended to merely allow states to criminalize abortion, not force them to. Criminalization of abortion is precisely what they want. See also Operation Rescue/Operation Save America.
Do you know the difference between overturning Roe and criminalizing abortion?
There is a reasonable argument that Roe v. Wade is a bad precedent not on the basis of the abortion debate, but on the merits on which it was argued. With some caveats, I actually buy this argument. That doesn't change the fact that the vast majority of people working towards overturning Roe v. Wade are not working in favor of reason in law, but because they want to see the criminalization of abortion. There is a war in this country between those who support abortion and those who want to see it become illegal. Roe v. Wade is an icon in that war. Those who oppose Roe v. Wade on principle, but not abortion per se, simply find themselves on the wrong side of a debate that only has two sides.
You realize that there is no prominent movement on the left to do any of the things you stated. There are very prominent movements on the right to do the things the parent stated. What legitimate leftist organization wants to criminalize going to church? How many legitimate rightist organizations want to criminalize abortion?
Consider, for example, passing a large bit of data as a parameter to a function. In languages that use pass-by-reference semantics, this will typically be cheap. In languages that use pass-by-value semantics, this will typically be expensive. In C++, you have a choice, but the natural (that is, default) choice is by value. Would you tell a C++ programmer not to use const-reference parameter types from the start, because it's a premature optimization?
I would tell a C++ programmer that worrying about a bit of extra data copy in the function call is generally useless. It's really not that expensive unless your structs are monstrously large. Generally the question you're interested in is semantics. Do the semantics lend themselves to a pass-by-value, or a pass-by-reference? If, after profiling, you find that this is a problem, use the passing style on those few functions that the profiler points out. Doing it for everything else is useless.
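To make the two passing styles concrete, here is a minimal sketch. The function names and the use of std::vector<double> as the "large bit of data" are assumptions for illustration, not anything from the parent post. The first two functions compute the same thing and differ only in whether the argument is copied; the third is a case where the semantics really do call for a copy.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Pass-by-value: every call copies the whole vector
// (one allocation plus an element-by-element copy).
double mean_by_value(std::vector<double> samples) {
    assert(!samples.empty());
    return std::accumulate(samples.begin(), samples.end(), 0.0) / samples.size();
}

// Pass-by-const-reference: same result, but only a reference is passed.
double mean_by_const_ref(const std::vector<double>& samples) {
    assert(!samples.empty());
    return std::accumulate(samples.begin(), samples.end(), 0.0) / samples.size();
}

// Here the semantics lend themselves to a copy: the function reorders its
// own scratch version, so the caller's data stays untouched.
// For an even number of elements this returns the upper middle element.
double median(std::vector<double> samples) {
    assert(!samples.empty());
    auto mid = samples.begin() + samples.size() / 2;
    std::nth_element(samples.begin(), mid, samples.end());
    return *mid;
}
```

Whether the copy in mean_by_value actually matters depends on how big the data is and how often the function is called, which is exactly what the profiler is for.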
In some types of software, you simply have to plan for performance from the start.
Yes, but planning for performance from the start doesn't mean optimizing from the start. It means designing good algorithms and implementing them without any grossly stupid performance mistakes. Optimization can happen after implementation, where profiling shows the need for more hand-tuning.
Obviously algorithmic improvements make more difference than anything else, but even so, there's a scale between large-scale algorithm and data structure changes and assembly-level micro-optimisation, not a switch.
It's a scale, but one very biased towards high-level optimizations. Compilers do an excellent job of the low-level stuff. Even at the data structure level, you get a lot more benefit from considering things like ordering your access patterns for cache-friendliness than you do from saving a byte or two here or there.
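As a rough illustration of what ordering access patterns for cache-friendliness means, here is a sketch that sums the same row-major matrix two ways. The function names and the flat-vector layout are assumptions made for the example. Both functions return the same total; only the order in which memory is touched differs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// The matrix is stored row-major in one contiguous vector:
// element (r, c) lives at index r * cols + c.

// Cache-friendly: the inner loop walks consecutive addresses.
double sum_row_order(const std::vector<double>& m, std::size_t rows, std::size_t cols) {
    assert(m.size() == rows * cols);
    double total = 0.0;
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            total += m[r * cols + c];
    return total;
}

// Cache-hostile: the inner loop jumps `cols` elements on every step, so for
// a large matrix nearly every access touches a different cache line.
double sum_column_order(const std::vector<double>& m, std::size_t rows, std::size_t cols) {
    assert(m.size() == rows * cols);
    double total = 0.0;
    for (std::size_t c = 0; c < cols; ++c)
        for (std::size_t r = 0; r < rows; ++r)
            total += m[r * cols + c];
    return total;
}
```

Neither version saves a single byte of storage. How far apart they land in practice depends on the matrix size and the machine, so measure before drawing conclusions.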
I don't see any reason why the CPU can't see the register as both 1 32-bit register and 2 16-bit registers. After all, MMX reused the floating point registers.
There are a number of problems with partial registers. First, at the CPU level, the trouble comes when trying to figure out instruction dependencies. Supporting half registers makes things a lot more complicated when you see that instruction 1 writes to EAX, while instruction 2 reads from AX. Second, it makes register allocation a lot more complicated.
The problem with writing portable code as things now stand is that it is oblivious to fitting things into cache, as it must remain cache-size independent.
Being cache-size independent doesn't mean being oblivious about fitting things into cache. The Right Thing (TM) to do is to make your memory access patterns predictable for the cache. That means that no matter how big your data set gets, or how small a cache you run on, you won't suffer catastrophic performance decreases.
Since current tools are built with that sort of attitude about portable code, the designers refuse to implement features to allow the coder to code to cache sizes.
You don't want to code to the cache size. That's not what caches are for. All you'll end up doing is screwing your performance when your data gets twice as large, or you want to run on a CPU with less cache. Again, what you want to do is design your algorithms to perform cache-friendly memory accesses. Treat the cache as a cache, not a local memory.
Annotations are not ways to improve your algorithm. I'm talking about improving your algorithm at the theoretical level, to say run with O(log(N)) complexity instead of O(N), or scale as O(N) with number of processors instead of O(sqrt(N)). The annotations you suggested are nothing more than mucking with the compiler's business.
First, using 16-bit components of registers incurs a stall on most modern x86 CPUs. Remember, they are RISC processors underneath, which have no conception of partial GPRs. Second, RAM is dirt cheap, so let's not even consider blowing RAM. Things get interesting when talking about fitting things in cache, but the simple truth is that if your data doesn't fit in cache, the benefit from just halving its size is usually minimal. As soon as your data set grows, you've blown the cache again. You're almost always better-served trying to figure out how to get your code to operate on data in cache-sized chunks, so your performance stays constantly good, instead of being great with one data set and piss-poor with another.
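One common way to operate on data in chunks is loop tiling. The sketch below transposes a square row-major matrix one tile at a time, so each tile of the input and output is reused while it is still in cache. The function name and the default `block` of 32 are assumptions for illustration; the tile size is a tunable parameter, and the point is that performance stays steady as n grows, not that 32 is the right number for any particular CPU.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Transpose an n x n row-major matrix tile by tile.
// `block` (hypothetical default: 32) is the tile edge length in elements.
std::vector<double> transpose_blocked(const std::vector<double>& in,
                                      std::size_t n,
                                      std::size_t block = 32) {
    assert(in.size() == n * n);
    assert(block > 0);
    std::vector<double> out(n * n);
    for (std::size_t rb = 0; rb < n; rb += block) {
        for (std::size_t cb = 0; cb < n; cb += block) {
            // Clamp the tile so it does not run past the matrix edge.
            const std::size_t r_end = std::min(rb + block, n);
            const std::size_t c_end = std::min(cb + block, n);
            for (std::size_t r = rb; r < r_end; ++r)
                for (std::size_t c = cb; c < c_end; ++c)
                    out[c * n + r] = in[r * n + c];
        }
    }
    return out;
}
```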
Here's a better one: The instruction is smaller so you can fit more instructions in RAM which means less flushes to disk.
If your instructions don't fit in RAM completely, then you're screwed. Buy more RAM.
Attacking problems from a "every byte counts" perspective can help you decide what you want to do when every byte doesn't count.
I don't see how.
Besides, all things being equal, why not go for the smaller code size?
Because all things are generally not equal. Worrying about this stuff makes sense if your code is already feature-complete, bug-free, and uses the absolute state-of-the-art in algorithms, but who has such code that they can worry about these things?
Generally, the time you spent adding useless annotations to your source code would be better-spent with a pencil and paper trying to figure out a way to improve your algorithm. Compilers, generally, are good enough these days. Especially now that GCC is decent and runs on most of the interesting processors. The gains in performance, and this is something that even the Linux kernel guys have realized, are going to come from good algorithms. This is especially true because of the recent multi-core phenomenon. More and more, "good code" is going to be code that implements good scalable algorithms. Lower complexity beats smaller constant factors any day of the week.
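A minimal sketch of the complexity point, assuming the data is a sorted vector of ints (the function names are made up for the example). The linear scan has a tiny constant factor but does O(N) comparisons; the binary search does O(log N). No amount of shaving the inner loop of the first one will close that gap once N gets large.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// O(N): checks elements one by one until it finds the key or runs out.
bool contains_linear(const std::vector<int>& sorted, int key) {
    return std::find(sorted.begin(), sorted.end(), key) != sorted.end();
}

// O(log N): halves the search range on every step. Requires sorted input.
bool contains_binary(const std::vector<int>& sorted, int key) {
    // Debug-only precondition check. It is itself O(N), so it is meant to be
    // compiled out (NDEBUG) in optimized builds.
    assert(std::is_sorted(sorted.begin(), sorted.end()));
    return std::binary_search(sorted.begin(), sorted.end(), key);
}
```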
It's also interesting to point out that more recent processors are designed to not have any particular features that could be supported. Even x86 processors generally only fully-support a RISC-y subset of the ISA, and microcode the weird, complex instructions.
A couple of points about optimization.
1) Premature optimization is evil. Everybody says this, but so many people do not take it to heart. I'd rather have software that works than software that is fast but crashes. As a programmer, it's nice to work on non-buggy software, even if it's not as fast as it could be.
2) Target-specific optimization is generally evil, unless you're sure your code will not live very long (e.g. a game). The thing is that micro-optimizations generally tune for a particular processor, and actually pessimize the code in the long run. In comparison, if you write good general code, it'll still be fast ten years from now when processors look very different.
3) The bottlenecks that people, especially C/C++ programmers, worry about are usually not the bottlenecks that matter. If you worry that your code could be faster/more memory efficient if you use a 16-bit field here or there instead of a 32-bit one, your algorithms had better be absolutely perfect. Most code does not use perfect algorithms. That's why so much software is still so slow. Most programmers just don't get the time to use the best algorithms, much less get down to the level of micro-optimizations.
That's why I always find language performance debates entertaining. C/C++ programmers will freak out if you tell them language X is very productive, but is maybe two-thirds as fast as C (something that is true of a number of high-level, but compiled, languages). Meanwhile, they will write code that runs at maybe 1/3 of what the machine is capable of, because they spend so much time writing the code they have little time to optimize it.
Intel processors can perform unaligned memory accesses. They just incur an enormous performance hit in doing so.
On par with what? I'm a Lisp guy/compiler enthusiast. I like processors with out-of-order execution that don't care about code scheduling, have excellent branch prediction, have low memory latency, etc. Basically, my ideal processor is an Opteron. It's all about perspective, hence my criticism of your use of the word "elegant".
My Athlon X2 has 2 64-bit internally-RISC processors, 3 quiet fans that don't need software control because they only spin at 1000 RPM to begin with, and a case that looks like a mini-fridge but has excellent acoustic properties. In comparison, the G5 PowerMac next to it has a slightly-slower dual core processor (even though it's clocked 100MHz higher than the X2), 7 fans with annoying bearing noise, a high-latency memory bus (measured at twice the X2's latency), and an all-aluminum case with lousy acoustic properties (exhibits the characteristic tendency of aluminum to turn vibrations into ringing). The Ferrari-level aerodynamics can't keep the processor below 60C loaded (the X2 runs at 50C loaded), and the whole thing is maybe twice as loud as the X2 (and at an annoyingly higher frequency).
Singularity isn't really a microkernel. It's a "no kernel", in that there is no separation between kernel space and user space. The protection offered by such a design is much more fine-grained than with a microkernel (individual threads are protected from each other), and doesn't require the same performance hit (no expensive context switches between protection domains).
To be more accurate, that should read "except for Singularity, which doesn't need this protection".
Obviously you're not a security guy or a compiler guy. The estimates show that 50% of security bugs are the result of some sort of memory-related hole. Also, array-range checking, with a compiler (native-code, not JIT) that does range-propagation and bounds check elimination, costs about 3-5% on a modern superscalar processor.
The fact that the virtual machine is a bit slower isn't the point. The point is that because the virtual machine ensures memory protection, Singularity doesn't need to use hardware memory protection for the kernel. Doing a single system call costs hundreds of clock cycles on a modern CPU, because of the userspace/kernelspace switch. It also necessitates all sorts of complex (and slow) IPC mechanisms that go through the kernel (and invoke the aforementioned switch), all because we're still programming in an antiquated 1970s-era language that lets programs randomly write into memory.
Modern CPUs could be quite a bit faster if they didn't have to support C. Take a look sometime at all the die space an Athlon64 uses for stuff like TLB, etc. Also look how it needs to increase L1 cache latency by 50% (from 2 cycles to 3), just to support the TLB lookup. All of this stuff would be unnecessary if C programs couldn't overwrite whatever memory they wanted.
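For anyone wondering what bounds check elimination looks like, here is a hand-written sketch of the transformation; a real compiler does this internally, and the function names here are made up for the example. The first function checks every access. The second is what range propagation lets the compiler prove: since i < count inside the loop, one check that count <= v.size() before the loop makes every per-iteration check redundant.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Naive form: std::vector::at range-checks every single access and throws
// std::out_of_range on a bad index.
long sum_prefix_checked(const std::vector<int>& v, std::size_t count) {
    long total = 0;
    for (std::size_t i = 0; i < count; ++i)
        total += v.at(i);
    return total;
}

// After hoisting: one check up front covers the whole loop, because every
// index used below satisfies i < count <= v.size().
long sum_prefix_hoisted(const std::vector<int>& v, std::size_t count) {
    if (count > v.size())
        throw std::out_of_range("count exceeds vector size");
    long total = 0;
    for (std::size_t i = 0; i < count; ++i)
        total += v[i];  // unchecked access, safe by the check above
    return total;
}
```

Both versions either return the same sum or throw, so the safety guarantee is unchanged; only the number of checks executed differs.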
IIRC, the SPEs have 18 stage pipelines. Okay for an FP pipe, but quite long for an INT pipe.