Domain: agner.org
Stories and comments across the archive that link to agner.org.
Comments · 55
-
Re:Intel Compiler
You need to do more reading friend. https://www.agner.org/optimize...
-
Re:Don't forget guys
This is the same Intel that put "cripple AMD" CPU code generation in their compiler.
Here is CPU optimization expert Agner Fog's blog on the subject: Intel's "cripple AMD" function
-
This one is CPU specific
Sigh, did anyone actually read the Spectre paper? See the section "Exploiting Indirect Branches".
The bit about execution beyond software checks is explaining a specific detail about memory side effects. The above section builds on that concept to show that you can induce these memory side effects by tricking the branch predictor to execute existing code in an unexpected way.
Okay, to go into more detail:
There are two things that are called Spectre, both based on speculative execution.
The first one gets around software checks. Every single deeply pipelined, out-of-order CPU that does speculative execution is susceptible to it (lots of vendors, some as far back as the mid-90s), and it is basically still "speculative execution working as intended". It is the one I've described in my post.
That's the one to which every piece of software running on nearly any CPU is susceptible (except perhaps older Intel Atoms, Intel Xeon Phis, and older 32-bit ARM, as those don't do speculative execution, because as a matter of fact their pipelines are way too short), but which in practice isn't very concerning, because it basically targets software which has "please exploit me!" design flaws written all over it.
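As a rough illustration of the "gets around software checks" case, here is a minimal sketch of the kind of code pattern usually discussed for it. The array names, sizes, and the stride are assumptions made for the example, not taken from any real program.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical victim data; names and sizes are assumptions for illustration.
constexpr std::size_t kStride = 512;                // spreads byte values across cache lines
std::array<std::uint8_t, 16> table1{};              // the array the bounds check protects
std::array<std::uint8_t, 256 * kStride> table2{};   // probe array indexed by a byte value

// Architecturally this never reads table1 out of bounds. A speculating CPU may
// still run the body with an out-of-range x before the comparison resolves,
// and the dependent load from table2 can leave a trace in the cache.
std::uint8_t victim(std::size_t x) {
    if (x < table1.size()) {
        return table2[table1[x] * kStride];
    }
    return 0;
}
```

The function is correct as far as the language is concerned; the concern is only about what the hardware may do before the check retires.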
The second thing called "Spectre" also uses speculative execution, but it is extremely specific and only targets specific CPUs:
Only specific Intel CPUs are concerned, and only in extremely specific circumstances. AMD CPUs are not affected. And that's expected, because each CPU uses an entirely different strategy to predict branches.
Just like with Meltdown, it again boils down to Intel CPUs trying to be way too clever, trading security to shave a few performance points.
It boils down to an address (here a jump target) not even being known at the time when instructions start to pour into the pipeline. Some CPUs may try to guesstimate where execution would go next.
The way some specific Intel CPUs store their estimations means there's a risk of aliasing/confusion: the CPU has learned that instruction A usually jumps to point B, but when the CPU sees instruction C it gets things mixed up and thinks there's a high chance it will also jump to point B, and starts speculatively executing there, even if that ends up not being the case and C actually jumps to a different point D.
By knowing the specific make of affected Intel CPU, and by knowing the exact way in which this aliasing and confusion happens in that specific Intel CPU, and by allocating a shit ton of memory (so you end up with an address that can actually be confused/aliased with your target), and by knowing in advance the foreign address you're targeting (because, you know, ASLR gets in the way), and by spending around half an hour doing stuff (according to the Google demo) to get the exact setup you need so that the specific Intel CPU confuses things exactly the way you need, you can have the CPU guess wrong and jump speculatively to the address you've asked it to jump to (until it notices it's wrong, throws nearly everything out, except the already prefetched cache, and jumps back to the actual correct execution).
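To make the aliasing idea concrete, here is a toy model of a branch target buffer. It is only a sketch: the table size, the modulo indexing, and the missing tag are assumptions chosen to show the confusion, and real predictors are far more elaborate.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Toy branch target buffer: kEntries slots, indexed by the low bits of the
// branch address, with no tag to tell two branches in the same slot apart.
class ToyBtb {
public:
    static constexpr std::size_t kEntries = 256;

    // Record that the branch at `branch_addr` jumped to `target`.
    void train(std::uint64_t branch_addr, std::uint64_t target) {
        slots_[index(branch_addr)] = target;
    }

    // Guess where the branch at `branch_addr` will go (0 means no guess).
    std::uint64_t predict(std::uint64_t branch_addr) const {
        return slots_[index(branch_addr)];
    }

private:
    static std::size_t index(std::uint64_t addr) {
        return static_cast<std::size_t>(addr % kEntries);
    }
    std::array<std::uint64_t, kEntries> slots_{};
};
```

Training the model with "A jumps to B" and then asking about a different branch C whose address lands in the same slot returns B, which is the mix-up described above.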
This is not something that affects every speculatively executing CPU in existence; this is not a CPU still working as it should (unlike the other exploit).
This is a specific CPU (which happens to be by Intel) that takes wild guesses, way too many wild guesses, and if you know exactly how this CPU takes its too-wild guesses, you can abuse that to make it guess wrong. No other CPU will be affected.
Given the complexity of the task, this is not something that you're going to see a lot in the wild and automated (not in drive-by JavaScript attacks). This is something that is going to be specially crafted by hand, for some very specific attacks (an attacker wants to break the specific hypervisor in which it's currently running).
-
Re:are AMD and intel cpu interchangable
> I regretted that because I found that at that time in history while some code did run equally well on these that in general the software libraries for AMD just weren't tuned as well for these chips. Many optimizations not taken.
Part of that was due to Intel's shenanigans.
Intel's "cripple AMD" function in their compiler
Unfortunately, software compiled with the Intel compiler or the Intel function libraries has inferior performance on AMD and VIA processors. The reason is that the compiler or library can make multiple versions of a piece of code, each optimized for a certain processor and instruction set, for example SSE2, SSE3, etc. The system includes a function that detects which type of CPU it is running on and chooses the optimal code path for that CPU. This is called a CPU dispatcher. However, the Intel CPU dispatcher does not only check which instruction set is supported by the CPU, it also checks the vendor ID string. If the vendor string says "GenuineIntel" then it uses the optimal code path. If the CPU is not from Intel then, in most cases, it will run the slowest possible version of the code, even if the CPU is fully compatible with a better version.
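The difference between the two dispatching strategies described above can be sketched in a few lines. This is a simplified model, not Intel's actual code: the `CpuInfo` struct, the `CodePath` enum, and both function names are hypothetical.

```cpp
#include <string>

// Hypothetical model of a CPU as seen by a dispatcher; field names are assumptions.
struct CpuInfo {
    std::string vendor;  // e.g. "GenuineIntel", "AuthenticAMD"
    bool sse2;
    bool sse3;
};

enum class CodePath { Generic, Sse2, Sse3 };

// Picks the best path the CPU reports support for, regardless of who made it.
CodePath dispatch_by_features(const CpuInfo& cpu) {
    if (cpu.sse3) return CodePath::Sse3;
    if (cpu.sse2) return CodePath::Sse2;
    return CodePath::Generic;
}

// Models the behaviour described above: non-Intel CPUs get the baseline path
// even when their feature flags say they could run something better.
CodePath dispatch_by_vendor(const CpuInfo& cpu) {
    if (cpu.vendor != "GenuineIntel") return CodePath::Generic;
    return dispatch_by_features(cpu);
}
```

With a `CpuInfo{"AuthenticAMD", true, true}`, the feature-based dispatcher selects `CodePath::Sse3` while the vendor-gated one falls back to `CodePath::Generic`.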
Nobody's forcing you to use the Intel compiler though. Use the other well established standard compilers.
-
Re: Intel
No, what makes no sense is you, and your ignorance. The entire premise that ICC would cheat to make AMD look better, shows that you don't have any clue whatsoever what "ICC" is and that you have forgotten that sabotaging your competitors is "cheating" too. Which makes this discussion pointless, since if you don't know that, well... It kind of makes your opinions on "benchmarking" worthless.
FTR, here is a good starting point just in case you actually care to find out. He does a pretty good job of explaining it, and you can read more about it in other places in case you don't trust "some blog".
-
Re:Not like VAG
For all we know, it could be cheating -- after all, Intel has been known to do so in the past. I'm not saying it is, since I don't have the information to know. But neither do you, so you can't definitively say it isn't.
-
Re:Function multi-versioning.
nutty conspiracy rant
Except that Intel had been caught sabotaging AMD performance in the past.
-
Re:Sadly, I don't see an "out" for AMD
That is because Phoronix tests are compiled with GCC, while the benchmarks used on the gaming sites (which just FYI Cinebench just got caught cheating, 30% bonus hit for any AMD chip over Intel thanks to the flags they used on the ICC) are using the Intel Crippled Compiler, which has been designed from the very first release to DISABLE any and all SSE functions on any non pushed Intel chip. They have been doing this since 2005, have admitted they are doing it, and still no sanctions by DoJ. Again the DoJ proves they are absolutely worthless and might as well not exist, as they have been bought off since the MSFT trial ended in 03.
BTW before any of the fanboys chime in with their "herpa derp, Intel knows how to compile better for their own chips, derpa de do" I have 2 words for you.....Pentium III. When the ICC was released every benchmark was showing the P3 curbstomp the P4 by as much as 35%. ICC gets released, Intel throws money at benchmark sites to use ICC and...wadda ya know, P3 is suddenly losing to the exact same chips they beat a year before, isn't that amazing? You can also go buy yourself ANY Via CPU, change the CPUID from "CentaurHauls" to "GenuineIntel" and.......gasp! Suddenly the exact same chip scores nearly 40% higher than the previous run, all thanks to the magical CPUID!
The fact that GCC magically puts out code that paints a VERY different picture ought to give you a clue guys. It's hilarious that you scream when companies like Comcast manipulate your Internet to push you to use their services, yet here is a company that has gone on the record stating flat out they are manipulating the market, which not only kills competition but makes YOU pay higher chip prices (as if the benches weren't rigged, Intel's actual performance numbers wouldn't be high enough to justify the cost and they would have to lower prices, a win for the consumer), and what do you do? Defend the corp raping your wallet.
Rigged markets are bad for everybody BUT the corp doing the rigging. It's bad for competition, bad for consumers, and bad for the market as a whole. I don't care which chip you like, the fact that a corp is getting away with such blatant rigging ought to piss you off!
-
Re:Will AMD APUs ever support ECC RAM?
The socket AM3+ does support ECC (if you choose the right motherboard, ASUS usually do...)
Yeah, I have standardized on Asus for all my builds, and the ECC support is one of the reasons.
If you want ECC for cheap you could buy a lower-end socket AM3+ processor like the FX4350
My most recent build was an FX8xxx part. FX8350 I think.
otherwise Xeon is clearly the better choice.
I have made the choice to not give Intel any of my money if I can help it. I don't like the unethical games Intel plays (example).
Processors are so fast these days anyway, that the difference between the best AMD and the best Intel are not that big a deal for my purposes. And while AMD loses on absolute performance, they generally win on performance-per-money-spent.
-
The AMD-deoptimizing Intel compiler?
Ah yes, the Intel compiler. Wasn't that also known as the compiler that "crippled" performance for many AMD systems, by ignoring capabilities flags and instead looking for a "GenuineIntel" processor...
Yeah, that sounds like a great alternative to GCC.
See also many other links. I'll stick with GCC, thanks. At least the GCC team doesn't have a vested interest in f***ing over other hardware vendors.
-
Re:Oe noes! A compiler bug!
You're pretty clueless. Intel would beg to differ. No one that matters compiles high performance code on GCC, they use the Intel compiler.
Unless they want to not use Intel processors, because the Intel compiler was known to cripple the performance on non-Intel processors. I'm also wondering now who is buying the PathScale compiler?
-
Re:Counterbet
I bet Google DOES use some moderate amount of assembly. I once worked for an audio-recognition company and we did indeed use about 100 lines of x64 assembly to perform the inner loop, which was some complex audio signal processing routine. Similar to an FFT.
This was easily 10x faster than the C version, which we had for reference purposes, even when using the Intel compiler with all optimizations turned on.
So, just because you never saw a Tapir in your life, does not mean they can't exist because their dick is longer than you can imagine.
Maybe you shouldn't have been using an AMD processor:
(Intel has been slammed for their compiler creating code that directs non-Intel CPUs to completely unoptimized code, not taking advantage of SSE, etc, even when present in the non-Intel processor)
http://www.agner.org/optimize/...
Section 2.3 of this:
http://download.intel.com/pres...
-
Optimizing C++ from agner.org
Optimizing C++ was an eye-opener for me because it isn't just about how to optimize C++ but more about how things actually work (in the real world).
-
Re:Best choice for 4 out of 5 desktop users
Sorry but I call bullshit as the ONLY way you can compare a dual to a quad is if frankly you aren't even stressing the dual. If all you are doing is web surfing or watching videos? Then sure but by that argument a C2D will serve you just as well. If on the other hand you have more than 2 tabs on Chrome or are using any other SMP supporting software you WILL notice a difference between a dual core and a quad, I don't care who makes what.
And before you trot out the usual benchmarks, it might do well to remember that thanks to most if not all of them using ICC they are as rigged as quack.exe, and to this very day any code compiled with ICC will be crippled and there is no way to stop it; all Intel does in later releases is tell you it's rigged, that is all. Why Intel didn't get an antitrust suit for this I don't know, other than that the DOJ is toothless, because this is NO different than "Windows isn't done until Lotus won't run", as in both cases you are dealing with a market leader using dirty tactics to rig the market against competition. Go down that page and see what happened when they changed the CPUID of a Via chip (the only chip on which you can softmod the CPUID) from "CentaurHauls" to "GenuineIntel", because when they did that? Tada, the "Intel Via" suddenly scored 30% higher on the benchmarks with the ONLY change being the CPUID.
Try running your own tests using programs compiled with GCC and I think you'll find there is MAYBE 20% - 30% difference on the high end and much lower once you get to the i5 and below. Personally after finding out about the compiler rigging, the bribing of the OEMs and the killing of the Nvidia chipsets I stopped carrying Intel and my customers couldn't be happier with the performance. I urge all of those that believe in a free market to not support market rigging and stay away from Intel. Chips like these only make it that much easier IMHO.
-
Re:does the Intel one still slow down on AMD syste
My understanding is that they never explicitly 'slowed down' AMD systems
You are wrong:
"Overview of CPU dispatching in Intel software"
http://www.agner.org/optimize/blog/read.php?i=49#121
-
Re:Also, it is fast
Does the Intel FORTRAN compiler deliberately emit suboptimal code for non-Intel chips, like the Intel C Compiler does? If so, and if you have anyone using AMD chips, you can pick up more speed by patching the binary to never use the substandard code paths.
-
Re:Why?
Intel I am kind of liking at the moment
Intel has shown that they want to charge as much money as they can. They tried to lock the industry into the Itanium chip, just because it was patented IP and nobody but Intel could make it.
Worst of all, Intel has made its compiler emit code that sabotages non-Intel CPU chips. Instead of querying the features (like whether a chip supports SSE3 or not) the code emitted by the Intel C compiler checks the "CPUID" and takes horribly slow code paths for any CPUID other than "GenuineIntel".
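For what "querying the features" looks like in practice, here is a small sketch that decodes a few feature flags from the register values CPUID leaf 1 returns. The `Features` struct and `decode_leaf1` name are made up for the example; the bit positions are the documented ones (SSE2 in EDX bit 26, SSE3 in ECX bit 0, SSE4.1 and SSE4.2 in ECX bits 19 and 20).

```cpp
#include <cstdint>

// Hypothetical holder for a handful of feature flags.
struct Features {
    bool sse2;
    bool sse3;
    bool sse4_1;
    bool sse4_2;
};

// Decodes feature flags from the ECX and EDX values that CPUID leaf 1 returns.
// No vendor string is consulted anywhere.
Features decode_leaf1(std::uint32_t ecx, std::uint32_t edx) {
    auto bit = [](std::uint32_t reg, unsigned n) { return ((reg >> n) & 1u) != 0; };
    return Features{
        bit(edx, 26),  // SSE2
        bit(ecx, 0),   // SSE3
        bit(ecx, 19),  // SSE4.1
        bit(ecx, 20),  // SSE4.2
    };
}
```

On x86 with GCC or Clang, the register values could come from `__get_cpuid(1, &eax, &ebx, &ecx, &edx)` in `<cpuid.h>`; the decoding itself is kept separate here so it can be checked on any machine.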
In mobile, Intel would like to see the industry abandon the ARM architecture and lock itself into Intel x86 chips. Not gonna happen.
I will say two good things about Intel. First, they have invested a lot in technology and they are able to make the all-around best x86 chips right now. (You will pay far too much for these, which is one reason I only buy AMD.) Second, their GPU guys have really cooperated with Linux, and Intel chipset graphics really does Just Work out of the box with Linux.
But the C compiler thing has put Intel multiple laps ahead of AMD in the ass hat race.
-
Re:Fascinating misues of adjectives there!
I'm not going to reward AMD for turning out substandard products and their poor business practice.
So you're going to reward Intel for their predatory and immoral shenanigans?
They can't just make a compiler, they have to make it sabotage the competition. That's beyond the pale.
http://www.agner.org/optimize/blog/read.php?i=49
I'm voting with my dollars and I'm voting for the underdog. Yeah AMD screwed the pooch but Intel wants to screw all of us.
-
Re:I don't like boost
>> b) a darn STANDARD _Binary_ API so I don't have to worry about which _compiler_ AND _platform_ was used,
> I'm not quite sure what you mean here.
This .pdf explains why this is important:
http://www.agner.org/optimize/calling_conventions.pdf
Particularly: 3 Data representation
-
Re:I don't like boost
>> a) Add a PROPER 'alias' and a PROPER 'type-def'
>Define "PROPER"
First, there are times:
a) When you _need_ strict type safety
b) When you _don't_ need strict type safety
Second, read up on C++ mangling/demangling because if you don't understand that you won't understand my point.
http://en.wikipedia.org/wiki/Name_mangling
Agner is a guru of assembly language and has written a great document; specifically the section "The need for standardization"
http://www.agner.org/optimize/calling_conventions.pdf
Here is the usage case where we want BOTH (a) and (b):
typedef int Foo;
// same as "#define Foo int" (an alt. syntax could be: 'alias Foo = int')
newtype int Bar; // We want a NEW type, one that is NOT an alias, that differs by NAME only, NOT functionality.
// There is no current way to do this in C++ unless you abuse templates.
void AllowFoo( Foo foo ); // c++filt -n _Z8AllowFooi
void AllowFoo( int i ); // Oh look, this is semantically equivalent to the above; it is redundant/harmless in C++, since there is only 1 mangled function due to aliases not being new types
// We want the compiler to make a SEMANTIC difference between the following two functions -- they should be mangled _differently_:
void ReqBar( int i ) { ... } // We want this overloaded function so we can catch accidental calls to it OR not even provide it at all !!
void ReqBar( Bar bar ) { ... } // ILLEGAL in C++ due to lack of proper non-alias support -- it is mangled the same as: void ReqBar(int)
int i = 1;
Foo f = 2;
Bar b = Bar(3);
AllowFoo( i ); // OK typeof( i ) = typeof( int ) == int
AllowFoo( f ); // OK typeof( f ) = typeof( Foo ) == int
AllowFoo( b ); // Should be an error, but can't tell C++ this is a different type, the C/C++ typedef gives us "nothing"
ReqBar( i ); // C++ can't block this func at compile-time
ReqBar( f ); // C++ can't block this func at compile-time: typeof( f ) = typeof( Foo ) = int != typeof( Bar ) = Bar
ReqBar( b ); // typeof( b ); // typeof Bar != int
Does this make sense?
At least C++ treats SomeVoidFunc() and SomeVoidFunc(void) as being mangled the same now.
Don't even get me started on how C++ ignores the return type for the mangling! There are times when it would be _extremely_ handy to call a different function based on the return type. Sadly C++ lacks this.
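For comparison, the template workaround alluded to above ("unless you abuse templates") can look roughly like this. It is only a sketch of the usual "strong typedef" idiom; `Strong` and `BarTag` are hypothetical names, and it approximates the wished-for `newtype` rather than reproducing it.

```cpp
#include <type_traits>

// Minimal "strong typedef" sketch: the Tag parameter exists only to make
// each instantiation a distinct type.
template <typename T, typename Tag>
class Strong {
public:
    explicit Strong(T value) : value_(value) {}
    T get() const { return value_; }
private:
    T value_;
};

using Foo = int;                           // plain alias: still just int
using Bar = Strong<int, struct BarTag>;    // distinct type wrapping an int

int ReqBar(Bar bar) { return bar.get(); }  // accepts Bar only
int ReqBar(int) = delete;                  // calls with a bare int fail to compile

static_assert(std::is_same<Foo, int>::value, "alias is the same type");
static_assert(!std::is_same<Bar, int>::value, "Bar is a new type");
static_assert(!std::is_convertible<int, Bar>::value, "no implicit int -> Bar");
```

Because `Bar` is a real class type here, `ReqBar(Bar)` and `ReqBar(int)` are separate overloads with separate mangled names. The cost is the boilerplate wrapper and the explicit `Bar(3)` / `.get()` at every boundary.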
>> c) STANDARDIZE the pragmas
>You've missed the point of pragmas.
And you've missed the pain and agony of having to support multiple compilers because the idiotic compiler writers were too SHORT SIGHTED to understand customers do NOT want proprietary vendor lock-in and having to support yet-another-compiler.
Do you understand the purpose of having a _common_ way to specify structure alignment and packing??
e.g. How do you disable the compiler warning about "unused variables" in Microsoft Visual Studio, GCC/G++, Intel C/C++ Compiler, etc?
We standardized the width of train tracks for a REASON.
We standardized the traffic lights even though everyone drives different vehicles with 2 wheels, 3 wheels, 18+ wheels, etc.
As programmers we are forced to solve the same damn problem over and over in everyone's "pet syntax". When is this insanity going to stop??
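For two of the cases mentioned above, standard C++ does offer a compiler-independent spelling: `alignas` (C++11) for alignment and `[[maybe_unused]]` (C++17) for the unused-variable warning. A minimal sketch, with `Vec4` and `answer` as made-up names; structure packing, as far as I know, still has no standard equivalent and remains a `#pragma pack` style extension.

```cpp
// alignas (C++11) requests alignment without a compiler-specific pragma.
struct alignas(16) Vec4 {
    float x, y, z, w;
};

static_assert(alignof(Vec4) == 16, "Vec4 is 16-byte aligned");

// [[maybe_unused]] (C++17) tells any conforming compiler not to warn here.
int answer() {
    [[maybe_unused]] int scratch = 0;  // intentionally unused
    return 42;
}
```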
>> d) STANDARDIZE the error messages
> Compilers differ. This might not be possible.
I'm not interested in political excuses. I
-
Re:AMD even still relevant?
Because why would you support their dishonesty when they pull shenanigans like intentionally crippling run-time performance of their code when run on non Intel hardware??
http://www.agner.org/optimize/blog/read.php?i=49#49
And of course they put the disclaimer as a "gif" so text engines won't find it.
http://software.intel.com/en-us/articles/optimization-notice/#opt-en
When Intel learns to respect their customers then I'll respect and support them.
-
Re:About bloody time...
It means that there's more than one way to skin a cat, to use a disgusting proverb, and all CPUs are not made alike. "Tests" tend to do things "the Intel way".
If you look at the GP, you'll find he was using words like "Core", "Atom" and "Netburst (P4)", a rather powerful hint he was talking about x86/x64. ARM is a red herring in this context.
As for backing stuff up, if you're interested, just start reading these "tests", and look at how the results are presented. It's pretty obvious, if you look for it. As for the rest, Google lists http://www.agner.org/optimize/blog/read.php?i=49#49 pretty high.
-
Re:1.25v DDR3, but CPU efficiency...
The i7 3770K has a TDP of 95W.
I know that, at least in the past, Intel used to issue TDP numbers that represented "typical" heat, while AMD used to issue TDP numbers that represented worst-case heat (which is what TDP ought to be IMHO). I have read here on Slashdot that more recently, AMD has started playing those games as well.
But according to NordicHardware, in this case Intel is under-promising and over-delivering, and the chips really do dissipate only 77W despite being rated for 95W. (But how did they measure that? Is this a "typical" 77W? I guess it's not that hard to run a benchmark test that should hammer the chip and get a worst-case number that way.)
Curiously the AMD processors tend to stack up better on the Linux benchmark suites.
This is probably because Linux benchmarks were compiled with GCC or Clang rather than the Intel compiler. The Intel compiler deliberately generates code that makes the compiled code run poorly on non-Intel processors. The code checks the CPU ID, and the code has two major branches: the good path, which Intel chips get to run, and the poor path, which other chips run.
http://www.agner.org/optimize/blog/read.php?i=49
The irony is that Intel, by investing heavily in fab technology, is about two generations ahead of everyone else, so they can make faster and/or lower-power parts than everyone else. This means they could be competing fairly and win.
But because Intel does evil things like making their compiler sabotage their competition, I refuse to buy Intel. They have lost my business. They don't care of course, because there aren't many like me who are paying attention and care enough to change their buying habits.
If you want the fastest possible desktop computer, pay the big bucks for a top-of-the-line i7 system. But if you merely want a very fast desktop computer that can play all the games, an AMD will do quite well, and will cost a bit less. So giving up Intel isn't a hard thing to do, really.
AMEN!
-
Re:Even if this was true...
If this story is really true, it seems a very odd strategic move from Intel at a time when they're dominant in their markets. It's opening the door for people like gamers, geeks and small businesses to move to a competitor (AMD being the obvious candidate)
I think they believe they have nothing to fear from AMD. When Intel was trying to flog the sucky Pentium 4, AMD was kicking their butts; but now Intel can match any technology AMD has, and Intel is two steps ahead on fab process. TFA is talking about 14 nanometre traces; AMD's best parts are 32 nm. (I still buy AMD stuff, because I am perpetually angered at Intel over things like the C compiler that deliberately emits broken code to make non-Intel chips look bad.)
Plus, AMD has recently made boneheaded decisions that will probably doom them anyway no matter what Intel does.
I think Intel is just rubbing their hands, thinking about integrating their CPUs so tightly with Intel chipsets that no other chipsets will even be an option anymore. That's probably the case already.
-
Re:Unbelievable...
From the original post in question:
... a lot of effort was put into highly optimized bresenham line algorithms, because traditional implementations implied a div operation per pixel, ...
Which is worded poorly enough to be taken to mean, "traditional bresenham implementations require a division operation at each pixel".
I also posit that the quoted post is wrong about the division instruction: DIV still takes a lot of cycles to execute*, that's just the nature of the maths involved. Of course modern processors will try to do some clever code reordering that may make slow instructions appear to be executing a lot quicker, but the instruction dependency chain does not always afford this opportunity.
* cf for example http://www.agner.org/optimize/instruction_tables.pdf
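For reference, a common integer-only formulation of Bresenham's line algorithm looks like the sketch below. It is one textbook variant, not necessarily the one either poster had in mind; the `Point` alias and function name are chosen for the example. The inner loop uses only additions, comparisons and a doubling, with no division per pixel.

```cpp
#include <cstdlib>
#include <utility>
#include <vector>

using Point = std::pair<int, int>;

// Integer-only Bresenham line from (x0, y0) to (x1, y1), all octants.
std::vector<Point> bresenham(int x0, int y0, int x1, int y1) {
    std::vector<Point> pixels;
    const int dx = std::abs(x1 - x0);
    const int dy = -std::abs(y1 - y0);
    const int sx = x0 < x1 ? 1 : -1;   // step direction in x
    const int sy = y0 < y1 ? 1 : -1;   // step direction in y
    int err = dx + dy;                 // running error term
    while (true) {
        pixels.emplace_back(x0, y0);
        if (x0 == x1 && y0 == y1) break;
        const int e2 = 2 * err;
        if (e2 >= dy) { err += dy; x0 += sx; }  // step in x
        if (e2 <= dx) { err += dx; y0 += sy; }  // step in y
    }
    return pixels;
}
```

Calling `bresenham(0, 0, 3, 1)` yields four pixels starting at (0, 0) and ending at (3, 1).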
-
Re:Intel Compilers still backstabbing AMD
The part about the legal fine print being put in a GIF file just to make it harder to discover through search engines is truly special.
http://www.agner.org/optimize/blog/read.php?i=49#184
http://software.intel.com/sites/products/web2010/prod-images/opt-notice-en_080411.gif
-
Intel Compilers still backstabbing AMD
The Intel compiler, widely regarded as the best compiler available for x86, still produces code designed to make Intel chips look better than any others.
http://www.agner.org/optimize/blog/read.php?i=49
That page was posted three years ago. Scroll to the bottom, and read the latest additions to the discussion there: "New Intel compiler version - still the same!"
http://www.agner.org/optimize/blog/read.php?i=49#179
This makes it difficult to be sure how much better Intel chips really are than AMD chips. When the Intel chip scores higher on a benchmark, and the benchmark includes Photoshop, was the Intel chip actually better or was Photoshop compiled with the Intel compiler?
Sadly, I think Intel chips really are better now; given that Intel is leapfrogging past AMD on process technology, they have major advantages so their chips ought to be better.
But I still buy AMD. Yeah, I'm giving up some increment of performance... but the chips these days are so fast, I can survive on only 90% performance or whatever. And I prefer to avoid doing business with a company that continues to sell a compiler that sabotages performance on competitor's chips.
Personally, I would love to see AMD sell a line of processors that return "GenuineIntel" for the CPU ID, and thus run Intel compiler code at full speed. When Intel sues them, they can argue that this is necessary for full compatibility with the code produced by Intel's own C compiler. (Yeah, I know. It will never happen. It's a fun daydream but that's all.)
Even if AMD doesn't have the top performing chips, they continue to score very well on price/performance, and the performance is good enough for me. And they are less evil than Intel. So I remain an AMD customer.
steveha
-
Re:Let's get C99 right first
I'm afraid you have been misinformed. You see, it has been documented quite extensively here (and was turned over to AMD's attorneys, and was rumored to be part of why Intel quickly settled for 1.25 BILLION). The way Intel had their compiler rigged (which they do to this very day BTW, now they simply document it) is this:
1. Any and all code compiled with the Intel compiler, unless patched beforehand not to, shall look at the CPUID for "GenuineIntel".
2. If "GenuineIntel" is found, run all optimizations including the latest SSEs.
3. If not "GenuineIntel", drop to the slowest code path, x87 mode, which ties a boat anchor to the program, to the tune of 40%+.
The smoking gun, the one that proves you've been misinformed, is simply this: If Intel was only afraid of an incomplete SSE then why did they cripple the Pentium III as well as AMD, hmmm? Surely the ones that INVENTED the Pentium III would know which SSEs it truly supported, yes? It was actually a simple answer: At the time benchmarks had the PIII stomping the early P4s by as much as 30% in some tasks, but guess what happened after the cripple code got put in? Why, suddenly the P4 is WINNING by 30%! Isn't that amazing?
You've been had friend, hell you are being had to this very day every time you look at a benchmark. Even on one of the netbook sites I was looking at, the reviewer posted "For some reason the Atom benches higher than the E series AMD chips, but real world tests don't seem to bear the benches out". Well no shit, because thanks to Intel the benches might as well be named Quack.exe! Hell, don't believe me: run a benchmark on your (I'm sure) Intel CPU, then use one of the many tools out there that will let you fake a different CPUID and watch what happens!
-
Re:Power?
Dude, do you want me to back it up with links? Hell, you can even test it on your own machine; there are apps that will change your CPUID. Run your little /QxO on an Intel CPU, bench it, then change the CPUID to AMD and watch the numbers fall by at LEAST 30%, probably closer to 50%.
But if you want more than MY word and your own lying eyes, then you can read this blog by a guy who actually writes books on the subject of optimization and programming, where he documents that INTEL TOLD HIM THAT IS WHAT THEY DID after he confronted them with the evidence! He has emails, he logged every conversation, and I have NO doubt he is a good chunk of why Intel paid AMD 1.25 BILLION to not go to court. Or do you believe Intel gave its rival one of the biggest paydays in CPU history because it was a nice thing to do?
-
No need, everyone knows...
Here's Agner Fog's page about this issue.
The Intel compiler (for many years and many versions) has generated multiple code paths for different instruction sets. Using the lame excuse that they don't trust other vendors to implement the instruction set correctly, the generated executables detect the "GenuineIntel" CPU vendor string and deliberately cripple your program's performance by not running the fastest codepaths unless your CPU was made by Intel. So e.g. if you have an SSE4-capable AMD CPU, it will run the SSE2 codepath instead of the SSE4 codepath that comparable Intel chips will run.
Over the years, MANY libraries (including several from Intel) have been compiled and shipped with this compiler, with the result that the applications compiled with those libraries, including many benchmarks, also suffer from the same performance sabotage.
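To make the difference concrete, here is a minimal Python sketch of the two dispatch policies. It is a toy model, not Intel's actual dispatcher: the `Cpu` class, the `PATHS` list and both function names are made up for illustration, and the SSE2 cap for non-Intel CPUs simply mirrors the example given above.

```python
from dataclasses import dataclass

# Code paths ordered from slowest to fastest; the names are illustrative only.
PATHS = ["generic", "sse2", "sse4.2", "avx"]


@dataclass(frozen=True)
class Cpu:
    vendor: str          # CPUID vendor string, e.g. "GenuineIntel"
    features: frozenset  # instruction sets the CPU reports supporting


def best_supported(cpu, ceiling=None):
    """Return the fastest path the CPU reports, optionally capped at `ceiling`."""
    candidates = PATHS if ceiling is None else PATHS[: PATHS.index(ceiling) + 1]
    chosen = "generic"
    for path in candidates:
        if path == "generic" or path in cpu.features:
            chosen = path
    return chosen


def vendor_gated_dispatch(cpu):
    """Model of the described behaviour: non-Intel CPUs are capped at SSE2."""
    if cpu.vendor == "GenuineIntel":
        return best_supported(cpu)
    return best_supported(cpu, ceiling="sse2")


def feature_dispatch(cpu):
    """Vendor-neutral alternative: trust the feature flags alone."""
    return best_supported(cpu)


amd = Cpu("AuthenticAMD", frozenset({"sse2", "sse4.2"}))
intel = Cpu("GenuineIntel", frozenset({"sse2", "sse4.2"}))

print(vendor_gated_dispatch(amd), feature_dispatch(amd))
```

With identical feature sets, the hypothetical AMD chip lands on the SSE2 path under the vendor gate and on the SSE4.2 path under the feature-only policy, while the Intel chip gets SSE4.2 either way.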
-
Re:You folks are truly stoned.
When more and more apps leverage OpenCL 1.1 [and the list is growing rapidly] using the likes of LLVM/Clang where AMD has worked hard at leveraging you'll begin to see a lot of these ``benchmarks'' being truly useless and tuned specifically for Intel.
The benchmarks are already useless and tuned for Intel. see http://www.agner.org/optimize/blog/read.php?i=49
-
Re:some proof would be nice
-
Intel's compilers
Quite a bit of Windows software is compiled using Intel's compilers, and they are intentionally made to sabotage performance on AMD chips. When looking at CPUID, instead of checking the features they want, they look for that _and_ the CPU being "GenuineIntel", and if not, the code chooses the worst possible implementation. This includes some major scientific math libraries and a part of popular benchmarks.
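For anyone wondering what the two checks look like at the register level: CPUID leaf 0 returns the 12-byte vendor string in EBX, EDX, ECX (in that order), and leaf 1 reports SSE4.2 in bit 20 of ECX. The sketch below only decodes register values passed in by hand, since plain Python cannot execute CPUID itself; the function names are hypothetical.

```python
import struct


def vendor_string(ebx, edx, ecx):
    """Assemble the 12-byte vendor ID from CPUID leaf 0 register values."""
    return struct.pack("<III", ebx, edx, ecx).decode("ascii")


def sse42_supported(leaf1_ecx):
    """SSE4.2 is reported in bit 20 of ECX for CPUID leaf 1."""
    return bool(leaf1_ecx & (1 << 20))


# Register values as they would be returned on Intel and AMD parts.
intel_regs = (0x756E6547, 0x49656E69, 0x6C65746E)
amd_regs = (0x68747541, 0x69746E65, 0x444D4163)

print(vendor_string(*intel_regs))
print(vendor_string(*amd_regs))
```

A feature check only needs `sse42_supported`; the complaint in this thread is about code that additionally compares the vendor string before it will take the fast path.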
-
Re:Simple reason really
Your entire rant about Intel has been rectified. First AMD sued Intel, that case was settled over a year ago. Then the FTC gave Intel an anticompetitive smack down on top of that, which was settled nearly a year ago.
Unfortunately, according to people looking at the compiler, things haven't changed
-
Re:Not all that surprising
Intel NEEDS those specialized instructions added on to keep pace.
Note that Intel's compilers refuse to use those instructions when their output runs on AMD's chips and, unfortunately, the popular scientific libraries are all compiled with one of Intel's compilers (ICC or their Fortran compiler) and only use the SIMD paths if they see "GenuineIntel" output from CPUID.
One of the most renowned software optimization experts studies this in detail in his blog.
-
Re:Paying of OEMs is not their only trick..
That doesn't affect the standard library, or the extended libraries.
From Agner Fog's Blog
Version 10: Up to SSE4.2 paths for 32-bit Intel, only 386 path for non-Intel.
Version 11: Up to AVX paths for 32-bit Intel, only 386 path for non-Intel.
Version 12: Up to AVX paths for 32-bit Intel, only 386 path for non-Intel.
The story is similar with the Vector Math Library: If (!GenuineIntel) { run shitty 25 year old code path }
-
Re:Paying of OEMs is not their only trick..
That's not the whole story - there was a deliberate attempt to not use optimized instructions.
See a long discussion at http://www.agner.org/optimize/blog/read.php?i=49
-
THAT'S BECAUSE INTEL CHEATS!
Some of the benchmark programs are compiled with Intel's C++ compiler, which generates CPUID checks for the manufacturer string 'GenuineIntel' and redirects all other manufacturers' CPUs to the slowest code path. So if you can't compile the benchmark yourself with a trusted compiler, it's not worth the paper it's printed on.
Intel also releases several libraries that other software vendors use in their products; these libraries contain the same manufacturer check which cripples their performance on chips by AMD, Via, etc. Commercial software products such as Matlab have unintentionally or intentionally shipped with these checks, with the result that they run slower than necessary on AMD CPUs. When the manufacturer test is patched out of the program, it is un-crippled and runs as fast or faster than a comparable Intel chip.
Intel settled out of court with AMD over this, and are in the process of also settling with the FTC, but have not actually stopped the practice.
-
Re:Not much new information
What is this about highly asymmetric execution units on Intel? link please ;-)
Intel Core cores have 6 execution "ports"; each serves a range of micro-operations (u-ops), and there is some overlap between them.
Some u-ops can only be sent through a single port, most can be sent through a couple specific ports, and none are suitable for all 6 ports. Most of the integer instructions can only be sent through ports 0, 1, and 5, and these ports also perform some floating point duties. The complexity creates a problem for people tasked with optimizing low level code because they need to be aware of what u-ops are generated by each instruction, and what ports they can be sent to.
This is in contrast to AMD's setup, where most integer instructions break down into u-ops suitable for any of its 3 integer execution units, and these execution units do not perform any floating point duties.
So optimizing for AMD is a pleasure compared to optimizing for Intel. This doesn't mean that Intel's design is stupid or anything, just that it's a bitch to hand-optimize for.
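A toy model shows why the port restrictions bite. The Python sketch below places each u-op on the least-loaded port it is allowed to use, one u-op per port per cycle, ignoring dependencies. Only the "integer ops on ports 0, 1 and 5" and "3 integer units, no FP duties" parts come from the description above; pinning `fp_mul` to port 0, modelling AMD's FP side as a fourth unit, and all names are assumptions made for illustration.

```python
def schedule(uops, allowed_ports, num_ports):
    """Greedily place each u-op on the least-loaded port it may use.

    Returns the per-port load; the busiest port bounds the cycle count
    in this toy model (one u-op per port per cycle, no dependencies).
    """
    load = [0] * num_ports
    for kind in uops:
        port = min(allowed_ports[kind], key=lambda p: load[p])
        load[port] += 1
    return load


# Intel-like: integer u-ops may use ports 0, 1 and 5 (per the text above);
# restricting fp_mul to port 0 is a hypothetical choice for this sketch.
intel_like = {"int": (0, 1, 5), "fp_mul": (0,)}

# AMD-like: 3 integer units with no FP duties; FP modelled as separate unit 3.
amd_like = {"int": (0, 1, 2), "fp_mul": (3,)}

batch = ["int"] * 6 + ["fp_mul"] * 3

print(schedule(batch, intel_like, 6))
print(schedule(batch, amd_like, 4))
```

In the Intel-like layout the naive placement piles integer work onto port 0 before the FP multiplies arrive, so port 0 ends up with 5 u-ops, even though steering the integer work to ports 1 and 5 would have balanced it at 3. In the AMD-like layout the same naive placement is already balanced. That steering is exactly the bookkeeping a hand-optimizer has to do.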
The most extensive references aren't from Intel or AMD though; they are from a low level hack named Agner Fog.
-
Re:And yet the geeks/nerds/uninformed...
I found this Agner Fog article: http://www.agner.org/optimize/blog/read.php?i=49
-
Re:Depends...
Intel's ICC compiler produces code that is more than 10% faster
You mean the same Intel compiler that detects if you're using an AMD processor and intentionally de-optimises your code? Yeah 10% faster*
* As long as you're using an Intel processor to compile.
-
Re:Started with BASIC, sure...
Now I barely touch assembly for processors like the Core i7, Athlon 64 X2 et al. Not due to the complexity of the assembly language, which remains 'easy', but due to the extreme complexity of these processors. When you put together the number of instructions executed in the same cycle, the depth of the instruction pipeline, the heuristic branch prediction, some out of order execution... all the tricks a modern processor uses make it very very hard to beat, for example, the Intel C compiler with your own hand crafted assembly.
The Core line of processors is much simpler than the end of the Pentium line. It's really not that difficult to understand what the processor is doing (it's not magic, it's reasonable steps), and in fact men like Agner Fog have made it easy for you to become informed. The AMD64 line is also pretty simple.
I agree that ICC is very good. It's clearly the best C compiler on the planet, but it's still not outstanding. That's why people who specialize in optimization can still command large salaries. It's true that there isn't a large market for optimization experts, but the market that is there desperately needs them.
-
Re:Just for some perspective...
In the past, I preferred NASM for x86 cross platform development, meaning Win32 and Linux. It had decent support for the latest sets of instructions. The Microsoft syntax is something I prefer to avoid, so NASM was actually a plus in that respect, although some coworkers disagreed. There's a brief, but up-to-date comparison of x86 assemblers in Agner Fog's book. He says that YASM is better than NASM these days, and uses the same syntax. The Wikipedia page on Open Watcom Assembler also has a book reference that seemingly compares MASM vs. NASM vs. TASM vs. WASM, but it's from 2005.
-
Re:Not suprised
Your little theory is disproved by this chart:
http://www.agner.org/optimize/optimizing_cpp.pdf
And scroll down to page 68. GCC does everything MSVC does, and more. The chart says that GCC doesn't implement PGO yet, but currently it does.
The cause of Firefox underperforming on Linux is most certainly that the Linux builds don't use PGO, which is a distribution issue more than a Firefox issue.
-
Re:When I was breaking in
Sometimes cycle counts still matter, because not everybody is using a GHz machine--ask any good embedded systems programmer. Back in the 1MHz days when I got started, you never counted up if you could count down instead because it cost you a couple of cycles per loop; standard practice to any 6502 or Z-80 programmer.
When you have a fixed divisor and only care about a limited range, it's possible to reduce integer division to a set of more primitive operations like shift/add, which used to be much faster on older hardware. A good example is http://www.agner.org/optimize/optimizing_assembly.pdf , P137 shows a division by 10 method. Table 9.1 breaks down how much slower integer division is than these smaller operations.
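As a rough illustration of the idea, here is the well-known multiply-by-reciprocal form for unsigned 32-bit values, sketched in Python. This is not claimed to be the exact listing from the manual cited above; `MAGIC`, `SHIFT` and `div10` are names chosen for this sketch. The constant is ceil(2**35 / 10), and since 10 * MAGIC = 2**35 + 2, the rounding error stays too small to change the quotient for any x below 2**32.

```python
MAGIC = 0xCCCCCCCD  # ceil(2**35 / 10)
SHIFT = 35


def div10(x):
    """Divide an unsigned 32-bit value by 10 using one multiply and one shift."""
    if not 0 <= x < 2**32:
        raise ValueError("x must fit in an unsigned 32-bit integer")
    return (x * MAGIC) >> SHIFT


print(div10(12345))
```

On real hardware the multiply produces a 64-bit product and the shift picks off its upper bits, so the whole thing replaces one slow divide with two cheap operations.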
Not that any of this really matters, because of course the screen output is the only bottleneck in your program. Now if you'll excuse me, I have a lawn to keep clear.
-
Re:Analog hole...
Talked to them as in asked them to write a driver, or talked to them as in offered to write a driver and asked for programming specs?
I can think of three ways to get a Linux driver, depending on how much they dislike Linux:
- Write a from-scratch driver, assuming you can at least get a basic programming spec for the chipset.
- Disassemble their Mac OS X driver and extract programming specs, then write one from scratch. I could do that, but have neither the time nor the desire to do so.
- Take the I/O Kit Headers, run nm on their binary to find out what pieces they use, and write wrapper classes for Linux to emulate those bits of the I/O Kit on Linux, then convert their Mac OS X driver to ELF with ObjConv.
If most of the driver bits are in the kernel side of the Mac OS X driver, #3 is the most practical. If they just wrote a thin driver shim that maps it into user space or something, not so much. As a bonus, though, if somebody pulls off #3, it would be somewhat useful for other drivers. The ugly part would be faking up a close enough impedance match between their (probably custom) user client code and v4l.