PCMark Memory Benchmark Favors GenuineIntel
javy_tahu writes "A review by Ars Technica disclosed that PCMark 2005 Memory benchmark favors GenuineIntel CPUID. A VIA Nano CPU has had its CPUID changed from the original VIA to fake GenuineAMD and GenuineIntel. An improvement of, respectively, 10% and 47% of the score was seen. The reasons of this behavior of FutureMark product are not yet known."
The reasons of this behavior of FutureMark product are not yet known
Easy. Intel paid them to make it that way.
"When life gives you lemons, don't make lemonade. Make life take the lemons back!" -- Cave Johnson
I'm a GenuineIntel, mod me 47% higher!
No, but I did throw granola at a deaf person once
Seems obvious, but follow the money trail: does PCMark get backing from Intel?
So rise up, all ye lost ones, as one, we'll claw the clouds.
A VIA Nano CPU has had its CPUID changed from the original VIA to fake GenuineAMD and GenuineIntel. An improvement of, respectively, 10% and 47% of the score was seen.
It sounds to me like this could possibly be explained by some kind of conditional optimization that the compiler puts in for various chips, to take advantage of differences in their designs that can improve performance.
Then again, probably not.
Is this like changing the user agent in a browser?
This definitely requires clarification from the creator of the benchmark.
It is possible that the benchmark uses the CPUID to change how the benchmark works, for example, to work around known flaws in a given chip. If this is the case, then the problem is not "omyghoshitplaysfavorites" but rather lack of full disclosure that the benchmarks are not directly comparable across different chips. In the most benign scenario, this could be someone at the benchmark creator's shop forgetting to tell the documentation team. This is still a very serious issue, but it's not fraud.
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
Could it be that FutureMark uses the GenuineIntel and AMD flags to enable processor-specific extensions, and then does a whole bunch of math with those extensions and never bothers to check the result?
This would indicate some really terrible code on FutureMark's part, and VIA should be flagging those opcodes as illegal opcodes, but it might be possible that something like this could happen. It is even possible that the CPUID checks are duplicated in some library somewhere that actually gets the code sequence right, and the main FutureMark code disables the advanced functions of the library whenever the GenuineIntel and AMD flags are missing. Thus FutureMark may feature both code sequences that work and those that don't, and the resulting incompatibilities are what cause the issues.
Why would you even consider running a benchmark program you don't have source code for and cannot compile yourself? (If you are worried about random compiler differences messing up the results, you can check an MD5 sum of the final binary against the published one, but it is important that you can reproduce the binary from source and you can read the source to find out what it does.)
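A minimal sketch of the checksum comparison mentioned above, assuming you have built the binary yourself and the publisher lists an MD5 hex digest. The function names and the idea of a "published" digest string are illustrative, not anything FutureMark actually provides.

```python
import hashlib

def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read in fixed-size chunks so large binaries need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_published(path, published_hex):
    """True if the locally built binary has the published checksum."""
    return md5_of_file(path) == published_hex.strip().lower()
```

A match only tells you the build is reproducible; you still have to read the source to know what the benchmark measures.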
If compilers like ICC cripple their code depending on CPUID, that will just lead all manufacturers to set CPUID to GenuineIntel, just as moronic websites (with help from Microsoft) ensured that all browsers call themselves 'Mozilla'.
-- Ed Avis ed@membled.com
Well, PC Mark 2005 is no longer good for testing processors against processors of another maker, i.e. only good for intra-AMD, etc.
Colin Dean Go a year without DRM
That should be AuthenticAMD, not GenuineAMD.
But that would be expecting editors to actually, you know, edit.
I will not say anything about possibilities here without my anti-conspiracy-haters-shield online (needs a lot of power), but it is really strange for a benchmark (supposed to be neutral). Well, I do not really expect neutrality from a benchmark with sponsorship (or partnership?) from hardware makers like nVidia.
Religion: The greatest weapon of mass destruction of all time
I think you mean 'AuthenticAMD'.
The CPUID instruction provides feature bits that software should use to determine which instructions are available. Using the vendor string is not a reasonable way of detecting the presence/absence of instruction set extensions like SSE.
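To make the feature-bit point concrete, here is a rough sketch that decodes a few of the CPUID leaf 1 bits. It does not execute CPUID itself; it takes the EDX and ECX values as plain integers, and the example values are made up for illustration rather than dumped from a real chip.

```python
# Bit positions in CPUID leaf 1 (EAX=1), as documented by Intel and AMD.
EDX_BITS = {"mmx": 23, "sse": 25, "sse2": 26}
ECX_BITS = {"sse3": 0, "ssse3": 9, "sse4.1": 19, "sse4.2": 20}

def decode_features(edx, ecx):
    """Return the set of feature names whose bits are set in EDX/ECX."""
    found = {name for name, bit in EDX_BITS.items() if (edx >> bit) & 1}
    found |= {name for name, bit in ECX_BITS.items() if (ecx >> bit) & 1}
    return found

# Hypothetical register contents: MMX, SSE, SSE2 and SSE3 reported.
example_edx = (1 << 23) | (1 << 25) | (1 << 26)
example_ecx = 1 << 0
print(sorted(decode_features(example_edx, example_ecx)))
```

Nothing in this check looks at the vendor string, which is the whole point: a chip from any vendor that sets the bit gets the feature.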
My server
It's all about money, ain't a damn thing funny.
Need an automatic screenshot taker? Try here.
V+I+A == 224
G+e+n+u+i+n+e == 715;
Genuine+A+M+D == 925
Genuine+I+n+t+e+l == 1223
The bigger the number, the faster the processor. And you get 20% extra when you pass 1000.
Ignore this signature. By order.
It sounds to me like this could possibly be explained by some kind of conditional optimization that the compiler puts in for various chips, to take advantage of differences in their designs that can improve performance.
People are trusting closed-source benchmarks? Well, golly gee, who'd'a thunk there'd be errors, oversights, or shenanigans?
If this was used for anything more than entertainment value, any methodical person would have at least compared multiple closed-source benchmarks. If that proved to be inappropriately favoring a vendor, then, OK, start calling 'conspiracy', but this just sounds like an error in a tool that was never validated.
My God, it's Full of Source!
OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
VIA's is "CentaurHauls"
AMD's is "AuthenticAMD"
Intel's is "GenuineIntel"
There's no "VIA" nor is there "GenuineAMD".
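For anyone wondering where those strings come from: CPUID leaf 0 returns the 12-character vendor ID packed into EBX, EDX and ECX, in that order. A small sketch, with the register values passed in as integers rather than read from hardware (the helper names are mine):

```python
import struct

def vendor_string(ebx, edx, ecx):
    """Assemble the 12-byte vendor ID from CPUID leaf 0 (EBX, EDX, ECX order)."""
    return struct.pack("<III", ebx, edx, ecx).decode("ascii")

def vendor_registers(vendor):
    """Inverse helper: split a 12-character vendor ID into (EBX, EDX, ECX)."""
    return struct.unpack("<III", vendor.encode("ascii"))

# "Genu" "ineI" "ntel" as little-endian 32-bit words.
print(vendor_string(0x756E6547, 0x49656E69, 0x6C65746E))
for name in ("CentaurHauls", "AuthenticAMD", "GenuineIntel"):
    print(name, [hex(r) for r in vendor_registers(name)])
```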
Clearly PCMark2005 is buggy (at best) and cannot be used to compare different CPU families in this test. At worst it is intentionally flawed, and shouldn't be used at all.
It's a shame that not one VIA Nano review benchmarked the built-in Padlock functionality. Not one OpenSSL benchmark.
This isn't the first time they've been caught doing something "odd" with their code and it likely won't be the last.
That said, keep in mind it's a 3 year old benchmark. Whatever relevance this benchmarking program has today is far more lessened by its age than by any results shown from this research. Don't get me wrong. I'm not defending Futuremark at all. I don't particularly like their suite of benchmarking tools, and not just because of the "odd" results.
How well a platform scores in Futuremark is less relevant than how well it plays your games or movies or compiles your code or rips your movies/CDs. It's my humble belief that a proper benchmark of a system is how well it will perform in the role you want to use the computer.
If I can play GRID at 1920x1200 at the maximum settings possible with playable frame rates I'm happy.
If I can play Crysis at the same resolution and settings, cool.
If AOC runs well at those settings, then I built a nice system.
If Futuremark runs well...so?
Sig Follows: "Suppose you were an idiot. And suppose you were a member of Congress. But I repeat myself." -- Mark Twain
It's a 3+ year old benchmark being let loose on 2008-vintage CPUs and making mistakes in its optimisations. I wouldn't expect anything else. It's going to have a 3-year-old view of the kind of things these CPUs can do and will act accordingly.
I want a list of atrocities done in your name - Recoil
The benchmark is looking at the ID and making assumptions. These benchmarks run on Windows, so another possibility is that MS does an optimization that few know about. Any of these is plausible. The simplest answer should rule until proof shows otherwise: bad assumptions made in the OS or the program.
I prefer the "u" in honour as it seems to be missing these days.
And why do we care about some e-penis benchmark?
If it's fast enough to play h.264 at nice high resolutions, and plays some of the fun games out, that's all I care about.
If I need CPU power, I'd go get 4 quad cores on a server with big memory and a big HD. A 2D gfx card would be good enough for that server... See if they can do passive cooling on that gfx card, I bet not. And if I needed more CPU still, I'd use a dual backplane mobo with a bunch of ATI/AMD graphics cards with AIXGL (or whatever the acronym is) and write applications for the GPUs.
If I were an evil fraudster at PCMark, paid by Intel to deliver worse scores to rivals, I would make sure that these rivals had no easy way of uncovering the fraud. Testing for an ID looks much more like bad code paths than like "sneaky fraud".
There is no shortage of alternative quirks that can be used to see whether a given processor belongs to one family or another. Should enough of these quirks be combined, it would be *very* hard to discover an evil-related cause.
Of course, choosing the 'bad' path given an ID may just be blatant enough to provide plausible deniability for the developers that "messed up". However, being a firm proponent of Hanlon's Razor, I would rather call it a bug than a "sponsored feature".
On the other hand, kudos to the guys at Ars who thought of changing the ID and, when the numbers did not add up, made further tests to nail down the argument, instead of just forgetting about the problem and performing a "review as usual", which would doubtless have required less effort. Yay for inquisitive hacker-reviewers.
Oh...wait....nevermind.
Given Intel's track record involving anti-competitive practices, I have no doubt in my mind that Intel paid off PCMark.
You've got a point there which is important to the discussion: if the source is closed, how can we know if the test is fair?
>And why do we care about some e-penis benchmark?
Agreed. Back in the day when overclocking or m/board chipsets made a tangible difference in a world where PC power trailed software requirements, benchmarks were a useful way of ensuring you were wringing the max out of your hardware. These days, almost everything is fast enough, and unless you're playing frame rate willy waving on Crysis or whatever, it's really of no real interest. The broad brush approaches of CPU speed and/or number of CPUs are all you need to worry about, e.g. word processing/Internet stuff? Anything will do. Video work? Fast CPU/twin/quad CPU. Games? Max it all out. Worrying about whether an AMD X2 6000 beats an Intel Core Duo2 whatever is really no longer of any real value.
I want a list of atrocities done in your name - Recoil
if(cpuid == "GenuineIntel")
{
    Run_really_fast();
}
else if(cpuid == "AuthenticAMD")
{
    Run_not_so_fast();
}
else
{
    Run_slow();
}
Crysis demo for render time, or rendering a specific image for FPS
Fixed that for you.
... and synthetic benchmarks.
I just hopped over to FutureMark's website and in the community section there's a list of "Most Popular Processors in 3DMark Vantage (Last 7 Days)"
Could be a coincidence but at the moment, they're all Intel.
Does it really matter whether the cause was "incredibly sloppy coding" or "Intel bribed them?" Either way, their benchmark cannot be trusted, and trustworthiness is ESSENTIAL for a benchmark. If anyone pays serious attention to this (which, having read TFA, it seems to merit), then FutureMark is toast.
"My strength is as the strength of ten men, for I am wired to the eyeballs on espresso."
You are, however, stupid.
It's sloppy and doesn't excuse Futuremark, but there is one theoretically "sane" (when viewed under a certain light) explanation for what's been noticed: they took a number of CPUs and measured which memory access instruction had the least latency in itself, for example to decode it, activate proper CPU paths, etc. So, for example, an MMX instruction on CPU A took X cycles before even trying to access memory, and SSE took X+n cycles, so for this particular CPU MMX is better than SSE for measuring memory performance. Of course this is really lame, since new CPUs are released constantly, and a little tweak in the hardware or the microcode can invalidate the data they gathered from such tests.
This is probable because when assembler was still popular it was "well known" that certain CPUs perform certain operations faster. For a time, while it was worth it, a good assembler programmer had to know this and insert microoptimizations that depend on CPU type. Unfortunately (or fortunately), those assumptions broke sometime in the late nineties, since a) the number of CPU models on the market became huge and b) even CPUs that were theoretically in the same family started having different characteristics. I remember seeing just this for Athlon and Athlon XP (or maybe even for "early" Athlon XP and its later versions) - it was obvious that assuming anything about the CPU itself without actually measuring it on the spot is useless. A good example of this is in the Linux kernel - the MD (RAID) driver will actually measure (when kernel is booting) which instruction combination for calculating parity (among "plain" instructions, "SSE", "MMX", etc.) is faster and use that one.
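The "measure on the spot" idea can be sketched in a few lines. This is only an illustration of the approach, not the kernel's code: the two XOR routines and the `pick_fastest` helper are made up for the example, and which one wins will depend on the machine and interpreter.

```python
import time

def xor_bytewise(a, b):
    """XOR two equal-length byte blocks one byte at a time."""
    return bytes(x ^ y for x, y in zip(a, b))

def xor_bigint(a, b):
    """XOR two equal-length byte blocks as one big integer operation."""
    n = int.from_bytes(a, "little") ^ int.from_bytes(b, "little")
    return n.to_bytes(len(a), "little")

def pick_fastest(candidates, a, b, rounds=20):
    """Time every candidate on the actual machine and return the winner's name."""
    best_name, best_time = None, float("inf")
    for name, fn in candidates.items():
        start = time.perf_counter()
        for _ in range(rounds):
            fn(a, b)
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best_name, best_time = name, elapsed
    return best_name

candidates = {"bytewise": xor_bytewise, "bigint": xor_bigint}
block_a, block_b = bytes(range(256)) * 16, bytes(4096)
print("using", pick_fastest(candidates, block_a, block_b))
```

No vendor string or model table is consulted anywhere, so a new CPU just gets whatever is actually fastest on it.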
-- Sig down
You can actually ask the processor which advanced instruction sets it's capable of using. Enabling/disabling certain features based on the vendor string and not based on what the processor actually claims to support is braindead.
That's like putting diesel fuel in all Volkswagens because some of them support Diesel. And then putting gasoline in a Freightliner because it's not a Volkswagen. (YAY CAR ANALOGY!)
Maxim: People cannot follow directions.
Increases in truth directly with the length of time spent explaining them
Because the faster the machine the more efficiently it does the task, allowing you to do even more concurrently and do more things in the future?
There was a time when my AMD K6 350MHz could do anything I ever wanted, and even do it with passive cooling if I underclocked it to 333MHz.
Then I wanted to do more things.
Obviously a conspiracy! Where's Ralph Nader?!?!
Here's a perhaps simpler explanation. CPU benchmarks need to parse CPUID output to decide which instructions to use. Most likely, the benchmark had never heard of these VIA CPUs that implement hot new SSE12 (or whatever) instructions; by claiming to be another vendor, the CPU got the benchmark to use a different instruction mix. I don't know for certain that this is what happened, but I'd bet solid money something like this is the story; we've seen analogous performance degradations at VMware when we fiddle with CPUID too aggressively.
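A toy model of that scenario, purely as a sketch: nobody outside FutureMark knows what the real dispatch looks like, so the vendor table, path names and feature set here are all hypothetical. It just shows how a vendor whitelist sends an unknown chip down the slow path even when it reports the same extensions.

```python
def pick_path_by_vendor(vendor):
    """Hypothetical vendor whitelist: unknown vendors fall through to 'plain'."""
    if vendor == "GenuineIntel":
        return "sse2"
    if vendor == "AuthenticAMD":
        return "mmx"
    return "plain"

def pick_path_by_features(features):
    """Choose the best path the CPU itself claims to support."""
    for path in ("sse2", "mmx"):
        if path in features:
            return path
    return "plain"

# Assumed for illustration: a chip that reports MMX, SSE and SSE2.
nano_features = {"mmx", "sse", "sse2"}
for vendor in ("CentaurHauls", "AuthenticAMD", "GenuineIntel"):
    print(vendor, pick_path_by_vendor(vendor), pick_path_by_features(nano_features))
```

Same silicon, three different code paths under the vendor check, one path under the feature check.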
Depends on the game, but most games come in demo form, and I suspect that most of the demos can be used to perform some kind of benchmark.
Doom3, for example, has a "timedemo" benchmark, and this runs entirely on levels included in the demo. So unless they explicitly disabled it in the demo version, I think that qualifies.
Can't speak for UT3, though.
Don't thank God, thank a doctor!
What I find hilarious about this is that it shows how hardcore, bare-to-the-metal programmers have to deal with exactly the same stupid issues as web developers.
Because there's a lot of potential here. Via suing Futuremark, Futuremark suing Ars and Intel, obviously, suing everyone.
How do we know other code isn't simply optimized for Intel CPUs? Granted, it's not in the best interest of software makers to do this unless some other incentive is in place.
More fire for the AMD vs. Intel lawsuit: first Skype, now this. Intel may have a lot more stuff that will give them a black eye when it comes out in court.
The Phoronix Test Suite.
It's Linux only, but a CPU that performs better on Linux will perform better on Windows.
Because the faster the machine the more efficiently it does the task, allowing you to do even more concurrently and do more things in the future?
What if the task isn't running the e-penis benchmark? Do these one size fits all benchmarks really tell you anything useful about real world performance?
Will a system with a higher score do every task faster than a system with a lower score? It's not that simple yet these benchmarks try to make it that simple.
Probably why there are so few opensource alternatives. Everybody smart enough to write them knows how flawed the idea is.
CPUs should never ever interpret unknown instructions as NOPs; they throw an exception instead.
If J.K.R wrote Windows: Puteulanus fenestra mortalis!
...the fact that the benchmark was developed and tweaked on an INTEL BOX??
Surely not!
Operation Guillotine is in effect.
The big problem with Futuremark is that they have absolutely no credibility or transparency. They list all the major hardware manufacturers as "partners", so how can they possibly be impartial? Their test scores are commonly used to compare different brands, and these numbers command great influence over the market... whoever gets the highest scores is almost guaranteed to outsell their competitors, especially in the high-end segment where buyers are primarily interested in having the fastest product available, and where the high prices result in more time spent researching each purchase.
I also cannot imagine very many people buy the Pro versions of their benchmarking tools, other than major sites and publications that routinely publish detailed benchmark results. Most people are perfectly satisfied with the free one. This means the money has to come from other sources. I know for a fact, I wouldn't bother developing custom game-based benchmarks unless I was making more money than I would making actual games.
-Billco, Fnarg.com
The CPUID instruction provides feature bits that software should use to determine which instructions are available. Using the vendor string is not a reasonable way of detecting the presence/absence of instruction set extensions like SSE.
There are differences between feature bits for, let's say, Intel and AMD. Many bits will have the same meaning, but once you get into more esoteric things you have to read the processor manuals for each processor family first. So for example if you want to check for SSE2, the safe way would be to read the CPUID vendor string, and if it is a processor that you have the documentation for, then check for SSE2 in the way the documentation tells you.
What if you checked for SSE2 at a time when SSE2 was not implemented by some company, and the bit that Intel and AMD use for SSE2 is "for future extension" or something like that in their documentation? Do you test that bit and hope that it will work correctly in the future, or do you ignore it until they have documentation that says "this bit means SSE2"?
This is why I don't trust any benchmark that a vendor would print on their packaging. I tend to go with benchmarks from sites that run whole suites of tests, including some real-world tests. The problem with this route is of course you don't get a single number to compare which bit of hardware is the fastest.
“Common sense is not so common.” — Voltaire
I'd say, let's use http://phoronix-test-suite.com/. BTW, sorry for the GenuineAMD typo ;D As it is already tagged: I meant AuthenticAMD.
I agree with you.
I was wondering if there is some way we can get code audited by the community on a more formal basis, perhaps with a bounty system and a reputation system, so that one might donate to get the KDE4 code audited by me ($10), or some KDE contributor ($300), or Linus Torvalds ($10000). Then these people could develop a formal reputation system, like + or - votes on SourceforgeAuditVoting.org. They'd use their PGP signature to sign the audits.
Or something. I would view this as the next phase of the open source economy. Eventually companies might hire people with good reputations, to audit their own intra-company code.
404555974007725459910684486621289147856453481154 in hex is "You sank my Battleship?"
[GPG key in journal]
I'll give you credit for coming up with a scenario that replaces malice with a heaping dose of incompetence. If what you say is true, then that's not a benchmark at all. After all, you're not comparing the same things; for all you know, you're comparing the skill of the programmer at writing for the VIA processor with the skill of the programmer at writing for the AMD processor.
You might as well write a benchmark to see how long it takes for various processors to divide 4195835.0 by 3145727.0 and come up with 1.333739068902037589! (Note: The correct answer is 1.333820449136241002.)
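For the curious, the division in question is easy to try. A quick sketch; the function name is made up, and the residual check simply multiplies the quotient back to see whether the operands survive the round trip.

```python
def fdiv_check():
    """Divide the two operands quoted above and multiply back as a sanity check."""
    x, y = 4195835.0, 3145727.0
    quotient = x / y
    # On a correctly dividing FPU the round trip lands back on x.
    residual = x - quotient * y
    return quotient, residual

quotient, residual = fdiv_check()
print(f"quotient = {quotient:.15f}, residual = {residual}")
```

On hardware that divides correctly the quotient agrees with the 1.3338... figure, not the 1.3337... one.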
404555974007725459910684486621289147856453481154 in hex is "You sank my Battleship?"
[GPG key in journal]
Talk to the folks who do Intel's C/C++ compiler, because that's exactly what they do... it just happens to make AMD processors look bad.
Somewhere I still have a copy of a document Motorola put out about how Intel was playing fast and loose with benchmarks in comparisons with the 68020, for heaven's sake. I have to wonder whether Intel has a history of this kind of thing.
Well, the only real legitimate benchmark would be of encoding hundreds of real movies, ala piratebay.
If we use the best codecs with instructions to get the most out of each processor type, what's the benchmark for 100 TC captures? Or how about a Blu-ray to DVD5 downconvert?
That stuff matters, not some stuffed test.
It's not a call, it's an instruction. Are you talking about intercepting it with virtualization? Or are you talking about modifying the benchmark code?
Patrick Doyle
I mod down every jackass who puts his moderation policy in his sig. Oh, wait a sec....
It's AuthenticAMD FFS
note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
If the code were open, you can bet that every hardware vendor would pore over it for anything that might be unfair to their product. It doesn't prove that it isn't, but you at least know that if there was something obviously anti-AMD in it, AMD would complain. Without the source open, you don't get the benefit of that scrutiny.
10 PRINT CHR$(205.5+RND(1)); : GOTO 10
That's correct, you do.
I would guess that when testing the same RAM with the different CPUs, different numbers would come out. So to even out the scores, they corrected by +47% for the slower Intel CPUs.
Trust is the problem. If you have the source code, you don't have to rely on trust.