Generational Windows Multicore Performance Tests
snydeq writes "Windows XP, Windows Vista, and (soon) Windows 7 all support SMP out of the box, but as InfoWorld's Randall Kennedy notes, 'experience has shown that multiprocessing across discrete CPUs is not the same thing as multiprocessing across integrated cores within the same CPU.' As such, Kennedy set out to stress the multiprocessing capabilities of Windows XP, Windows Vista, and Windows 7 in dual-core and quad-core performance tests. The comprehensive, multiprocess workload tests were undertaken to document scalability, execution efficiency, and raw performance of workloads. 'What I found may surprise you,' Kennedy writes. 'Not only does Microsoft have a firm grasp of multicore tuning, but its scalability story promises to keep getting better with time. In other words, Windows Vista and Windows 7 are poised to reap ever greater performance benefits as Intel and AMD extend the number of cores in future editions of their processors.'"
Are we supposed to be surprised that the leading OS vendor, which has had deep, intertwined relationships with hardware makers for decades, is exploiting that hardware properly?
Honest question: where's the news here?
Not really, wasn't one of the major complaints about Vista that they were changing the OS architecture to tune multicore processors to the detriment of single core processors?
Elvis.
Some great mathematics in this review... it also appears as if Vista isn't just not solving the problems presented to it, but also adds new ones to increase its own workload.
Fascinating...
I run both XP and Vista on Core2Duo processors.
I'm certain with XP and less certain with Vista (I don't use it for production work) that I can get better performance by forcing everything but EXPLORER.EXE to use the second core at a low priority.
Then as I run programs, they automatically go to the first core (with EXPLORER.EXE).
This allows me to run FOLDING, an RSS reader, LogMeIn all the time but on the second core.
I especially notice a difference when I am copying files at the command prompt.
The program is called PROCESS.EXE and can be found at:
http://www.beyondlogic.org/consulting/processutil/processutil.htm
It is a manual process but it is pretty simple to create a batch file to do the dirty work.
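For anyone scripting the same idea, here is a minimal sketch of the bookkeeping only. It assumes the usual Windows convention that bit i of an affinity mask selects logical core i. The helper names `affinity_mask` and `plan_affinity` are made up for illustration, and the sketch does not call PROCESS.EXE or any Windows API; it only computes which mask each process should get.

```python
def affinity_mask(cores):
    """Return the bitmask with bit i set for each logical core i in `cores`."""
    mask = 0
    for core in cores:
        if core < 0:
            raise ValueError("core indexes must be non-negative")
        mask |= 1 << core
    return mask


def plan_affinity(process_names, foreground=("EXPLORER.EXE",)):
    """Map each process name to a mask: foreground on core 0, the rest on core 1.

    Hypothetical helper: the split mirrors the setup described above
    (EXPLORER.EXE on the first core, background programs on the second).
    """
    foreground_set = {name.upper() for name in foreground}
    plan = {}
    for name in process_names:
        core = 0 if name.upper() in foreground_set else 1
        plan[name] = affinity_mask([core])
    return plan
```

A batch file could then pass each mask to whatever tool actually applies it; a mask of 1 means the first core only and 2 means the second core only.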
=Smidge=
Is it just my observation, or is eldavojohn an idiot?
And XP is still faster than vista or 7, even on 4 cores... And he speculates that it would be faster on 8 (although he didn't measure that)
Scalability doesn't matter if you're still slower in absolute terms on systems that are available commercially at a reasonable price. (going past 8 cores these days is a very large price jump per core)
Ian Ameline
Too many ads: didn't read.
Great job linking to a five page story with on-top layered flash ads and new whole page ads for each page.
Like I'd give a shit about the page at all in that case.
I tried to RTFA (sorry, please mod me down for this ;) but, after clicking the "print" version, I couldn't find anything that looked like a benchmark report. No numbers. No tables. No graphs. All I saw was a page of [[weasel words]] or something like that.
Sigh..
Colorless green Cthulhu waits dreaming furiously.
It is interesting that WinXP is still better in terms of performance than either. The article suggests that Win7 and Vista would be better on systems that hypothetically had 16+ cores.
But nowadays, especially in tech-savvy crowds like on /., the most popular thing to do is run VMs with virtual instances of Windows, which reduces all the hassles associated with dealing with win cruft. Got a worm? Restore machine. Drivers made system unstable? Restore machine. The VMs are typically only given 1-2 cores, the exact use case where WinXP does way better than its successors.
So even if we move to a world with 16+ core processors, if Win7 cannot do better than a 10 year old OS, in common scenarios, how can that be called progress?
Legally obligatory sig : My opinions are my own... etc etc
Ok, so if the average user is still doing the same basic tasks (browser, email, word processing), it kills me that I now need the CPU power of yesterday's servers to do them. Multicore systems let software vendors increase the bloat, because the increase in CPU and RAM absorbs it and hides it from the user. It's no different from converting all cars to lead bodies and then putting 1000 hp engines in them: the user experience doesn't change because they still have the same 0-60 times.
For example, I've always wondered how much CPU time is wasted on anti-virus software. Let's say you have a large Windows-on-VMware environment. Each VM needs to have antivirus on it, so if you've got a server with 10-20 VMs on it, you've got 10-20 instances of anti-virus running. There's gotta be some way to calculate the total amount of CPU and power (W) wasted on this single server just running the antivirus scanning...
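A back-of-the-envelope version of that calculation might look like the sketch below. Everything in it is an assumption: the function name is made up, the per-VM CPU share is something you would have to measure yourself, and it treats CPU power draw as scaling linearly with utilization, which real hardware only approximates.

```python
def antivirus_overhead(vm_count, cpu_fraction_per_vm, host_cpu_watts):
    """Estimate host CPU share and watts spent on antivirus across all VMs.

    cpu_fraction_per_vm: assumed average share of the *host's* CPU capacity
        that one VM's scanner consumes (0.01 means 1 percent).
    host_cpu_watts: assumed CPU power draw of the host at full load.
    Returns (fraction_of_host_cpu, watts), capped at 100 percent of the host.
    """
    total_fraction = min(1.0, vm_count * cpu_fraction_per_vm)
    watts = total_fraction * host_cpu_watts  # linear power model (simplification)
    return total_fraction, watts
```

For example, 20 VMs at an assumed 1 percent each on a host whose CPUs draw an assumed 200 W works out to about 20 percent of the host and about 40 W.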
How about an increase in CPU, but either keeping the bloat the same?
When you see bullshit buzzwords in articles that look as if they've been written by marketing people then look out. Marketing-led, buzzword-laden people always have stories. Are we really supposed to be impressed that the richest OS developer in the world can actually create a SMP capable OS that actually works reasonably given that SMP systems have been around for years? From the tone of the article it's like they're shocked that it works.
Basically, this article states the obvious: Windows XP 64 is just plain faster than Vista 64 or Win7 64. By a factor of 20-40%. But to understand why, you need to read the MONEY quote. Here it is:
It's the DRM baby. You strip that out of the Kernel, and Vista and Win7 will EASILY outpace XP with their more advanced and flexible SMP capability. Until Microsoft understands that people DO NOT WANT DRM and removes it from their newer OSes, these new OSes will continue to suffer from performance problems, and thusly, acceptance and sales problems.
Come on Microsoft. Apple has figured it out, DRM is a sales loser. Do you really want to keep wasting time on a loser technology in the midst of a global recession? You blew it with Vista, but you still have a chance with Win7. Offer people a DRM-Free kernel and Win7 will FLY off the shelves.
Official Heretic from the "Church of Global Warming". Proven right thanks to whistle blowers. AGW = Flat Earth Theory
The boys celebrate their victory back home; meanwhile, on the moon, the asphyxiated whale lies dead. The end credits roll over this shot in silence.
If libertarians are so opposed to effective government, why don't they all move to Somalia?
It's not news but then nor is the article.
The software developers will quickly undo all the speed advances that should result from multi-core CPUs. Software has a much shorter development time than hardware, so all the advantage in this contest is with the software.
Those who do study history are doomed to stand helplessly by while everyone else repeats it.
It's nice to have my beliefs vindicated by someone with the time (and expertise?) to perform the tests, but how does the 32 bit version of XP compare? 32 bit XP is ubiquitous. 64 bit, not so much.
It's funny how if Microsoft releases an OS like Vista, people bitch about how different it is from XP, and if they release an OS like Windows 7, people bitch about how it's the same as Vista. Personally, I care more about the great new interface in Windows 7 than about shaving a split second off the time it takes to compile something. Sure, for things like servers, performance is what counts, but neither Windows 7 nor Windows Vista is an operating system intended to run as a server. They're intended to be networked business workstations, or unnetworked home computers.
First, go to the real story, bypassing an intermediate blog and two interstitial ads.
Second, the article says the performance of the newer OSs is worse than XP. "In fact, during extensive multiprocess benchmark testing, Windows 7 essentially mirrored Vista in almost every scenario. Database tasks? Roughly 118 percent slower than XP on dual-core (Vista was 92 percent slower) and 19 percent slower than XP on quad-core (identical to Vista). Workflow? A respectable 38 percent slower than XP on dual-core (Vista was 98 percent slower) and 59 percent slower on quad-core (Vista was 66 percent slower)."
Third, there are no tables or graphs anywhere in these articles, and very few numbers. As a benchmarking article, this is awful.
The Infoworld report basically says, Windows 7 is (much) slower than XP, but it will get faster when you have 24 or more cores. AND... Microsoft will make it faster, why, because they said so.
Really. Shouldn't that be in an op-ed piece? A report shouldn't speculate, and frankly any speculation about Microsoft promises about better or faster... Well, Microsoft's *actions* speak far louder than Infoworld's words.
... if OS/2 on a single processor still outperforms Windows NT^H^HXP on multiple processors? ;-)
I'd like to see how Server 2003 and 2008 stand up - since longer, less interactive processing is what they are tuned for. XP, Vista and 7 are tuned for quick user response.
It's great that MS was able to tune the Vista kernel to avoid locks which reduce performance on multiple cores, but I'd rather see the same work done for XP, giving us something MUCH faster on a high number of cores, rather than a pig we can compensate for with many cores.
From page 2 of TFA:
(emphasis mine)
So we are supposed to believe that the database test on Windows 7 runs 571 percent faster on a quad-core compared to a dual core?
That would be a factor of 6.71, or in terms of performance per core, a factor of 3.355. In other words, the quad core would do 3.355 times more work per core than the dual core. That does not sound very believable, considering similar tests the German c't magazine has done in the past (for Linux and Windows 2000). In those tests, both OSes scaled at best linearly with the number of CPUs, so the "performance boost" from going from dual to quad core was at best 100% (in most tests more like 80%).
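The arithmetic behind those two factors can be checked with a small sketch. The function names are made up for illustration, and it assumes "N percent faster" means throughput multiplied by (1 + N/100).

```python
def speedup_factor(percent_faster):
    """Convert 'N percent faster' into a throughput multiplier."""
    return 1.0 + percent_faster / 100.0


def per_core_ratio(percent_faster, cores_before, cores_after):
    """Work done per core after the upgrade, relative to per core before it."""
    core_growth = cores_after / cores_before
    return speedup_factor(percent_faster) / core_growth
```

Under that reading, `speedup_factor(571)` is about 6.71 and `per_core_ratio(571, 2, 4)` is about 3.355, while perfectly linear scaling from dual to quad core would correspond to `speedup_factor(100)`, i.e. a factor of 2.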
Maybe I'm misunderstanding what Randall C. Kennedy wanted to say. Here it would have helped if he posted his raw data and test configuration, as most reputable testers do. But as he only posted a few end results, I can only say that his numbers seem bogus. I rated the Infoworld article with 1 of 5 points.
C - the footgun of programming languages
As an HPC developer, I see a few areas where XP falls down. With the release of the new Core i7 line from Intel, the end of the FSB is in sight. Both Intel and AMD now use a ccNUMA memory architecture, which has tremendous implications for software design. In short, if your software isn't aware of the system's memory topology, you're going to end up with most of your memory traffic going over the processor interconnects, and that's a substantial performance hit compared with going directly to memory (2-4 times slower).
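To put a rough number on that hit, here is a minimal sketch of a weighted-average cost model. It is an illustration only: the function name is made up, and it assumes every remote access costs a fixed multiple of a local one (the 2-4x figure above), ignoring caches and contention.

```python
def relative_memory_cost(remote_fraction, remote_penalty):
    """Average memory access cost relative to an all-local workload.

    remote_fraction: share of accesses that cross the processor interconnect.
    remote_penalty: assumed slowdown of a remote access versus a local one.
    """
    if not 0.0 <= remote_fraction <= 1.0:
        raise ValueError("remote_fraction must be between 0 and 1")
    local_part = 1.0 - remote_fraction
    remote_part = remote_fraction * remote_penalty
    return local_part + remote_part
```

With half the traffic remote and an assumed 2x penalty the average access costs about 1.5 times the local case; with everything remote at 4x it costs 4 times as much, which is why topology-aware allocation matters.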
XP's NUMA support is very weak. Sometimes the easiest solution is to write your own allocator (and preallocate huge chunks of ram).
And before somebody comes along and says 'no real HPC is done in Windows!' there are a lot of old, crusty engineering software packages that everybody is scared of porting.
Nobody is going to argue that if you run one single application (a database something), a "small" OS will work better. There are Linux versions that are specifically geared towards doing that sort of thing, right? Ubuntu is probably slower at something than [insert other dist].
The real question is, though... what about normal usage? Unfortunately, that's hard to measure... but how does Vista/Windows 7 affect normal user productivity and speed as opposed to simple benchmarks designed to test out efficiency at doing ONE thing?
If Vista and Windows 7 were designed to have a lot of background processes to help the user do this or that, then why not test that, too? XP wasn't designed that way, apparently, while Windows Vista/7 are more designed that way. So give it a level playing field and test what it was designed to do.
I don't have an answer of whether or not Vista/Win7 are slower or faster when doing other things (like, say, searching for a file because you can't remember where you put it, running multiple applications, using something DRM enabled, or whatever), but it'd be interesting to try to test it rather than a generic "XP runs a single application faster than Vista because Vista has more stuff running in the background." It'd be interesting to try to physically load the system with lots of applications and see which is better then.
Yes, it's very easy to test: get an AVI file, let's say 100 MB, play it on XP and measure CPU utilization, then play the same file on Vista and compare the results!!! And now who is the whore? The test has to be performed on identical hardware ;)
We've got all this dual core, quad core technology and at the moment almost no software companies are actually coding their programs to use it! It's ridiculous, and Windows is absolutely right to push ahead and optimize for multiple cores, because as the base of most machines it is the most important part of the system that needs upgrading. We can only hope this actually pushes some other software companies to start taking advantage of all this untapped power. If Windows weren't optimizing for quad core and up, they would be making themselves seriously vulnerable and would be actively slowing the adoption of multi-core software systems.
On rereading, I found a link ("How I tested") that gives at least an overview of the configuration. For the hardware:
So on one hand, I have to apologize for dissing Mr. Kennedy on lack of transparency.
On the other hand, he obviously used two different systems with different amounts of RAM, which can introduce new errors. For instance, let's assume the working set, as defined on Wikipedia (http://en.wikipedia.org/wiki/Working_set), has a size of 6 GByte. Then the Dell OptiPlex 745 with only 4GB RAM will have to keep reloading from disk, while the HP EliteBook may be able to run entirely from cache in the second and any further pass. I consider that a bad error in methodology.
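The effect is easy to reproduce in a toy model. The sketch below is an illustration under loud assumptions: memory is managed as a strict LRU cache (real Windows paging is more complicated), the workload scans its working set in order twice, sizes are in whole GB-sized units, and the 8-unit machine is a hypothetical stand-in because the second system's RAM size is not quoted here.

```python
from collections import OrderedDict


def second_pass_hit_rate(working_set_units, ram_units):
    """Scan the working set twice in order through an LRU cache.

    Returns the fraction of second-pass accesses served from memory.
    """
    cache = OrderedDict()  # keys are block ids, ordered oldest to newest
    hits = 0
    for pass_number in range(2):
        for block in range(working_set_units):
            if block in cache:
                cache.move_to_end(block)  # mark as most recently used
                if pass_number == 1:
                    hits += 1
            else:
                cache[block] = True
                if len(cache) > ram_units:
                    cache.popitem(last=False)  # evict least recently used
    return hits / working_set_units
```

In this model a 6-unit working set on a 4-unit machine gets no second-pass hits at all, because each block is evicted just before it is needed again, while the same working set on the hypothetical 8-unit machine is served entirely from memory.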
C - the footgun of programming languages
OK, that's good, XP is faster than Vista or Win7 on dual cores... why do I care? Hm, looking at my screen right now, I'm running more than one app and none of them are dual-core aware. Only the OS is. Does the average Joe know how to set a program to the second core, or do they just leave it to the OS to figure out? So here is what I want to know: Windows is able to see the second core, so does it take the less-used items and send them over to it while they sleep? Things like the services it loads, or do they just stay on the first core? If Windows could move sleeping apps over to the second core by itself, it might become faster for the standard apps that don't know about the second core.
Sure. Can we get an honest comparison here, instead of this incestuous old bear vs. dead turkey fest? Find a couple of boxes, then let some micros~1 guys tune images for each, some linux guys tune images for each, some solaris guys tune images for each, throw in a couple BSDs (FreeBSD, DragonflyBSD, and heck, see how well NetBSD is doing too), maybe add minix and qnx for kicks, and have an independent panel run each image through a preset series of tests with automated data gathering. Systemic bias, people. It's not hard to eliminate it, just don't pull punches.
Since when did you run your high throughput transaction processing system on a desktop OS? These numbers are basically meaningless in the sense that they don't reflect anything anyone would actually do with the software that was tested.
Now, a comparison between 2000 Server, 2003, and 2008 would have been more useful.
As you say, I suppose this whole thing demonstrates that MS can make progress optimizing an SMP kernel. Sure would be pretty surprising if they couldn't!
Maybe someone will make some comparisons vs some Linux kernel builds and some OpenSolaris builds. That would be just as interesting, since it is a bit less clear whether or not those teams have or are able to make equivalent or better optimizations.
"Malo periculosam, libertatem quam quietam servitutem." -- Jefferson
How does the system 'know' when to start running the DRM? There must be something running at all times "just in case" the paying customer decides to exercise their right to play their own stuff.
Whether it is a service, thread or whatever, it doesn't matter. Some system resources have to be used in advance. That can only drop the performance.
I'll see your Constitution and raise you a Queen.
I have a Core2 Duo machine running a basic configuration of XP. It's quite fast. The only time I've seen it get bogged down is when it has to pull calendar information from Exchange, and that's not the workstation's fault.
That having been said, I'm perfectly happy with my dual P2 running NT4 you insensitive clods!
CmdrTaco should not have fallen for an article written by PR shills.
Unix kernel coders have written NUMA-aware stuff for years. When MS is late to the party, why not just say so? And if they're always late, why bother with them?
I want to delete my account but Slashdot doesn't allow it.
They've got a great story, you betcha! And it's only going to get better! Rah rah! Sis boom bah!
It's disgusting. I wanna puke.
At my work, we maintain several Windows clusters for financial derivatives valuation.
We can't really move all of them to Linux (no matter how much we would like to do it), because some of the calculations have been implemented using MS only technologies, like ActiveX (yes, you read that well) and .NET (the last time we checked it the Mono runtime was ~5-6 times slower than MS implementation of .NET for our code).
When we needed to upgrade the Windows clusters recently, we had to move from 2-cpu-1-core to 2-cpu-4-core machines, since it was what was being sold. What we've observed is that Windows (Server 2003) is unable to fairly share the CPU time when there are more active threads than available cores. We get a lot of variance on the overall calculation time when the clusters are very loaded because of this.
The same tests done on SLES-9 (yes, 9) based Linux clusters with similar hardware did not suffer from this problem. CPU time was divided equally among all threads. And we are using a 4-year old kernel, that doesn't even sport the newest completely-fair-scheduler.
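One way to put a number on "fairly share the CPU time" is Jain's fairness index, a standard metric that is not something the tests above used. A minimal sketch, assuming you have already collected per-thread CPU seconds over the same wall-clock window:

```python
def jain_fairness(cpu_seconds):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2).

    Returns 1.0 when every thread got the same CPU time and 1/n when a
    single thread got all of it.
    """
    count = len(cpu_seconds)
    total = sum(cpu_seconds)
    squares = sum(x * x for x in cpu_seconds)
    if count == 0 or squares == 0:
        raise ValueError("need at least one thread with non-zero CPU time")
    return (total * total) / (count * squares)
```

Equal shares such as `[5, 5, 5, 5]` score 1.0, while `[10, 0, 0, 0]` scores 0.25, so tracking this value on a loaded cluster would show how far the scheduler drifts from an even split.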
Multicore optimization will just help it suck much quicker...
Come on, why does anyone take this article seriously? He tested an OS that is in beta. Microsoft specifically stated that the beta was not performance tested and that this won't be done until later, around the RC stage. This article should not be taken seriously at all. I thought people on Slashdot would be more tech savvy and know that you can't do any sort of performance testing on a beta, especially when Microsoft themselves stated they haven't done any performance fixing on the beta.
What I really loved was when my sound card drivers didn't have the latest DRM crap in them for Vista. I could play PCM WAV files fine, but when I tried to play a MP3 file in WMP it would say "Codec Not Installed".
It took me 2 hours, at the time, to find out that nothing was wrong with my installation, just that MP3 files go through the DRM layer and they wouldn't play because my sound drivers did not support the latest version of Vista's DRM yet.
Microsoft, Apple, Google, Amazon what's the difference? All steal money from devs and control with walled gardens.
They tested two desktop kernels against XP x64 and its Server 2003 kernel...doh!
Complete and utter bullshit.
We're a VFX company. We work with all manner of multi-core applications. Cloth simulation, Global Illumination, Caustics, Optical Flow tracking, compositing etc etc.
Every single one of our computers is 64-bit. We have Windows XP x64 and we have Vista x64.
I'm looking at a chart right this very second of render times for our current job. 9 million polygons, 6GB of RAM usage, 100% CPU usage across all 4 cores. NO RENDER PERFORMANCE HIT. Render software scales better than just about anything else on earth. Each core renders its little slice of the scene and returns it to the application. There is no cross talk; it scales pretty much linearly with very, very little overhead. If anything is going to expose some sort of massive performance hit, it would be rendering.
If Vista x64 is running a DRM check on every ray casting function we would see it. If Vista was running 118% slower we would see it. We have identical machines running the identical piece of software and they're returning on average statistically identical results.
I've got millions of photons bouncing around a scene and supposedly each calculation is being 'taxed' by some DRM check? I don't see it.
They're all generating pixels, what could be more "DRM related" than reading footage, processing footage and creating new footage?
Maybe this test found some piece of software that doesn't run well on Vista. I can buy that argument. But Vista and Windows 7 are not substantially slower than XP at processing. In fact they seem to be no slower from my experience with a wide variety of extremely processor and memory intensive tasks.
I am getting increasingly annoyed, and bored, by the fact that every time we have an M$ story here, especially with the Windows 7 run-up, we see the same band of M$ astroturfers generating noise.
Character assassination, eg of Gutmann and Schneier and opinion/speculation is no substitute for clear numerate analysis and benchmarking.
Let me offer two opinions based on nearly 40 years of operating systems experience, starting with the EDSAC-II:
(a) Anyone who claims that DRM is only active for part of the time has never seen or understood the inside of a scheduler or device driver; at the very least the DRM system must protect itself.
(b) The memory bandwidth considerations for multi-core are much more complicated, especially for HPC, than just NUMA, which itself is an old kludge mandated by earlier system designs. In a very real sense the "NU" is now always present, since caches are ubiquitous and mandatory and almost always layered and hardware managed. There is no particular benefit or reason to make the backing DRAM non-uniform, quite the opposite!
Finally, I remark that there are two very different situations depending on chip architecture (e.g. Intel v AMD) and on the workload. With Intel and the FSB, a limited bandwidth to the NorthBridge is shared out between all the cores, which is why Intel is rapidly migrating away from the FSB architecture. By contrast, an 8-core AMD system with two quad cores has almost twice the memory bandwidth of the single quad system (less the losses from scheduling latency and interference in the HyperTransport links).
If your workload is essentially infinitely parallelisable then you can make it scale with increasing cores, at slightly less than linear (e.g. 0.8), _and_ by keeping idle time on all cores your system will feel snappy and responsive, e.g. for gaming and transaction processing.
If, for some logical or physical reason, your problem has an inherent limit on its parallelism, then you have a much harder problem. Examples are the partial differential equations of mathematical physics (Navier-Stokes, heat, elasticity) and the directed stochastic processes used in asset pricing and (sensible) risk management. There things are much more complex and mostly depend on parallel algorithms and minimising latency. For much more detail see the Berkeley paper http://view.eecs.berkeley.edu/wiki/Main_Page [The Landscape of Parallel Computing Research: A View From Berkeley].
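The simplest textbook way to see what an inherent limit on parallelism does is Amdahl's law. It is not mentioned above and is far cruder than the PDE and stochastic cases, but it shows the ceiling. A minimal sketch, assuming a fixed fraction of the work parallelises perfectly and the rest is strictly serial:

```python
def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's law: speedup = 1 / ((1 - p) + p / n)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_part = 1.0 - parallel_fraction
    return 1.0 / (serial_part + parallel_fraction / cores)
```

A fully parallel job on 4 cores reaches a speedup of 4, but a job that is 90 percent parallel never reaches 10 no matter how many cores are added, which is why latency and algorithm choice dominate in those harder problems.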
To cut to the chase: M$ made a marketing choice to plumb DRM into the resource schedulers and IO system. Even if it is hook based, as it should be, this must impact parallelism and introduce locking latency costs which must be paid all the time and which, if done less than perfectly, will reduce scalability, which will harm responsiveness.
So you pay so that M$ can cosy up to Hollywood, RIAA and MPAA, all of whom have problems with their business model. No amount of astroturf can put lipstick on this pig!