Slashdot Mirror


Windows 7 On Multicore — How Much Faster?

snydeq writes "InfoWorld's Andrew Binstock tests whether Windows 7's threading advances fulfill the promise of improved performance and energy reduction. He runs Windows XP Professional, Vista Ultimate, and Windows 7 Ultimate against Viewperf and Cinebench benchmarks using a Dell Precision T3500 workstation, the price-performance winner of an earlier roundup of Nehalem-based workstations. 'What might be surprising is that Windows 7's multithreading changes did not deliver more of a performance punch,' Binstock writes of the benchmarks, adding that the principal changes to Windows 7 multithreading consist of increased processor affinity, 'a wholly new mechanism that gets rid of the global locking concept and pushes the management of lock access down to the locked resources,' permitting Windows 7 to scale up to 256 processors without performance penalty, but delivering little performance gain for systems with only a few processors. 'Windows 7 performs several tricks to keep threads running on the same execution pipelines so that the underlying Nehalem processor can turn off transistors on lesser-used or inactive pipelines,' Binstock writes. 'The primary benefit of this feature is reduced energy consumption,' with Windows 7 requiring 17 percent less power to run than Windows XP or Vista."

10 of 349 comments

  1. Ouch by TheRaven64 · · Score: 4, Informative

    I should know better than to click on InfoWorld links, but I think I just lost about 10 IQ points as a result of reading that article.

    In summary, Windows 7 now tries to keep threads on the same processor. It has been known for about 15 years that this gives better cache, and therefore overall, performance. Any scheduling algorithm developed in the last decade or so includes a process migration penalty, so you default to keeping a thread on a given processor and only move it when that processor is overly busy, another one is not, and the difference is greater than the migration penalty (which is different for moving between contexts in a core, between cores, and between physical processors, due to different cache layout). This also helps reduce the need for locking in the scheduler. Each CPU has its own local run queue, and you only need synchronization during process migration.
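
    The migration rule described above can be sketched roughly as follows. This is only an illustration, not how any particular kernel does it: the penalty numbers, the `Cpu` class, and the `pick_cpu` helper are all made up for the example.

```python
from dataclasses import dataclass

# Hypothetical penalties in "load units"; a real scheduler would tune these
# per cache topology. The ordering (context < core < package) is the point.
MIGRATION_PENALTY = {
    "same_core": 1,      # between hardware contexts in one core
    "same_package": 4,   # between cores on one physical processor
    "cross_package": 6,  # between physical processors
}


@dataclass
class Cpu:
    cpu_id: int
    load: int  # number of runnable threads in this CPU's local run queue


def pick_cpu(current, others):
    """Return the CPU a thread should run on next.

    `others` is a list of (Cpu, distance) pairs. The thread stays on
    `current` unless the load difference exceeds the migration penalty
    for that distance.
    """
    best, best_gain = current, 0
    for cpu, distance in others:
        gain = current.load - cpu.load - MIGRATION_PENALTY[distance]
        if gain > best_gain:
            best, best_gain = cpu, gain
    return best


busy = Cpu(0, 8)
nearby = Cpu(1, 5)   # only slightly less busy: not worth moving
far_idle = Cpu(2, 0) # idle, so even a cross-package move pays off
chosen = pick_cpu(busy, [(nearby, "same_package"), (far_idle, "cross_package")])
```

    With these made-up numbers the thread moves to the idle CPU on the other package, but it would stay put if only the nearby, slightly less busy core were available.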

    If Vista, or even Windows Server 2003, didn't already do this, then I would be very surprised. FreeBSD and Linux have both done so for several years, and Solaris has for even longer. Fine-grained in-kernel locking is not new either; almost every other kernel that I know of that supports SMP has been implementing this for a long time. One of the big pushes for FreeBSD 5 (released almost a decade ago) was to support fine-grained locking, where individual resources had their own locks, and FreeBSD was a little bit behind Linux and a long way behind Solaris in implementing this support.
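
    For anyone unfamiliar with the idea, fine-grained locking looks something like the sketch below: each resource owns its lock, so threads working on different resources never contend with each other. The `Resource` class and `run_demo` helper are invented for illustration, and because of Python's global interpreter lock this shows the structure only, not the performance benefit.

```python
import threading


class Resource:
    """A resource that carries its own lock instead of sharing a global one."""

    def __init__(self, name):
        self.name = name
        self.lock = threading.Lock()
        self.value = 0

    def increment(self):
        # Only threads touching *this* resource wait on this lock.
        with self.lock:
            self.value += 1


def hammer(resource, times):
    for _ in range(times):
        resource.increment()


def run_demo(threads_per_resource=4, times=1000):
    """Update two independent resources from several threads each."""
    resources = [Resource("a"), Resource("b")]
    threads = [
        threading.Thread(target=hammer, args=(r, times))
        for r in resources
        for _ in range(threads_per_resource)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [r.value for r in resources]


totals = run_demo()
```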

    --
    I am TheRaven on Soylent News
  2. Re:Power savings by VGPowerlord · · Score: 4, Informative

    From what I've seen, unless you're on a Core i7, you're not getting the power savings.

    The 17% power savings mentioned on page 3 of the article is primarily for the Intel Xeon 3500 and 5500 lines (the Nehalem processors), which shut off power to cores that aren't being actively used. The other linked articles go into this in more depth.

    --
    GLaDOS for President 2016! "Well here we are again. It's always such a pleasure." -- GLaDOS, 2011
  3. Re:I disagree with *you* by TheRaven64 · · Score: 5, Informative

    Lots of things affect performance. One of the big things is cache usage. An L1 cache miss costs around 10 cycles these days. An L2 cache miss costs 200 or more. If you move a process (or thread) between cores on the same die with a shared L2 cache, then every load or store instruction for a little while will cause an L1 cache miss. If you move them between processors with no shared cache, then every access will cause an L2 cache miss. If, every time you schedule a thread, it is on a different processor then, given that a typical scheduling quantum is only 10ms, your thread will spend most of its time loading data from main memory to cache. This will show up as 100% CPU usage, but will only be getting something like 10% of the maximum theoretical throughput for that CPU. Improve processor affinity, and you can easily see a large speedup relative to this.
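
    A back-of-envelope version of that argument, as a sketch. The 10ms quantum and 200-cycle miss cost are the figures above; the 3 GHz clock, 8 MiB working set, and 64-byte cache lines are assumptions picked for illustration, and the model pessimistically assumes misses never overlap and nothing is prefetched.

```python
def refill_fraction(clock_hz, quantum_s, cache_bytes, line_bytes, miss_cycles):
    """Fraction of one scheduling quantum spent refilling a cold cache.

    Assumes every cache line in the working set is fetched exactly once
    and that the misses are fully serialized (no overlap, no prefetch).
    """
    cycles_per_quantum = clock_hz * quantum_s
    lines = cache_bytes // line_bytes
    refill_cycles = lines * miss_cycles
    return min(1.0, refill_cycles / cycles_per_quantum)


fraction = refill_fraction(
    clock_hz=3_000_000_000,       # assumed 3 GHz clock
    quantum_s=0.010,              # 10 ms quantum, from the comment
    cache_bytes=8 * 1024 * 1024,  # assumed 8 MiB working set
    line_bytes=64,                # assumed cache line size
    miss_cycles=200,              # L2 miss cost, from the comment
)
print(f"{fraction:.0%} of the quantum goes to refilling the cache")
```

    Under those assumptions most of the quantum is gone before the thread does any useful work with a warm cache, which is the point of keeping it on the same processor.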

    --
    I am TheRaven on Soylent News
  4. Re:Something is wrong with Win7 power management by L4t3r4lu5 · · Score: 4, Informative

    Latest ABIT BIOS resolves a lot of issues with the temperature sensor on IP35 boards. Check the ABIT forums.

    And work on your Google-fu.

    --
    Finally had enough. Come see us over at https://soylentnews.org/
  5. Re:Not Really by RicktheBrick · · Score: 4, Informative

    I do volunteer work for World Community Grid. I used to run 7 computers. I now run 4 quad-core computers. A quad will beat 4 computers in work done and will use less electricity than 4 computers running at comparable speeds. My electricity bill is lower with the 4 quads than it was with the 7 computers, and my daily contribution has more than doubled.

  6. Re:Not Really by Jah-Wren+Ryel · · Score: 5, Informative

    not surprising because the OS really can't do that much to improve (or mess up) the performance of user-mode code that isn't making many OS calls anyways.

    Others have already mentioned scheduling and cache thrashing; I'd like to add memory management. There are lots of ways memory management choices can degrade performance, sometimes drastically.

    One example is page sizes and the TLB - each CPU has a hardware TLB, which is like a cache of virtual-page-to-physical-page address mappings. Hardware TLB look-ups are fast, but the TLB is only of limited size, and when a virtual address is not in the hardware TLB, the OS has to take a fault and walk its own software-maintained TLB that holds the complete list of virt2phys translations. That's a couple of orders of magnitude slower than getting it from the hardware TLB.

    One way to reduce TLB misses is to use larger pages. So an OS that is smart enough to automagically coalesce 4K pages to 4MB (or larger, depending on the hardware) pages can significantly improve TLB performance. In a pathological case, that could result in a 100x-1000x speed-up; in typical cases where it is going to make a difference, you'll probably see a ~10% performance improvement.
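
    To put numbers on why larger pages help: the amount of memory a TLB can cover without a miss is just entries times page size. A quick sketch, where the 64-entry TLB is a hypothetical size chosen for illustration (real TLB sizes vary by CPU and by page size):

```python
KIB = 1024
MIB = 1024 * KIB


def tlb_reach(entries, page_bytes):
    """Bytes of address space the TLB can map without taking a miss."""
    return entries * page_bytes


ENTRIES = 64  # hypothetical TLB size, not taken from any specific CPU

small_reach = tlb_reach(ENTRIES, 4 * KIB)  # 4K pages
large_reach = tlb_reach(ENTRIES, 4 * MIB)  # 4MB pages

print(f"4K pages:  {small_reach // KIB} KiB covered")
print(f"4MB pages: {large_reach // MIB} MiB covered")
```

    The same number of entries covers 1024 times as much memory with 4MB pages, so a working set that thrashes the TLB with 4K pages may fit entirely with large ones.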

    Another related example is how shared memory is handled. Every page of virtual memory has a PTE which, at the most basic level, contains the virt2phys translation. When shared memory is used, a decision must be made: are the PTEs shared, or does each process get a separate copy of the PTEs for the shared memory? The downside of sharing PTEs is that the shared memory must be mapped at exactly the same virtual address in each process that uses it, so if one of those processes already has something else at that address, it won't be able to use the shared memory. The downside of using separate copies of PTEs is that you can really suck up a lot of memory for just the PTE list -- imagine 50 processes that all share one chunk of 100MB of memory; if they all get their own PTE copies for that 100MB, it's the equivalent of 5GB worth of PTEs. If a PTE itself takes up 32 bytes and pages are 4K, then that's at least 40MB of PTE entries just to manage that 100MB of memory. A 40% overhead is huge, and then there is the issue of hardware TLB misses which, depending on the implementation, may have to search all PTEs in the system, so the more PTEs, the worse a TLB miss will hurt performance.
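
    The arithmetic behind that 40MB figure, as a small sketch. The 50 processes, 100MB region, and 32-byte PTE are the numbers from the example above; the 4K page size is assumed (it is the small page size mentioned earlier), and `pte_bytes` is just a helper written for this illustration.

```python
KIB = 1024
MIB = 1024 * KIB


def pte_bytes(processes, shared_bytes, page_bytes, pte_size, shared_ptes):
    """Memory consumed by PTEs that map one shared region."""
    pages = shared_bytes // page_bytes
    copies = 1 if shared_ptes else processes  # one set, or one set per process
    return pages * copies * pte_size


private = pte_bytes(50, 100 * MIB, 4 * KIB, 32, shared_ptes=False)
shared = pte_bytes(50, 100 * MIB, 4 * KIB, 32, shared_ptes=True)

print(f"per-process PTEs: {private / MIB:.1f} MiB")
print(f"shared PTEs:      {shared / MIB:.2f} MiB")
```

    Per-process copies come to roughly 40MB, while a single shared set stays under 1MB, which is the trade-off against the same-virtual-address restriction.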

    --
    When information is power, privacy is freedom.
  7. Comment removed by account_deleted · · Score: 4, Informative

    Comment removed based on user account deletion

  8. Re:Windows 7 is better than Linux by petermgreen · · Score: 4, Informative

    XP X64 sucks
    Do you have a source for that claim? I've only run it briefly, but the only real issue I found was driver availability.

    and it is most assuredly not the same thing on any level as Win2K3 server.
    The server specific functionality has of course been stripped out and the crippling adjusted but afaict the version of the major components is the same and it even uses the same hotfixes and service packs as the x64 version of server 2003.

    --
    note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
  9. Re:Did we really expect different? by Gadget_Guy · · Score: 4, Informative

    Windows 7 is NOT as or Faster than XP. PERIOD, stop the lies already.

    Did you even read the article? There's a simple performance table on the first page, followed by the analysis:

    These results suggest that when considering Windows 7, performance should be viewed as a reasonable justification for upgrading from Windows XP, but not a driver for migration from Vista.

  10. Re:Not Really by lukas84 · · Score: 4, Informative

    There's a multitude of possible migration paths to take to go from Windows XP to Windows 7 - in fact, there is only one migration path that's exclusive to Windows Vista: the in-place upgrade.

    The in-place upgrade is a horribly bad idea, and you should never try or consider it.

    So, for any reasonable person, the migration paths available from XP to 7 are exactly the same as from Vista to 7. A new, clean install, followed by a migration of application settings.

    If you're a home user, use Windows Easy Transfer to save all your settings to an external drive, reinstall, then recover your settings from the external drive. Reinstall all apps.

    If you're a business user, there's a wealth of options available to you - check out the documentation for MDT2010, which can provide you with all the tools you need to roll out Windows 7 in your company. USMT and Windows Easy Transfer are the same under the hood - so user settings can be migrated.

    A good place to start is the Windows 7 springboard:
    http://technet.microsoft.com/en-us/windows/default.aspx