Slashdot Mirror


24 Cores and the Mouse Won't Move: Engineer Diagnoses Windows 10 Bug (wordpress.com)

Longtime Slashdot reader ewhac writes: Bruce Dawson recently posted a deep-dive into an annoyance that Windows 10 was inflicting on him -- namely, every time he built Chrome, his extremely beefy 24-core (48-thread) rig would begin stuttering, with the mouse frequently becoming stuck for a little over one second. This would be unsurprising if all cores were pegged at 100%, but overall CPU usage was barely hitting 50%. So he started digging out the debugging tools and doing performance traces on Windows itself. He eventually discovered that the function NtGdiCloseProcess(), responsible for Windows process exit and teardown, appears to serialize through a single lock, each pass through taking about 200 microseconds each. So if you have a job that creates and destroys a lot of processes very quickly (like building a large application such as Chrome), you're going to get hit in the face with this. Moreover, the problem gets worse the more cores you have. The issue apparently doesn't exist in Windows 7. Microsoft has been informed of the issue and they are allegedly investigating.

7 of 352 comments (clear)

  1. The lock cycles were avg 200 us each by Anonymous Coward · · Score: 5, Informative

    Not 200S each, which is off by a factor of one million. But, hey.

  2. Windows has always been unresponsive to user input by fustakrakich · · Score: 5, Informative

    We just don't have priority...

    --
    “He’s not deformed, he’s just drunk!”
  3. I don't get it. by fuzzyfuzzyfungus · · Score: 5, Interesting

    If there is an issue that keeps process termination and cleanup from being properly parallelized; I can understand why that might cause unexpectedly poor utilization of additional cores for computationally intensive tasks that also massacre lots of processes; but why would that cause the GUI to stop responding?

    Unless moving the cursor also depends on terminating a bunch of processes; and hangs until that task is finished, wouldn't the inefficiency imposed on the build process be expected to keep the GUI more responsive; by preventing it from occupying as much CPU time as it otherwise would?

    Am I just confused? Does keeping the desktop and cursor drawn actually involve lots of time sensitive process killing? Does this indeed not make sense?

    1. Re:I don't get it. by Rutulian · · Score: 5, Interesting

      It's easy to criticize from the outside, but the Linux kernel has historically had kernel locks that created similar problems, such as the "big kernel lock", removed ca 2011 (ie: not ancient history).
      https://kernelnewbies.org/BigK...

      As noted in the article, this particular locking problem appeared in Windows 10 and wasn't present in Windows 7, so the balancing acts between the fine-grained locking mechanisms, thread performance, and backwards-compatibility are clearly challenging to maintain. Not excusing; just observing. Windows has never been known for it's ability to support massive numbers of parallel threads, so it is not surprising that previously overlooked problems can appear or become exacerbated in these situations. Many people, even here on Slashdot, laud Microsoft for the generally excellent backward- compatibility in Windows, and criticize the Linux kernel for being generally horrible at it. But here you go, a pretty nice example to illustrate that backwards-compatibility has a cost.

    2. Re:I don't get it. by mvdwege · · Score: 5, Informative

      Yes, you are making exuses, that's exactly what a tu quoque fallacy is

      The big lock was removed in 2011, Microsoft produced a regression on an already bad design a lot closer in history. That's a sign of incompetence, period.

      Also, the Linux kernel has bad backwards compatibility, which is why things like drivers and such should be upstreamed as much as possible and built in the main tree, but Linux userland still happily runs old Unix software, so you are overstating that case as well.

      --
      "I know I will be modded down for this": where's the option '-1, Asking for it'?
  4. Re:Windows... by presidenteloco · · Score: 5, Insightful

    More specifically, why are OSes not designed, and computing hardware not designed, so that the GUI cannot be slowed down by other slow processes, process switching, or I/O / virtual memory thrashing.

    The most brain-dead design-avoidable situation in the computing universe is where my computer is thrashing due to some resource over-use, and the UI is inoperable so I can't fix the problem e.g. by killing processes/programs. DOH!

    The UI and user input devices should be a completely separate set of processes and memory than the rest of application processing. It should operate as a service, through data pipelines, to the rest of the applications. It should be completely separate, in terms of resource management. Or failing that, certain aspects of GUI, such as program kill controls, should be highly prioritized over pretty much everything else.

    Again, slow and over-used everything else should not slow the UI and user input processes. This is basic.

    --

    Where are we going and why are we in a handbasket?
  5. Only 24 Cores? by Anonymous Coward · · Score: 5, Funny

    2 Core for DRM
    2 Core for DRM Protection
    2 Core for Telemetry
    2 Core for Telemetry Protection
    2 Core for Genuine Advantage
    2 Core for Genuine Advantage Protection
    2 Cores for Driver Signing Validation
    2 Cores for Driver Signing Validation Protection
    2 Cores for Cortana
    2 Cores for Cortana Telemetry
    2 Cores for Cortana Telemetry Protection
    1 Core for the Base OS
    1 Core, at 25% for user processes