24 Cores and the Mouse Won't Move: Engineer Diagnoses Windows 10 Bug (wordpress.com)
Longtime Slashdot reader ewhac writes: Bruce Dawson recently posted a deep-dive into an annoyance that Windows 10 was inflicting on him -- namely, every time he built Chrome, his extremely beefy 24-core (48-thread) rig would begin stuttering, with the mouse frequently becoming stuck for a little over one second. This would be unsurprising if all cores were pegged at 100%, but overall CPU usage was barely hitting 50%. So he started digging out the debugging tools and doing performance traces on Windows itself. He eventually discovered that the function NtGdiCloseProcess(), responsible for Windows process exit and teardown, appears to serialize through a single lock, each pass through taking about 200 microseconds each. So if you have a job that creates and destroys a lot of processes very quickly (like building a large application such as Chrome), you're going to get hit in the face with this. Moreover, the problem gets worse the more cores you have. The issue apparently doesn't exist in Windows 7. Microsoft has been informed of the issue and they are allegedly investigating.
Not 200S each, which is off by a factor of one million. But, hey.
We just don't have priority...
“He’s not deformed, he’s just drunk!”
If there is an issue that keeps process termination and cleanup from being properly parallelized; I can understand why that might cause unexpectedly poor utilization of additional cores for computationally intensive tasks that also massacre lots of processes; but why would that cause the GUI to stop responding?
Unless moving the cursor also depends on terminating a bunch of processes; and hangs until that task is finished, wouldn't the inefficiency imposed on the build process be expected to keep the GUI more responsive; by preventing it from occupying as much CPU time as it otherwise would?
Am I just confused? Does keeping the desktop and cursor drawn actually involve lots of time sensitive process killing? Does this indeed not make sense?
More specifically, why are OSes not designed, and computing hardware not designed, so that the GUI cannot be slowed down by other slow processes, process switching, or I/O / virtual memory thrashing.
The most brain-dead design-avoidable situation in the computing universe is where my computer is thrashing due to some resource over-use, and the UI is inoperable so I can't fix the problem e.g. by killing processes/programs. DOH!
The UI and user input devices should be a completely separate set of processes and memory than the rest of application processing. It should operate as a service, through data pipelines, to the rest of the applications. It should be completely separate, in terms of resource management. Or failing that, certain aspects of GUI, such as program kill controls, should be highly prioritized over pretty much everything else.
Again, slow and over-used everything else should not slow the UI and user input processes. This is basic.
Where are we going and why are we in a handbasket?
being the only OS I've ever seen in my life move a window around screen w/o tearing. Yeah, it doesn't make much difference, but you'd think in 2017 my quad core CPU and 8 gigs of ram could do what a 400 mghz AMD K6 did in 1996 with 512 mb ram.
Hi! I make Firefox Plug-ins. Check 'em out @ https://addons.mozilla.org/en-US/firefox/addon/youtube-mp3-podcaster/
That could be related to the hardware acceleration. If I had to guess, the Windows desktop would need to wait for the game to release it's GPU resources and load its own in to the GPU memory.
Back in the XP days, going from game to desktop was very quick, but going from desktop back to game was very slow. When Vista came along, the Os started using the GPU to accelerate the desktop. Made it slow both ways
I'm glad you've volunteers to help with their concurrency programming. Good luck, it's not easy.
Usually at some point, access to shared resources needs to be controlled. There are easy ways to do it and there are hard ways. Easy isn't fast, but it's predictable and less error-prone.
2 Core for DRM
2 Core for DRM Protection
2 Core for Telemetry
2 Core for Telemetry Protection
2 Core for Genuine Advantage
2 Core for Genuine Advantage Protection
2 Cores for Driver Signing Validation
2 Cores for Driver Signing Validation Protection
2 Cores for Cortana
2 Cores for Cortana Telemetry
2 Cores for Cortana Telemetry Protection
1 Core for the Base OS
1 Core, at 25% for user processes
There's also the matter that until somewhat recently, most lower-end GPUs were designed to accelerate lower resolution and/or shallower bit depths than the max the card could use for Windows. For example, the card might have allowed up to 2560x1600 @ 24/32bpp, but only supported hardware 3D acceleration up to 1280x800@15/16bpp. Even when resolution finally caught up, bit depth w/acceleration was stuck at 16bpp until well into the Windows 7 era. This is why so many computers with semi-ok gaming specs still couldn't do Aero Glass transparency when Windows 7 came out... they couldn't hardware-accelerate 32-bit color.
The problem still semi-persists among many phones & tablets. If an Android device seems to get blurry for a moment during transitions, it's not your imagination... Android is dropping to lower-res/fewer colors to accelerate the transition, then going back to high-res/color dumb framebuffer mode when it's done (and text suddenly becomes sharp & clear a moment later)
Again, slow and over-used everything else should not slow the UI and user input processes. This is basic.
The oversimplified, but short answer is that there is no such thing as a multiprocess CPU. All CPUs can execute on only a single thread per cycle. The kernel exists to allow multiple processes to be resident and to provide the illusion of multiple thread execution. In other words, the essential function of the kernel is scheduling, and in doing this the kernel has to make decisions about process priority that impact responsiveness and resource utilization in often diametrically opposite fashions. To gain responsiveness, a process that is further down the execution queue has to preempt processes further up the queue, delaying their execution. This has a negative impact on overall thread performance as your CPU will be mostly underutilized if there is a lot of preempting going on. If the kernel inhibits (or prohibits) preempting, it can more efficiently utilize your CPU, allowing many threads to get as much CPU time as possible, but this will have a very negative impact on responsiveness.
UI and user input processes are just processes to the kernel. You can, of course, just give UI and user input processes the highest possible priority at all times, but this is not automatically the best thing to do in every circumstance. For example, you probably don't want your audio stream in the background to stutter or stop playing just because you started moving the mouse. And if you are flushing a file to disk, you probably want that operation to complete atomically, rather than be interrupted by a pop-up dialog, because corrupted filesystems tend to make users pretty unhappy.
Android is dropping to lower-res/fewer colors to accelerate the transition
Or did it swap the high-res texture for a low-res one to save memory while it was not in use?
GPU's have been able to do 32bit acceleration for a long time.
Semi-OK gaming video cards that didn't support DirectX 9 couldn't run Vista Aero because it used the DirectX 9 API, required hardware based Pixel Shader 2.0 (not emulated in the driver) and at least 128MB of RAM. Not because of bit-depth.
People may ask why I run Windows XP. It's because I have some old software that I like and it won't run on my newer Windows 10 computer.
It's why people virtualize old PCs now. You run your old PC in a window.
One of the Windows XP computers claims to have been on for over 15 years.
32 bits of milliseconds is 49 days. Windows XP is a 32 bit system and a common way to measure how long it's been up is by issuing a system call which returns the number of milliseconds since the system startup.
Any guest worker system is indistinguishable from indentured servitude.
Soo, that means that a simple DoS is possible via old-school fork bombs ? In 2017 ? Well done Microsoft !
This issue didn't exist, so someone went and added synchronisation around the tear-down process. Why? People don't generally add mutexes at random, so this was probably done because they found a race condition in process tear-down that resulted in either a resource leak or kernel data structure corruption. Simply removing the mutex will probably fix this problem, but make the computer either run out of some kind of resources or crash. Probably not the correct fix...
I am TheRaven on Soylent News
More specifically, why are OSes not designed, and computing hardware not designed, so that the GUI cannot be slowed down by other slow processes, process switching, or I/O / virtual memory thrashing.
That's why OSes such a Linux (not even over-optimized for responsivity) have an entire zoo of CPU schedulers and IO schedulers.
(with BFQ being the latest popular IO scheduler for responsivity),
and linux specifically has hte non-POSIX "CGROUPS" extension that enables it to arrange the various processes into a tree hierarchy with each node supporting its own scheduling tactics between its childern (see demos of 256 GCC compiler jobs launched in parallel and the GUI still being responive).
(That's also part of the reasons why modern complex manager like systemd are getting popular, they have modules to handle all this : session, seats, etc. concepts that POSIX lacks)
BeOS was an OS whose entire purpose was exactly that : no matter what, keep UI responsive and avoid media stuttering.
(Well, running initially on architecture with less expensive context switches did also help a lot).
The UI and user input devices should be a completely separate set of processes and memory than the rest of application processing.
Actually, in most OSes, they already are.
It should operate as a service, through data pipelines, to the rest of the applications.
That's a tiny bit less obvious. Some graphical tool-kits run their UI in the main thread.
Some software would need to have the processing moved into a background thread or process.
WebApps are an obvious counter-exemple where the UI is an entirely different process (And depending on where the sever is executed - even different machine).
Or failing that, certain aspects of GUI, such as program kill controls, should be highly prioritized over pretty much everything else.
Again, slow and over-used everything else should not slow the UI and user input processes.
And then you'd complain that any complex calculation (compression of a video) takes ages, because the process is constantly being interrupted to give time to your GUI and mouse (i.e.: to the various driver and daemons and libraries processing USB and/or Bluetooth) even if they don't need.
Balancing responsivity (i.e. constantly interrupting everything just to be sure that everyone get their share of CPU cycles and IO) and performance (running as much un-interrupted as possible so the task finishes as fast as possible) is a complex dark art.
But, yeah, Windows is significantly worse at this compared to everyone else.
Which also explains you'll never see any deployment of Windows on the TOP500 (it's nearly Linux all the way, with a few excption like BSD - i.e. other Unix-type of OSes), and Azure is the only known cloud running it.
It's also why Linux is popular in most embed systems (modems, routers, smartphone, tons of IoT gizmos, smart TVs, etc. - basically nearly anything with a CPU that is not a desktop computer is likely to run some Unix-like kernel like Linux)
It's not that Linux is magic, it's that Windows is *THAT* awfully bad at anything.
"Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]