24 Cores and the Mouse Won't Move: Engineer Diagnoses Windows 10 Bug (wordpress.com)
Longtime Slashdot reader ewhac writes: Bruce Dawson recently posted a deep-dive into an annoyance that Windows 10 was inflicting on him -- namely, every time he built Chrome, his extremely beefy 24-core (48-thread) rig would begin stuttering, with the mouse frequently becoming stuck for a little over one second. This would be unsurprising if all cores were pegged at 100%, but overall CPU usage was barely hitting 50%. So he started digging out the debugging tools and doing performance traces on Windows itself. He eventually discovered that the function NtGdiCloseProcess(), responsible for Windows process exit and teardown, appears to serialize through a single lock, each pass through taking about 200 microseconds each. So if you have a job that creates and destroys a lot of processes very quickly (like building a large application such as Chrome), you're going to get hit in the face with this. Moreover, the problem gets worse the more cores you have. The issue apparently doesn't exist in Windows 7. Microsoft has been informed of the issue and they are allegedly investigating.
If there is an issue that keeps process termination and cleanup from being properly parallelized; I can understand why that might cause unexpectedly poor utilization of additional cores for computationally intensive tasks that also massacre lots of processes; but why would that cause the GUI to stop responding?
Unless moving the cursor also depends on terminating a bunch of processes; and hangs until that task is finished, wouldn't the inefficiency imposed on the build process be expected to keep the GUI more responsive; by preventing it from occupying as much CPU time as it otherwise would?
Am I just confused? Does keeping the desktop and cursor drawn actually involve lots of time sensitive process killing? Does this indeed not make sense?
In my basement office I have six computers I use regularly. Two are running MacOSX, one is running Ubuntu, two are Windows XP, and one is Windows 10. I just went around the room and checked uptimes. All of them were up for more than 3 months, except the Windows 10 computer. This one computer is supposed to be pretty fast compared to the rest but it gets bogged down where I feel compelled to reboot it. It also has the nasty habit of demanding to reboot when I'm trying to get work done, but that's a different problem. After reading this I'm somewhat relieved to think I'm not doing something wrong that's making it run slow, or that I'm just imagining things.
One of the Windows XP computers claims to have been on for over 15 years. I know this is not true since it's not even that old. Does the system uptime clock rollover or something to where it cannot track uptime accurately after a few months?
I dislike rebooting or powering off computers, preferring instead to just allow them to go to sleep when I'm done with them. They tend to stay on unless there is a power outage, and the power has been unusually stable with so many lightning storms in the area. It wasn't that long ago when such lengthy periods of uninterrupted time between crashing was unheard of. I'd probably have had longer uptimes if the power had not blinked.
People may ask why I run Windows XP. It's because I have some old software that I like and it won't run on my newer Windows 10 computer. So long as the hardware keeps going I feel no need to upgrade them in any way.
People may ask why I run MacOSX. Because fuck you, that's why. I'd give a serious answer but few people ask this seriously, they just want to be snobs.
I am armed because I am free. I am free because I am armed.
I was accomplishing this on 486DX2 hardware using OS/2 in ~1994, and by 1995 on a P120.
Several years ago I stopped by a buddy's retail establishment. He was transitioning network to Ubuntu on more modern hardware (with OS/2 in a VM), but still had an old and crusty OS/2 machine (probably a K6-2, but maybe a DX4) on the bench by the back door.
This was the last time I ever saw such a thing in the wild.
It was remarkably snappy doing normal, productive things -- scanning documents, browsing web pages, writing and viewing proposals -- just like it was when it was built. (And what window tearing?)
Sometimes I think that the more abstraction layers we add, the slower things get. I think this coupled with programmer laziness (and/or pay based on lines of code), makes human-interactive things continue to behave just as slow as they have been for ~20 years.
Do we even use accelerated 2D desktop graphics anymore, or are we completely back to the bad old days of every application drawing into a dumb framebuffer?
Kid-proof tablet..
There's also the matter that until somewhat recently, most lower-end GPUs were designed to accelerate lower resolution and/or shallower bit depths than the max the card could use for Windows. For example, the card might have allowed up to 2560x1600 @ 24/32bpp, but only supported hardware 3D acceleration up to 1280x800@15/16bpp. Even when resolution finally caught up, bit depth w/acceleration was stuck at 16bpp until well into the Windows 7 era. This is why so many computers with semi-ok gaming specs still couldn't do Aero Glass transparency when Windows 7 came out... they couldn't hardware-accelerate 32-bit color.
The problem still semi-persists among many phones & tablets. If an Android device seems to get blurry for a moment during transitions, it's not your imagination... Android is dropping to lower-res/fewer colors to accelerate the transition, then going back to high-res/color dumb framebuffer mode when it's done (and text suddenly becomes sharp & clear a moment later)
The Amiga could scroll a "screen" vertically with zero tearing (and very little effort), because it was just updating a memory pointer during a horizontal retrace interval. Ditto, for updating the mouse pointer (it was just a sprite). Both worked even when the app (or OS) died because it was serviced semi-independently of the OS as a whole during the vertical retrace interrupt.
Intuition-rendered windows were another matter entirely... I think window gadgets & outlines were rendered in the vertical retrace interrupt, but contents & outside-erasures depended on the app and/or os running properly.
Likewise, the mouse pointer was only robust when it was a 320x200/400 sprite... apps like DeluxePaint & WordPerfect (which needed more precision on a 640x200/400 screen than sprites could provide) that used XOR'ed software-rendered overlays could still crash (though if you clicked outside of the crashed app's window, the sprite-rendered pointer returned)
AmigaDOS was groundbreaking, but it still had some serious issues of its own. Like an event queue that used single-bit flags, allowing users to click BOTH 'ok' AND 'cancel' if the app stalled/crashed with a dialog on-screen.