24 Cores and the Mouse Won't Move: Engineer Diagnoses Windows 10 Bug (wordpress.com)


Posted by BeauHD on Tuesday July 11, 2017 @12:10PM from the keyboard-mashing dept.

Longtime Slashdot reader ewhac writes: Bruce Dawson recently posted a deep-dive into an annoyance that Windows 10 was inflicting on him -- namely, every time he built Chrome, his extremely beefy 24-core (48-thread) rig would begin stuttering, with the mouse frequently becoming stuck for a little over one second. This would be unsurprising if all cores were pegged at 100%, but overall CPU usage was barely hitting 50%. So he started digging out the debugging tools and doing performance traces on Windows itself. He eventually discovered that the function NtGdiCloseProcess(), responsible for Windows process exit and teardown, appears to serialize through a single lock, with each pass taking about 200 microseconds. So if you have a job that creates and destroys a lot of processes very quickly (like building a large application such as Chrome), you're going to get hit in the face with this. Moreover, the problem gets worse the more cores you have. The issue apparently doesn't exist in Windows 7. Microsoft has been informed of the issue and they are allegedly investigating.
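
As a rough back-of-the-envelope sketch of what that serialization implies: if every process exit has to hold one global lock for a fixed time, the exit rate is capped no matter how many cores are available. The 200 microsecond figure is taken from the summary at face value; the fixed hold time and the function names are assumptions made for illustration only.

```python
# Back-of-the-envelope model of process exits serialized through one lock.
# Assumption: every exit holds the lock for a fixed 200 microseconds.

LOCK_HOLD_US = 200

def max_exits_per_second(hold_us=LOCK_HOLD_US):
    """Upper bound on process exits per second, independent of core count."""
    return 1_000_000 / hold_us

def serialized_teardown_ms(num_exits, hold_us=LOCK_HOLD_US):
    """Total time the lock is held while num_exits processes exit one by one."""
    return num_exits * hold_us / 1000

print(max_exits_per_second())        # 5000.0
print(serialized_teardown_ms(48))    # 9.6
print(serialized_teardown_ms(5000))  # 1000.0
```

Under this simple model, anything else that needs the same lock would wait behind the queue, and a machine with more cores can fill that queue faster.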

42 of 352 comments

Re:Not just when closing a program by Plus1Entropy · 2017-07-11 12:28 · Score: 2

Not a program, a process.

--
Only crack the nuts that crack. You don't put the ones that don't crack in the sack.
The lock cycles were avg 200 us each by Anonymous Coward · 2017-07-11 12:30 · Score: 5, Informative

Not 200S each, which is off by a factor of one million. But, hey.
1. Re:The lock cycles were avg 200 us each by vux984 · 2017-07-11 13:26 · Score: 2
  
  er no...
  200 us is 200 millionths or 0.0002 seconds.
  in fractions that would be 2/10,000ths; 1/5000th
  It is important to note that 5 thousandths is NOT the same as 1/5000th; 5/1000ths would be 0.005 seconds; which is out by a factor of 25.
  But simply expressing it as a fraction isn't american enough. It should be like their wrench sizes... so 200 us is about 7/32768ths second.
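
The arithmetic in this exchange can be checked with exact fractions. This is only a sketch; the variable names are invented for illustration.

```python
from fractions import Fraction

# 200 microseconds expressed in seconds, as an exact fraction.
lock_hold = Fraction(200, 1_000_000)
print(lock_hold)                        # 1/5000
print(float(lock_hold))                 # 0.0002

# "5 thousandths" versus "one five-thousandth": off by a factor of 25.
print(Fraction(5, 1000) / lock_hold)    # 25

# The "wrench size" approximation from the comment above.
wrench = Fraction(7, 32768)
print(float(wrench))                    # roughly 0.000214
```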
2. Re: The lock cycles were avg 200 us each by UnknowingFool · 2017-07-11 16:56 · Score: 2
  
  What are you talking about? UTF-8 is over 20 years old. HTML is even older. It's one thing not to use the newest emoji but to say you won't use encodings that haven't changed in 20+ years because they might change in the future isn't a great reason.
  
  --
  Well, there's spam egg sausage and spam, that's not got much spam in it.
3. Re:The lock cycles were avg 200 us each by Shimbo · 2017-07-11 19:55 · Score: 2
  
  The world should stick to metric.
  200uS is one five thousandth of a second
  Actually, it's one five-thousandth of a siemens; case matters. I guess this whole newfangled upper and lower case thing is too hard for those writing their posts on an ASR-33.
Windows has always been unresponsive to user input by fustakrakich · 2017-07-11 12:32 · Score: 5, Informative

We just don't have priority...

--
“He’s not deformed, he’s just drunk!”
I don't get it. by fuzzyfuzzyfungus · 2017-07-11 12:43 · Score: 5, Interesting

If there is an issue that keeps process termination and cleanup from being properly parallelized, I can understand why that might cause unexpectedly poor utilization of additional cores for computationally intensive tasks that also massacre lots of processes. But why would that cause the GUI to stop responding?

Unless moving the cursor also depends on terminating a bunch of processes, and hangs until that task is finished, wouldn't the inefficiency imposed on the build process be expected to keep the GUI more responsive, by preventing it from occupying as much CPU time as it otherwise would?

Am I just confused? Does keeping the desktop and cursor drawn actually involve lots of time sensitive process killing? Does this indeed not make sense?
1. Re:I don't get it. by Anonymous Coward · 2017-07-11 12:55 · Score: 4, Informative
  
  The Windows GUI interface actually uses a separate process to update the mouse on the screen. Due to various historical reasons (compatibility with old applications, mostly), it was required to recycle this process every time the mouse moved, as the process could get a memory leak (which couldn't be fixed properly, in order to preserve compatibility with the aforementioned applications). Therefore, every time the coordinates of the mouse change, the process has to be killed and replaced, therefore putting it through the same lock that this build process is hogging. Combine that with the 200 second delay to get through the lock, and the responsiveness is easily explained.
  It's worth it to keep compatibility with the "After Dark" flying toasters screensaver, though.
2. Re:I don't get it. by gweihir · 2017-07-11 13:59 · Score: 2
  
  You are not confused. A sane kernel does not have this issue. A sane GUI stays responsive even with this issue. Unfortunately, Win10 does not have either.
  
  --
  Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
3. Re:I don't get it. by Bing+Tsher+E · 2017-07-11 15:37 · Score: 2
  
  The graphics subsystem was outside the NT kernel until NT 4.0. NT 3.51 was as close to a good true multiuser operating system as Microsoft is likely to ever come.
4. Re:I don't get it. by Rutulian · 2017-07-11 16:07 · Score: 5, Interesting
  
  It's easy to criticize from the outside, but the Linux kernel has historically had kernel locks that created similar problems, such as the "big kernel lock", removed ca. 2011 (i.e., not ancient history).
  https://kernelnewbies.org/BigK...
  As noted in the article, this particular locking problem appeared in Windows 10 and wasn't present in Windows 7, so the balancing acts between the fine-grained locking mechanisms, thread performance, and backwards-compatibility are clearly challenging to maintain. Not excusing; just observing. Windows has never been known for its ability to support massive numbers of parallel threads, so it is not surprising that previously overlooked problems can appear or become exacerbated in these situations. Many people, even here on Slashdot, laud Microsoft for the generally excellent backward compatibility in Windows, and criticize the Linux kernel for being generally horrible at it. But here you go, a pretty nice example to illustrate that backwards-compatibility has a cost.
5. Re:I don't get it. by goose-incarnated · 2017-07-11 16:46 · Score: 2
  
  You've never had the UI go unresponsive in X11 under heavy load?
  FTFA it appears to go unresponsive without a heavy load - the cores are unloaded. So, no, I've never had an unloaded Linux/BSD machine get unresponsive with X11.
  
  --
  I'm a minority race. Save your vitriol for white people.
6. Re:I don't get it. by mvdwege · 2017-07-11 18:41 · Score: 5, Informative
  
  Yes, you are making excuses; that's exactly what a tu quoque fallacy is.
  The big lock was removed in 2011; Microsoft produced a regression on an already bad design a lot closer in history. That's a sign of incompetence, period.
  Also, the Linux kernel has bad backwards compatibility, which is why things like drivers and such should be upstreamed as much as possible and built in the main tree, but Linux userland still happily runs old Unix software, so you are overstating that case as well.
  
  --
  "I know I will be modded down for this": where's the option '-1, Asking for it'?
7. Re:I don't get it. by butzwonker · 2017-07-11 19:37 · Score: 2
  
  No mainstream operating system has responsive GUIs under heavy load, especially not under heavy I/O load. GNU/Linux goes down very rapidly, Android is sluggish out of the box, and OS X has its spinning beachball of death. They are designed incorrectly.
  As a test, you may surf to this page to see how your system handles an embedded zip bomb. (Warning: Don't click this link unless you're willing to kill your browser session or even hard-reset your machine.)
Re:Windows... by presidenteloco · 2017-07-11 12:52 · Score: 5, Insightful

More specifically, why are OSes not designed, and computing hardware not designed, so that the GUI cannot be slowed down by other slow processes, process switching, or I/O / virtual memory thrashing?
The most brain-dead design-avoidable situation in the computing universe is where my computer is thrashing due to some resource over-use, and the UI is inoperable so I can't fix the problem e.g. by killing processes/programs. DOH!
The UI and user input devices should be a completely separate set of processes and memory than the rest of application processing. It should operate as a service, through data pipelines, to the rest of the applications. It should be completely separate, in terms of resource management. Or failing that, certain aspects of GUI, such as program kill controls, should be highly prioritized over pretty much everything else.
Again, slow and over-used everything else should not slow the UI and user input processes. This is basic.

--

Where are we going and why are we in a handbasket?
I remember BeOS by rsilvergun · 2017-07-11 13:00 · Score: 4, Insightful

being the only OS I've ever seen in my life move a window around the screen w/o tearing. Yeah, it doesn't make much difference, but you'd think in 2017 my quad core CPU and 8 gigs of RAM could do what a 400 MHz AMD K6 did in 1996 with 512 MB RAM.

--
Hi! I make Firefox Plug-ins. Check 'em out @ https://addons.mozilla.org/en-US/firefox/addon/youtube-mp3-podcaster/
1. Re:I remember BeOS by adolf · 2017-07-11 14:08 · Score: 4, Interesting
  
  I was accomplishing this on 486DX2 hardware using OS/2 in ~1994, and by 1995 on a P120.
  Several years ago I stopped by a buddy's retail establishment. He was transitioning the network to Ubuntu on more modern hardware (with OS/2 in a VM), but still had an old and crusty OS/2 machine (probably a K6-2, but maybe a DX4) on the bench by the back door.
  This was the last time I ever saw such a thing in the wild.
  It was remarkably snappy doing normal, productive things -- scanning documents, browsing web pages, writing and viewing proposals -- just like it was when it was built. (And what window tearing?)
  Sometimes I think that the more abstraction layers we add, the slower things get. I think this, coupled with programmer laziness (and/or pay based on lines of code), makes human-interactive things continue to behave just as slowly as they have for ~20 years.
  Do we even use accelerated 2D desktop graphics anymore, or are we completely back to the bad old days of every application drawing into a dumb framebuffer?
  
  --
  Kid-proof tablet..
2. Re: I remember BeOS by Miamicanes · 2017-07-11 14:37 · Score: 3, Interesting
  
  The Amiga could scroll a "screen" vertically with zero tearing (and very little effort), because it was just updating a memory pointer during a horizontal retrace interval. Ditto, for updating the mouse pointer (it was just a sprite). Both worked even when the app (or OS) died because it was serviced semi-independently of the OS as a whole during the vertical retrace interrupt.
  Intuition-rendered windows were another matter entirely... I think window gadgets & outlines were rendered in the vertical retrace interrupt, but contents & outside-erasures depended on the app and/or os running properly.
  Likewise, the mouse pointer was only robust when it was a 320x200/400 sprite... apps like DeluxePaint & WordPerfect (which needed more precision on a 640x200/400 screen than sprites could provide) that used XOR'ed software-rendered overlays could still crash (though if you clicked outside of the crashed app's window, the sprite-rendered pointer returned)
  AmigaDOS was groundbreaking, but it still had some serious issues of its own. Like an event queue that used single-bit flags, allowing users to click BOTH 'ok' AND 'cancel' if the app stalled/crashed with a dialog on-screen.
3. Re:I remember BeOS by DNS-and-BIND · 2017-07-11 15:29 · Score: 2
  
  "I once preached peaceful coexistence with Windows. You may laugh at my expense - I deserve it."
  -- Jean-Louis Gassee, CEO Be, Inc.
  
  --
  Shutting down free speech with violence isn't fighting fascism. It IS fascism!
That design & implementation is so bad by presidenteloco · 2017-07-11 13:07 · Score: 2

It's not even wrong (to quote a famous scientist about a really ill-formed idea).
At this point with multi-core computers, the GUI and mouse etc should be on a completely separate core that is managed somewhat separately than all of the others.

--

Where are we going and why are we in a handbasket?
Re:Not just when closing a program by viperidaenz · 2017-07-11 13:10 · Score: 4, Insightful

That could be related to the hardware acceleration. If I had to guess, the Windows desktop would need to wait for the game to release its GPU resources and load its own into the GPU memory.
Back in the XP days, going from game to desktop was very quick, but going from desktop back to game was very slow. When Vista came along, the OS started using the GPU to accelerate the desktop. Made it slow both ways.
Re:Windows... by viperidaenz · 2017-07-11 13:15 · Score: 3, Insightful

I'm glad you've volunteered to help with their concurrency programming. Good luck, it's not easy.
Usually at some point, access to shared resources needs to be controlled. There are easy ways to do it and there are hard ways. Easy isn't fast, but it's predictable and less error-prone.
Re:Not just when closing a program by Narcocide · 2017-07-11 13:24 · Score: 2

Made it slow both ways
And more expensive. Oh, the irony!
Only 24 Cores? by Anonymous Coward · 2017-07-11 13:25 · Score: 5, Funny

2 Cores for DRM
2 Cores for DRM Protection
2 Cores for Telemetry
2 Cores for Telemetry Protection
2 Cores for Genuine Advantage
2 Cores for Genuine Advantage Protection
2 Cores for Driver Signing Validation
2 Cores for Driver Signing Validation Protection
2 Cores for Cortana
2 Cores for Cortana Telemetry
2 Cores for Cortana Telemetry Protection
1 Core for the Base OS
1 Core, at 25% for user processes
Re: Not just when closing a program by Miamicanes · 2017-07-11 14:13 · Score: 3, Interesting

There's also the matter that until somewhat recently, most lower-end GPUs were designed to accelerate lower resolution and/or shallower bit depths than the max the card could use for Windows. For example, the card might have allowed up to 2560x1600 @ 24/32bpp, but only supported hardware 3D acceleration up to 1280x800@15/16bpp. Even when resolution finally caught up, bit depth w/acceleration was stuck at 16bpp until well into the Windows 7 era. This is why so many computers with semi-ok gaming specs still couldn't do Aero Glass transparency when Windows 7 came out... they couldn't hardware-accelerate 32-bit color.
The problem still semi-persists among many phones & tablets. If an Android device seems to get blurry for a moment during transitions, it's not your imagination... Android is dropping to lower-res/fewer colors to accelerate the transition, then going back to high-res/color dumb framebuffer mode when it's done (and text suddenly becomes sharp & clear a moment later)
Re:Windows... by jimtheowl · 2017-07-11 14:15 · Score: 2

That is similar to the question I asked while using Windows NT.

It is how I got into UNIX.
Re:Windows... by Rutulian · 2017-07-11 14:49 · Score: 3, Informative

Again, slow and over-used everything else should not slow the UI and user input processes. This is basic.
The oversimplified, but short answer is that there is no such thing as a multiprocess CPU. All CPUs can execute on only a single thread per cycle. The kernel exists to allow multiple processes to be resident and to provide the illusion of multiple thread execution. In other words, the essential function of the kernel is scheduling, and in doing this the kernel has to make decisions about process priority that impact responsiveness and resource utilization in often diametrically opposite fashions. To gain responsiveness, a process that is further down the execution queue has to preempt processes further up the queue, delaying their execution. This has a negative impact on overall thread performance as your CPU will be mostly underutilized if there is a lot of preempting going on. If the kernel inhibits (or prohibits) preempting, it can more efficiently utilize your CPU, allowing many threads to get as much CPU time as possible, but this will have a very negative impact on responsiveness.
UI and user input processes are just processes to the kernel. You can, of course, just give UI and user input processes the highest possible priority at all times, but this is not automatically the best thing to do in every circumstance. For example, you probably don't want your audio stream in the background to stutter or stop playing just because you started moving the mouse. And if you are flushing a file to disk, you probably want that operation to complete atomically, rather than be interrupted by a pop-up dialog, because corrupted filesystems tend to make users pretty unhappy.
Re:Windows... by mikael · 2017-07-11 15:12 · Score: 2

Because the GUI and 3D graphics were considered bolt-ons to an existing OS kernel. Not all systems may have 3D acceleration. Some servers even avoid having a desktop as that is considered a security risk.
When the GUI is in use, the user input processing becomes the dominant process; what event happened, which widgets have been changed. A desktop with a good number of windows might have 1000+ widgets, all of which have icon images for various states. TrueType and Unicode fonts are converted into images as well. A thread or process context switch has to happen to process each window.
I've seen it myself; copying data from the hard disk drive to a backup USB drive completely slows down everything. That's with a system with eight CPU cores. The bottleneck is the CPU L1/L2 cache memory and PCI bus. All the external storage data gets swapped in and out again just to do the file transfers. This could be avoided using DMA transfers.
But there are security risks associated with raw DMA file transfers (https://stackoverflow.com/ques...).

--
Vintage computer adverts: http://www.vintageadbrowser.com/computers-and-software-ads
Re: Not just when closing a program by viperidaenz · 2017-07-11 15:36 · Score: 4, Informative

Android is dropping to lower-res/fewer colors to accelerate the transition
Or did it swap the high-res texture for a low-res one to save memory while it was not in use?
GPUs have been able to do 32-bit acceleration for a long time.
Semi-OK gaming video cards that didn't support DirectX 9 couldn't run Vista Aero because it used the DirectX 9 API, required hardware based Pixel Shader 2.0 (not emulated in the driver) and at least 128MB of RAM. Not because of bit-depth.
Re: SUCK A CHOAD, BEAUHD! by OrangeTide · 2017-07-11 16:56 · Score: 2

I was kicked off the high school varsity team because I couldn't answer a pop quiz on operating system architecture.

--
“Common sense is not so common.” — Voltaire
Why a toy OS on that system? by dbIII · 2017-07-11 17:08 · Score: 2

Win10 on something with so much grunt?
Why turn an expensive system into a limited toy?
If you need to run MS compatible stuff MS Win7 and various MS server systems are available.
Re:So it's not just me by superwiz · 2017-07-11 18:34 · Score: 3, Informative

People may ask why I run Windows XP. It's because I have some old software that I like and it won't run on my newer Windows 10 computer.

It's why people virtualize old PCs now. You run your old PC in a window.

One of the Windows XP computers claims to have been on for over 15 years.
32 bits of milliseconds is 49 days. Windows XP is a 32 bit system and a common way to measure how long it's been up is by issuing a system call which returns the number of milliseconds since the system startup.
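
A quick sketch of that wraparound, assuming a plain unsigned 32-bit millisecond counter (on Windows, `GetTickCount` returns such a value). The helper name and the 15-year figure plugged into it are only for illustration.

```python
from datetime import timedelta

# A 32-bit millisecond counter wraps back to zero after this many ticks.
TICK_WRAP_MS = 2 ** 32

def reported_uptime(true_uptime_ms):
    """Uptime as seen through a wrapping 32-bit millisecond counter."""
    return timedelta(milliseconds=true_uptime_ms % TICK_WRAP_MS)

print(timedelta(milliseconds=TICK_WRAP_MS))   # 49 days, 17:02:47.296000

# Fifteen years of real uptime, read through the wrapped counter.
fifteen_years_ms = 15 * 365 * 24 * 3600 * 1000
print(reported_uptime(fifteen_years_ms).days)  # 6
```

So a tool relying on such a counter alone could never report more than about 49.7 days; a 15-year figure would have to come from some other source.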

--
Any guest worker system is indistinguishable from indentured servitude.
Fork Bomb ! by BESTouff · 2017-07-11 19:35 · Score: 3, Insightful

Soo, that means that a simple DoS is possible via old-school fork bombs? In 2017? Well done Microsoft!
Re: Windows... by TheRaven64 · 2017-07-11 21:52 · Score: 4, Insightful

This issue didn't exist, so someone went and added synchronisation around the tear-down process. Why? People don't generally add mutexes at random, so this was probably done because they found a race condition in process tear-down that resulted in either a resource leak or kernel data structure corruption. Simply removing the mutex will probably fix this problem, but make the computer either run out of some kind of resources or crash. Probably not the correct fix...

--
I am TheRaven on Soylent News
Re:Oh, that explains a lot by TheRaven64 · 2017-07-11 22:00 · Score: 2

Fork is heavily optimised on *NIX operating systems, because it's the primary way of creating new processes. Unfortunately, it's also a completely brain-dead one. It originates from old systems that had the running process in one form of storage and switched by writing it out to another. Fork made sense then, because you'd create the new process by writing out your current state and still have a copy of it in online storage for free. On modern systems, you need to mark all of the memory copy-on-write, create copies of the file descriptor table in the kernel (incrementing reference counts for all open devices) and do fairly complex things with each thread.
Windows post-dates the systems where fork made sense as an abstraction and so combines creating and launching a new process into the same operation (modern *NIX systems do this with posix_spawn as well). Implementing fork on Windows requires emulating all of the behaviour of fork: you have to create a new process with a copy of the current file descriptor set and then set up CoW shared memory mappings for all of your memory. If you're then doing an exec afterwards, then this is entirely wasted effort.
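
To make the two launch styles concrete, here is a minimal Python sketch for a Unix-like system (Python 3.8 or later). The helper names are invented for illustration, and whether `os.posix_spawn` avoids a fork underneath depends on the platform's C library and kernel.

```python
import os
import sys

def run_fork_exec(argv):
    """Classic two-step launch: duplicate this process, then replace the copy."""
    pid = os.fork()
    if pid == 0:                      # child: starts as a copy of the parent
        try:
            os.execv(argv[0], argv)   # throw the copy away, load the new program
        finally:
            os._exit(127)             # only reached if exec failed
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)

def run_posix_spawn(argv):
    """Single-step launch: create and start the new program in one call."""
    pid = os.posix_spawn(argv[0], argv, dict(os.environ))
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)

# A trivial child program that just exits with status 7.
child = [sys.executable, "-c", "raise SystemExit(7)"]
print(run_fork_exec(child), run_posix_spawn(child))   # 7 7
```

In the first helper, everything the fork set up for the child is discarded as soon as `execv` succeeds, which is the wasted effort described above.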

--
I am TheRaven on Soylent News
Re:Windows... by swilver · 2017-07-11 23:04 · Score: 2

...and that is all completely avoidable.
This is the result of bad design not any inherent limitation in the hardware or lack of DMA use. The PCI bus is involved in all cases (DMA doesn't transfer things magically).
Not only is there no need to keep copied data in memory (or even swap out other processes to increase the disk cache like Windows is fond of doing), but you can even turn off caches for copy processes to avoid trashing them.
Furthermore, rules can be created to govern when something is worthy of caching and when it is not (and many systems have those, taking into account things like type of access, random, sequential, which process interactive/background, how many accesses are queued up, big or small read/writes). Limits can also be set as to how much space a disk cache can consume.
As an example of how stupid Windows is/was, I remember the days when leaving Windows running overnight while downloading some torrents would leave me with a completely unresponsive system in the morning because everything was swapped out... why? Because Windows thought it was a good idea to cache all your overnight downloads (which is a stupid thing to do when your disk can read/write 50 MB/sec+ and your internet can hardly manage 1 MB/sec).
My solution at the time was: add enough real memory, and turn off the swap file. That artificially limited the amount of memory that could be devoted to disk caching.
Firewalling a GPU by DrYak · 2017-07-12 00:48 · Score: 2

Is there such a thing as firewalling or sandboxing a GPU for that?
Yes and no.
Yes, there's a possibility to firewall against hardware.
- that's what IOMMU is for on modern processors.
So hardware with DMA (Direct Memory Access, i.e. hardware that can directly read the RAM, e.g. FireWire, Infiniband, 10 Gigabit Ethernet, Thunderbolt, etc.) can be isolated and cannot be used to dump the whole PC memory (see earlier attack with FireWire RDMA on Windows).
- modern GPU processors have even implemented their own MMU layer for additional fencing - so that the 3D game that you've downloaded (or even a WebGL and WebVulkan online game) doesn't secretly try to peek into other applications on your desktop.
No, it won't help at all in the problem of scheduling.
That's a software problem.
The kernel schedules CPU cycles to threads wanting execution, and it schedules access to resources for the threads' I/O requests.
It's the kernel's job to decide when to interrupt one task to give access to another (either CPU cycles or I/O access).
Knowing that interrupting more often gives other background tasks a chance to work (gives better responsivity, even the mouse cursor gets the necessary cycles while a big computation is in progress)
And knowing that keeping tasks uninterrupted lets them finish quicker (gives better performance).
Balancing this responsivity vs. performance is a complex dark art.
Windows simply sucks at this.
This is even complicated by the fact that the modern mouse communicates over USB or Bluetooth instead of PS/2; these are much more complex protocols, with more components in the stacks that handle them.
This requires even more time shares given to these components to make sure that the mouse moves responsively - while knowing that this will kill the performance.

--
"Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
Re:Oh, that explains a lot by gweihir · 2017-07-12 00:57 · Score: 2

Windows not forking is not bad engineering? Funny. Architecture falls under engineering as well, you know.

--
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
OS and process scheduling by DrYak · 2017-07-12 01:06 · Score: 4, Insightful

More specifically, why are OSes not designed, and computing hardware not designed, so that the GUI cannot be slowed down by other slow processes, process switching, or I/O / virtual memory thrashing?
That's why OSes such as Linux (not even over-optimized for responsivity) have an entire zoo of CPU schedulers and IO schedulers
(with BFQ being the latest popular IO scheduler for responsivity),
and Linux specifically has the non-POSIX "CGROUPS" extension that enables it to arrange the various processes into a tree hierarchy with each node supporting its own scheduling tactics between its children (see demos of 256 GCC compiler jobs launched in parallel and the GUI still being responsive).
(That's also part of the reason why modern complex managers like systemd are getting popular; they have modules to handle all this: session, seats, etc., concepts that POSIX lacks.)
BeOS was an OS whose entire purpose was exactly that : no matter what, keep UI responsive and avoid media stuttering.
(Well, running initially on architecture with less expensive context switches did also help a lot).

The UI and user input devices should be a completely separate set of processes and memory than the rest of application processing.
Actually, in most OSes, they already are.

It should operate as a service, through data pipelines, to the rest of the applications.
That's a tiny bit less obvious. Some graphical tool-kits run their UI in the main thread.
Some software would need to have the processing moved into a background thread or process.
WebApps are an obvious counter-example where the UI is an entirely different process (and depending on where the server is executed, even a different machine).

Or failing that, certain aspects of GUI, such as program kill controls, should be highly prioritized over pretty much everything else.
Again, slow and over-used everything else should not slow the UI and user input processes.
And then you'd complain that any complex calculation (compression of a video) takes ages, because the process is constantly being interrupted to give time to your GUI and mouse (i.e.: to the various drivers, daemons and libraries processing USB and/or Bluetooth) even if they don't need it.
Balancing responsivity (i.e. constantly interrupting everything just to be sure that everyone get their share of CPU cycles and IO) and performance (running as much un-interrupted as possible so the task finishes as fast as possible) is a complex dark art.
But, yeah, Windows is significantly worse at this compared to everyone else.
Which also explains why you'll never see any deployment of Windows on the TOP500 (it's nearly Linux all the way, with a few exceptions like BSD - i.e. other Unix-type OSes), and Azure is the only known cloud running it.
It's also why Linux is popular in most embedded systems (modems, routers, smartphones, tons of IoT gizmos, smart TVs, etc. - basically nearly anything with a CPU that is not a desktop computer is likely to run some Unix-like kernel like Linux).
It's not that Linux is magic, it's that Windows is *THAT* awfully bad at anything.

--
"Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
Re:Oh, that explains a lot by TheRaven64 · 2017-07-12 01:22 · Score: 2

posix_spawn still does the fork/exec. It is a library function after all, not a system call.
That's true on FreeBSD, and I think it's true on Linux. It's not true on NetBSD, XNU or Solaris, and I don't think it's true on AIX either. posix_spawn was designed to be possible to implement as a library routine, but (particularly in the presence of threads) it's much more efficient to implement it in the kernel.

And it doesn't take as long as you seem to think to mark things COW.
On a system designed for it, no it's not terribly expensive (though the IPIs required for synchronising the page tables across multiple cores actually add quite a bit to the cost on modern hardware), but on a kernel not designed for fork (such as Windows NT), it's expensive to do it post facto.

It only marks open file descriptors, not devices. A much shorter list. Usually only three, stdin/stdout/stderr. There can be more though, but rarely over 5.
Not even slightly true.

One of the reasons Windows isn't used for supercomputers
And that's where I can tell that you have no idea what you're talking about. Thread creation time is completely irrelevant to supercomputing (as are most OS processes). Supercomputing workloads aim to spend 100% of their time in the userspace code that's actually solving the problem. They usually run the entire network stack in userspace to avoid kernel entry (Infiniband has supported this for decades, modern Ethernet NICs do now for low-end systems), set one thread per core, pin it, and avoid entering the scheduler. They also typically try to avoid any I/O. There are several reasons why Windows isn't used in supercomputers (license fees, difficulty of customisation by third parties), but that's definitely not one of them.
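
For what "pin it" looks like in practice, here is a minimal Linux-only sketch using Python's `os.sched_setaffinity`. The helper name is made up, and real HPC codes typically do this from C or via their job launcher rather than from Python.

```python
import os

def pin_to_single_cpu():
    """Pin the calling thread to one allowed CPU; return the previous mask."""
    previous = os.sched_getaffinity(0)        # CPUs we are currently allowed on
    os.sched_setaffinity(0, {min(previous)})  # restrict to a single one of them
    return previous

previous = pin_to_single_cpu()
print(os.sched_getaffinity(0))       # a one-element set
os.sched_setaffinity(0, previous)    # restore the original mask
```

Once pinned, the thread is not migrated between cores by the scheduler, which is part of how such workloads stay out of the kernel's way.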

--
I am TheRaven on Soylent News
Re:Windows... by BronsCon · 2017-07-12 02:41 · Score: 2

It depends how you define CPU, and I think Rutulian was defining it as a single core, in which case he's actually mostly right. I mean, now we have hyperthreading, which can attempt to execute two threads, but stalls when the execution paths of both threads would cross, so that it can decide which thread should be allowed to execute first. This happens more often than you might imagine, so the performance boost can be anywhere from absolutely none at all, all the way up to nearly double. That is to say, a CPU (core) with hyperthreading can execute, on average, just under 1.5 threads per cycle.

--
APK quotes people (including myself) without context and should not be trusted. Just thought you should know.
Re: Oh, that explains a lot by Grishnakh · 2017-07-12 03:55 · Score: 2

Bad engineering is *always* the fault of bad management, no exceptions. It's management's job to manage the engineers, and identify and fire the bad ones, while facilitating the rest and ensuring their time is productive. Engineers have no power; only management does, so management gets 100% of the blame for the outcome.