Linux Getting Extensive x86 Assembly Code Refresh
jones_supa writes: A massive x86 assembly code spring cleaning has been done in a pull request that is to end up in Linux 4.1. The developers have tried testing the code on many different x86 boxes, but there's a risk of regressions when exposing the code to many more systems in the days and weeks ahead. That being said, the list of improvements is excellent. There are over 100 separate cleanups, restructuring changes, speedups and fixes in the x86 system call, IRQ, trap and other entry code, part of a heroic effort to deobfuscate decade-old spaghetti assembly code and its C code dependencies.
Technical Debt haunts you.
The only way to truly understand C code is to read the disassembly. Otherwise you are only assuming what the compiler is emitting.
Sometimes theory doesn't live up to reality.
Yes, I've hurd that before.
I am Slashdot. Are you Slashdot as well?
It's not a major refresh, only a modest one, and it doesn't really fix the readability issues (which would require a complete rewrite). Linux assembly is a mostly unreadable, badly formatted, macro-happy mess. The assembly in the BSDs is much more elegant and minimalistic.
-Matt
Nobody does message passing for basic operations. I actually tried to asynchronize DragonFly's system calls once but it was a disaster. Too much overhead.
On a modern Intel cpu a system call runs around 60 ns. If you add a message-passing layer with an optimized path to avoid thread switching that will increase to around 200-300 ns. If you actually have to switch threads it increases to around 1.2 us. If you actually have to switch threads AND save/restore the FPU state now you are talking about ~2-3 us. If you have to message pass across cpus then the IPI overhead can be significant... several microseconds just for that, plus cache mastership changes.
And all of those times assume shared memory for the message contents. They're strictly the switch and management overhead.
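To put those figures side by side, here is a rough sketch that expresses each one as a multiple of the plain system call cost. The numbers are the ones quoted above; where a range was given, the midpoint is used, which is an assumption made only for illustration. The names `COSTS_NS`, `slowdown` and `report` are made up for this example.

```python
# Per-operation costs in nanoseconds, taken from the figures quoted above.
# Where the comment gives a range, the midpoint is used (an assumption).
BASELINE_SYSCALL_NS = 60.0

COSTS_NS = {
    "message pass, no thread switch": 250.0,     # midpoint of 200-300 ns
    "message pass + thread switch": 1200.0,      # 1.2 us
    "thread switch + FPU save/restore": 2500.0,  # midpoint of 2-3 us
}


def slowdown(cost_ns, baseline_ns=BASELINE_SYSCALL_NS):
    """Return how many times slower cost_ns is than the baseline."""
    return cost_ns / baseline_ns


def report():
    """Map each scenario to its slowdown factor, rounded to one decimal."""
    return {name: round(slowdown(ns), 1) for name, ns in COSTS_NS.items()}


if __name__ == "__main__":
    for name, factor in report().items():
        print(f"{name}: ~{factor}x a plain system call")
```

Even the optimized path comes out at roughly four times the cost of a direct system call, and a real thread switch at about twenty times, before any cross-cpu IPI cost is counted.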
So, basically, no operating system that is intended to run efficiently can use message-passing for basic operations. Message-passing can only be used in two situations:
(1) When you have to switch threads anyway. That is, if two processes or two threads are messaging each other. Another good example is when you schedule an interrupt thread but cannot immediately switch to it (preempt current thread). If the current thread cannot be preempted then the interrupt thread can be scheduled normally without imposing too much overhead vs the alternative.
(2) When the operation can be batched. In DragonFly we successfully use message-passing for network packets and attain very significant cpu localization benefits from it. It works because packets are batched on fast interfaces anyway. By retaining the batching all the way through the protocol stack we can effectively use message passing and spread the overhead across many packets. The improvement we get from cpu localization, particularly not having to acquire or release locks in the protocol paths, then trumps the messaging overhead.
#2 also works well for data processing pipelines.
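A back-of-the-envelope sketch of why batching helps: if one message carries a whole batch, the fixed messaging cost is divided across every packet in it. The 1.2 us figure is the thread-switch cost quoted earlier; the batch sizes and the function name `per_packet_overhead_ns` are hypothetical, not measurements from DragonFly.

```python
def per_packet_overhead_ns(message_cost_ns, batch_size):
    """Messaging overhead charged to each packet when one message carries a batch."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return message_cost_ns / batch_size


MESSAGE_COST_NS = 1200.0  # thread-switch figure quoted earlier in the thread

if __name__ == "__main__":
    for batch in (1, 8, 64):  # hypothetical batch sizes
        cost = per_packet_overhead_ns(MESSAGE_COST_NS, batch)
        print(f"batch of {batch:2d}: {cost:.2f} ns of messaging overhead per packet")
```

With a batch of 64 the per-packet share drops below 20 ns, which is under the ~60 ns cost of a single system call, so savings from cpu localization and avoided locking have room to win.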
-Matt
I'm sure you're right, though they have something to do with microkernels. There was a Linus interview from a few years back explaining his preference for the monolithic approach, and he explained that modules were introduced to give most of the benefits of the microkernel, without the drawbacks.
I'd have to see that interview to believe that's exactly what he said. In this essay by him, he says
but doesn't at all tie that to microkernels.
Loadable kernel modules in UN*Xes date back at least to SunOS 4.1.3 and AIX 3.0 in the early 1990s. I'm not sure they were introduced to compete with microkernels.
The improved assembly code is what allows the Terminator to be so efficient a killing machine.
For some time now, Mark Russinovich at Microsoft has been talking about just how bad the Windows kernel was/is in his blog.
I think you are confused. It was not Mark Russinovich, but rather Linus Torvalds, and he was talking about the Linux kernel, not the Windows kernel.
"I mean, sometimes it's a bit sad that we are definitely not the streamlined, small, hyper-efficient kernel that I envisioned 15 years ago...The kernel is huge and bloated, and our icache footprint is scary. I mean, there is no question about that. And whenever we add a new feature, it only gets worse."
Glad I could help.
Reading slashdot one-liner: (irm http://rss.slashdot.org/Slashdot/slashdot).rdf.item | fl title,desc*
There's also a cool tool called CLOC which gives a nice report about a source tree including the lines of code and in which languages they are written.
I don't really know why.
Users will say "But it works, we don't want to change waaagh scary" while simultaneously reporting 237 bugs all of which are OMG critical. Management will assume that it's cheaper, because existing stuff is already there so it's wasteful not to use it.
Now it's true that once a load of crufty business rules have built up with 17 levels of nested conditionals it can be risky to try and replicate it for fear of missing some obscure case that's bound to occur at an inconvenient time for a key customer. There's no documentation, of course. Or if there is it's the source code, six revisions behind, pasted into a word document with three screenshots taken as BMPs so the whole thing is 1.5G. This alone can make you say "sod it".
I can't find the correct phrase but maybe it's just a false analogy with physical things. Like reusing wood from an old shed to build a deck possibly is cheaper.
Confucius say, "Find worm in apple - bad. Find half a worm - worse."