Linux Getting Extensive x86 Assembly Code Refresh
jones_supa writes: A massive x86 assembly code spring cleaning has been done in a pull request that is to end up in Linux 4.1. The developers have tried testing the code on many different x86 boxes, but there's a risk of regression when exposing the code to many more systems in the days and weeks ahead. That being said, the list of improvements is excellent. There are over 100 separate cleanups, restructuring changes, speedups and fixes in the x86 system call, IRQ, trap and other entry code, part of a heroic effort to deobfuscate decade-old spaghetti assembly code and its C code dependencies.
Technical Debt haunts you.
We live in interesting times these days. With a changeset so big, and involving assembly code that isn't as easy to understand as C code, how can we really be sure that no exploits have been introduced? How extensively have these changes been reviewed to ensure there are no exploits or potential exploits being sneaked in?
Linux has been obsolete since introducing assembly code. Minix touches the hardware in just 100 lines of code, and Mac OS X is a microkernel as well.
It is 2015 and not 1985
Better not use Minix then, because it also has assembly.
"Set a man a fire, he'll be warm for the rest of the night. Set a man afire, he'll be warm for the rest of his life."
All C code ends up as assembler, anyway. It just gets deleted, most of the time, before you see it.
"To those who are overly cautious, everything is impossible. "
OS X is not a microkernel and never has been. Even when NEXTSTEP was based on Mach 3.0, it was not a microkernel either. Apple doesn't follow any of the rules of microkernels (user-space drivers, et al.) in OS X or iOS.
Not replaced, you dummy.
Elevated to a new level.
All hope abandon ye who enter here.
Sometimes theory doesn't live up to reality.
Yes, I've hurd that before.
I am Slashdot. Are you Slashdot as well?
Just because Minix has only 100 lines of assembly doesn't say anything about Darwin, even if Darwin has microkernel components, so your association there is a bit fallacious. Unless it's changed recently, Darwin does have microkernel (Mach, in fact) underpinnings, but a complete FreeBSD subsystem is grafted onto them. So if anything, Darwin is a hybrid kernel. Like most real systems out there, it's not a complete microkernel system.
r u redi to rumble?
I've never seen a true microkernel that has the performance of a monolithic kernel. Nobody wants to buy a new computer and drag it down to a crawl.
Did you ever use OS-9 from Microware? (not to be confused with OS 9 from Apple)
Back in the day I ran OS-9 on a Tandy CoCo and had a fully multi-user, pre-emptive multitasking system* running on a 6809E, an 8-bit, sub-2 MHz CPU. Later on I worked with a variety of industrial computers running OS-9 on 68K-based systems, and they worked just fine.
* I will give you that I only ever fired up the graphical desktop all of once just to see if it worked. After that I stayed in the command line.
> There's a risk of regression when exposing the code to many more systems
The risk of regression is due to refactoring, not due to testing. Ironic, given that the post cites de-obfuscation as a reason for doing this. Or perhaps our submitter just got an MBA and is learning to think and speak in management-ese.
To add: your claim that XNU does not follow any microkernel rules is simply false. XNU uses microkernel-style message passing.
XNU has system calls to allow messages to be sent between processes, including sending large amounts of data by page flipping.
It just doesn't happen to use that to implement very much of the UNIX API; it's not used to implement file I/O (that goes through ordinary BSD system calls to in-kernel file systems that are called through a BSD-style VFS) or network I/O (that goes through ordinary BSD system calls to in-kernel networking stacks that are called through a BSD-style kernel socket layer) or much of the process/thread management or VM code (that goes through ordinary system calls that end up calling Mach task, thread, and VM management calls).
It is used for communication between user processes, and for some kernel-to-user communication, but that's the same sort of use that happens in systems with Boring Old Monolithic Kernels.
Proof? I can't find such posts in Mark's blog.
if they want to audit those stale old things for correctness and performance, and make them more readable for future generations, and
do the testing and review to make sure they haven't fucked them up
then good for them. i mean really good for them.
any code - especially the kernel - isn't a concrete artifact, it's a process. an organism.
healthy organisms eliminate their wastes
and a 2% performance bump in system call overhead isn't anything to sneeze at
As I understand it, NeXT / OSX started with a micro-kernel philosophy and then introduced some monolithic kernel concepts to address the performance bottleneck of messaging between true micro modules.
Meanwhile Linux starts as a monolithic kernel, but introduced (un)loadable modules to address maintainability and extendability.
So if we described it as a continuum with 'pure microkernel' being a '1' and pure monolithic kernel being a '10', then OSX would be something like a '3' and Linux would be a '7'.
Loadable kernel modules have nothing to do with microkernels. A truly micro microkernel wouldn't need loadable kernel modules because all the loadable functionality would run in userland; plenty of monolithic kernels have loadable kernel modules.
And OS X is a lot further from "pure microkernel" than 3. The "monolithic kernel concepts" include "running the networking stack in the kernel, standard BSD-style", "running the file systems in the kernel, standard BSD-style", and "implementing most process and VM management operations with standard system calls in the kernel".
> There are over 100 separate ... speedups
The last time I looked, which was quite a few years ago TBH, the BSDs have, IIRC, less than 100 lines of x86 assembly, in the bootstrap.
From relatively-recent FreeBSD:
It's about 45,000 lines in Linux 3.19's arch/x86. A fair bit of that is crypto code, presumably either generally hand-optimized or using various new instructions to do various crypto calculations.
Who can sight read assembly anymore?
Everybody who is interested in "How Things Work" can read assembly code. Those who depend on hopes and prayers do not.
Time is what keeps everything from happening all at once.
It's not a major refresh, only a modest one, and it doesn't really fix the readability issues (which would require a complete rewrite). Linux assembly is a mostly unreadable, badly formatted, macro-happy mess. The assembly in the BSDs is much more elegant and minimalistic.
-Matt
Nobody does message passing for basic operations. I actually tried to asynchronize DragonFly's system calls once but it was a disaster. Too much overhead.
On a modern Intel cpu a system call runs around 60 ns. If you add a message-passing layer with an optimized path to avoid thread switching, that will increase to around 200-300 ns. If you actually have to switch threads, it increases to around 1.2 µs. If you actually have to switch threads AND save/restore the FPU state, now you are talking about ~2-3 µs. If you have to message pass across cpus, then the IPI overhead can be significant... several microseconds just for that, plus cache mastership changes.
And all of those times assume shared memory for the message contents. They're strictly the switch and management overhead.
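For anyone who wants to check the plain system call figure on their own box, here is a rough sketch (my own, not how Matt measured it) that times a raw getpid() round trip on Linux using clock_gettime(). The function name avg_syscall_ns() and the iteration count are illustrative choices; results will vary with CPU, kernel version and mitigation settings, and this only measures the bare system call, not any message-passing layer on top of it.

```c
#define _GNU_SOURCE          /* exposes syscall() in <unistd.h> */
#include <assert.h>
#include <sys/syscall.h>     /* SYS_getpid */
#include <time.h>            /* clock_gettime, CLOCK_MONOTONIC */
#include <unistd.h>          /* syscall */

/* Nanoseconds between two CLOCK_MONOTONIC readings. */
static double elapsed_ns(struct timespec a, struct timespec b)
{
    return (double)(b.tv_sec - a.tv_sec) * 1e9
         + (double)(b.tv_nsec - a.tv_nsec);
}

/* Average cost in ns of one raw getpid() system call over `iters` calls
   (iters must be > 0). Going through syscall(SYS_getpid) rather than the
   libc wrapper makes sure every iteration really enters the kernel. */
double avg_syscall_ns(long iters)
{
    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long i = 0; i < iters; i++)
        (void)syscall(SYS_getpid);
    clock_gettime(CLOCK_MONOTONIC, &end);
    return elapsed_ns(start, end) / (double)iters;
}
```

Calling avg_syscall_ns(1000000) from a main() and printing the result gives a number to hold up against the ~60 ns quoted above.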
So, basically, no operating system that is intended to run efficiently can use message-passing for basic operations. Message-passing can only be used in two situations:
(1) When you have to switch threads anyway. That is, if two processes or two threads are messaging each other. Another good example is when you schedule an interrupt thread but cannot immediately switch to it (preempt current thread). If the current thread cannot be preempted then the interrupt thread can be scheduled normally without imposing too much overhead vs the alternative.
(2) When the operation can be batched. In DragonFly we successfully use message-passing for network packets and attain very significant cpu localization benefits from it. It works because packets are batched on fast interfaces anyway. By retaining the batching all the way through the protocol stack we can effectively use message passing and spread the overhead across many packets. The improvement we get from cpu localization, particularly not having to acquire or release locks in the protocol paths, then trumps the messaging overhead.
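The batching argument in (2) is just division: one message's fixed cost gets spread over every packet it carries. A back-of-envelope sketch, assuming 250 ns per message (the middle of the 200-300 ns range above); the function names are made up for illustration, and both assume a nonzero batch size and budget.

```c
#include <assert.h>

/* Messaging cost charged to each packet when one message carries a whole
   batch of `batch` packets (batch must be > 0). */
double per_packet_overhead_ns(double msg_overhead_ns, unsigned batch)
{
    return msg_overhead_ns / (double)batch;
}

/* Smallest batch size that keeps the per-packet messaging cost at or below
   budget_ns, using integer ceiling division (budget_ns must be > 0). */
unsigned long min_batch_for_budget(unsigned long msg_overhead_ns,
                                   unsigned long budget_ns)
{
    return (msg_overhead_ns + budget_ns - 1) / budget_ns;
}
```

With 250 ns per message, a batch of 64 packets works out to under 4 ns of messaging overhead per packet, and keeping it at or below 10 ns per packet needs batches of at least 25.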
#2 also works well for data processing pipelines.
-Matt
You'd presumably need to add new CPU functionality to allow fast context switches. If I remember correctly, a 20MHz Transputer took about one microsecond to switch threads, because that was one of the primary design goals. Of course, that led to them building a stack-based CPU where almost nothing had to be saved on a context switch...
It was a monolithic kernel. One of the interesting features was that device drivers were modules, and there was a small device node module that would say stuff like "use module 'serial driver', call it tty4 at IRQ 2 and address 0x454040". The kernel would deal with all IRQs in the hardware and then run the IRQ callback function in the proper module. That allowed user-level device drivers back in the early 1980s.
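A rough sketch of the scheme described there: the kernel owns every hardware IRQ and forwards each one to the callback of whichever driver module a device descriptor bound to it. This is not OS-9's actual data layout; struct device_desc, irq_register(), irq_dispatch(), the demo serial_irq() callback and the 16-entry table are hypothetical names and sizes, and the tty4 entry just reuses the values from the comment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_IRQ 16  /* hypothetical number of IRQ lines */

/* Hypothetical device descriptor: driver module, exposed name, resources. */
struct device_desc {
    const char *driver;
    const char *name;
    unsigned irq;
    uintptr_t base_addr;
};

/* Callback a driver module registers to service its interrupt. */
typedef void (*irq_handler)(const struct device_desc *dev);

struct irq_slot {
    irq_handler handler;
    const struct device_desc *dev;
};

static struct irq_slot irq_table[MAX_IRQ];  /* zero-initialized: all free */

/* Bind a driver's handler to the IRQ named in its descriptor.
   Returns 0 on success, -1 if the IRQ is out of range or already taken. */
int irq_register(const struct device_desc *dev, irq_handler h)
{
    if (dev->irq >= MAX_IRQ || irq_table[dev->irq].handler != NULL)
        return -1;
    irq_table[dev->irq].handler = h;
    irq_table[dev->irq].dev = dev;
    return 0;
}

/* Called from the kernel's low-level interrupt entry: route the IRQ to the
   owning module's callback. Returns 1 if a handler ran, 0 if spurious. */
int irq_dispatch(unsigned irq)
{
    if (irq >= MAX_IRQ || irq_table[irq].handler == NULL)
        return 0;
    irq_table[irq].handler(irq_table[irq].dev);
    return 1;
}

/* Demo driver callback: just counts how often it was invoked. */
static int serial_irq_count;
static void serial_irq(const struct device_desc *dev)
{
    (void)dev;
    serial_irq_count++;
}

static const struct device_desc tty4 = { "serial driver", "tty4", 2, 0x454040 };
```

Registering tty4 with serial_irq and then calling irq_dispatch(2) runs the serial callback, while an IRQ nobody claimed is simply reported as spurious.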
Another cool feature was each software module had a CRC so it could detect bad binaries. There were ways to whitelist and blacklist based on CRC values.
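Checking a module image against CRC allow and deny lists is easy to sketch. Assumption: this uses the common CRC-32 (IEEE, reflected polynomial 0xEDB88320) as a stand-in, which is not necessarily the checksum OS-9 used for its modules, and check_module() with its deny-first policy is a hypothetical illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (IEEE, reflected polynomial 0xEDB88320, init and final
   XOR 0xFFFFFFFF). Slow but table-free. */
uint32_t crc32_ieee(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            /* mask is all ones when the low bit is set, zero otherwise */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

enum module_verdict { MODULE_OK, MODULE_BLOCKED, MODULE_UNKNOWN };

/* Hypothetical load policy: a blacklisted CRC is refused even if it is also
   whitelisted; otherwise only whitelisted CRCs are accepted. */
enum module_verdict check_module(const uint8_t *image, size_t len,
                                 const uint32_t *allow, size_t n_allow,
                                 const uint32_t *deny, size_t n_deny)
{
    uint32_t crc = crc32_ieee(image, len);
    for (size_t i = 0; i < n_deny; i++)
        if (deny[i] == crc)
            return MODULE_BLOCKED;
    for (size_t i = 0; i < n_allow; i++)
        if (allow[i] == crc)
            return MODULE_OK;
    return MODULE_UNKNOWN;
}
```

The standard CRC-32 check value for the ASCII string "123456789" is 0xCBF43926, which makes a handy sanity test; a single changed or missing byte in the image yields a different CRC, so a corrupted binary no longer matches its whitelist entry.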
I'm sure you're right, though they have something to do with microkernels. There was a Linus interview from a few years back explaining his preference for the monolithic approach, and he explained that modules were introduced to give most of the benefits of a microkernel without the drawbacks.
I'd have to see that interview to believe that's exactly what he said. In this essay by him, he says
but doesn't at all tie that to microkernels.
Loadable kernel modules in UN*Xes date back at least to SunOS 4.1.3 and AIX 3.0 in the early 1990s. I'm not sure they were introduced to compete with microkernels.
The improved assembly code is what allows the Terminator to be so efficient a killing machine.
For some time now, Mark Russinovich at Microsoft has been talking about just how bad the Windows kernel was/is in his blog.
I think you are confused. It was not Mark Russinovich, but rather Linus Torvalds, and he was talking about the Linux kernel, not the Windows kernel.
"I mean, sometimes it's a bit sad that we are definitely not the streamlined, small, hyper-efficient kernel that I envisioned 15 years ago...The kernel is huge and bloated, and our icache footprint is scary. I mean, there is no question about that. And whenever we add a new feature, it only gets worse."
Glad I could help.
Reading slashdot one-liner: (irm http://rss.slashdot.org/Slashdot/slashdot).rdf.item | fl title,desc*
There's also a cool tool called CLOC which gives a nice report about a source tree including the lines of code and in which languages they are written.
Micro kernel is not necessarily better.
No, micro kernels are plain worse. The biggest problem with micro kernels is the synchronisation problem you get with distributed state. Imagine a file system that is split up in different tasks, instead of one monolithic blob. Now, one task makes a change, like removing a file. Before the other tasks can make changes to the filesystem, they first need to synchronize to get the latest state. This becomes either terribly inefficient, or a huge mess, and most likely both at the same time.
Minix, for example, solved this particular problem by letting a single task handle the entire filesystem, but this results in terrible scaling performance when adding more users/processes.
I don't really know why.
Users will say "But it works, we don't want to change waaagh scary" while simultaneously reporting 237 bugs all of which are OMG critical. Management will assume that it's cheaper, because existing stuff is already there so it's wasteful not to use it.
Now it's true that once a load of crufty business rules have built up with 17 levels of nested conditionals it can be risky to try and replicate it for fear of missing some obscure case that's bound to occur at an inconvenient time for a key customer. There's no documentation, of course. Or if there is it's the source code, six revisions behind, pasted into a word document with three screenshots taken as BMPs so the whole thing is 1.5G. This alone can make you say "sod it".
I can't find the correct phrase but maybe it's just a false analogy with physical things. Like reusing wood from an old shed to build a deck possibly is cheaper.
Confucius say, "Find worm in apple - bad. Find half a worm - worse."
You're operating on the mistaken assumption that code that works now will always work and never need to be modified. You can't leave anything but the simplest things alone forever, because changes to the context/world will eventually require changes to it. If it's spaghetti code, that's going to be causing future bugs that are going to be non-obvious and difficult to discover.
This space intentionally left blank
Worst of all is when they embark on a rewrite and give up half way through. I was involved in a project to port a C++/ActiveX based system to .NET forms. They ported most of the major views but left a lot of the minor stuff from the old codebase lying around and wrote bridges to host it in the new framework. So they doubled the code, half of it became bitrotten and hidden by the new code and bloated out the runtime. Great project.
I have been "in Software" for ~25 years. I also hold a degree as a Software Engineer.
People who obsess about rewriting old code just because it's old tend to forget that in that old code are many bug fixes for edge cases found over the years. It was well documented and part of my education to know and understand that rewriting often caused those same bugs to surface again.
Best practice is to run both the old and new software in tandem for a while and verify the results. In reality no organization besides NASA will do that.
it's in my head
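Running old and new in tandem is a lot cheaper at the function level than for a whole system. A minimal sketch of that idea (differential testing): feed the same inputs to the existing implementation and its replacement and count disagreements. The two popcount routines and count_mismatches() are hypothetical stand-ins for "old" and "new" code, not anything from the kernel patches.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned (*popcount_fn)(unsigned long);

/* "Old" reference version: test one bit at a time. */
static unsigned popcount_old(unsigned long x)
{
    unsigned n = 0;
    while (x) {
        n += (unsigned)(x & 1u);
        x >>= 1;
    }
    return n;
}

/* "New" replacement: clear the lowest set bit on each iteration. */
static unsigned popcount_new(unsigned long x)
{
    unsigned n = 0;
    while (x) {
        x &= x - 1;
        n++;
    }
    return n;
}

/* Run both implementations on the same inputs; return how many disagree. */
size_t count_mismatches(popcount_fn old_impl, popcount_fn new_impl,
                        const unsigned long *inputs, size_t n)
{
    size_t bad = 0;
    for (size_t i = 0; i < n; i++)
        if (old_impl(inputs[i]) != new_impl(inputs[i]))
            bad++;
    return bad;
}
```

In a real migration the inputs would more likely come from recorded production traffic or a fuzzer, and each mismatch would be logged with its input rather than just counted.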
I have a degree in Computer Science, and I've been in software since 1998. If you do a proper refactoring, at the end of the day you'll get much better code, probably better performance, and, now that you have more background in the subject, smaller code. If you're using a code repository, you'll never lose anything. And if you have a bug regression that keeps coming back, you need a proper test/spec to cover it. So refactoring is really good when you have a much improved background in the subject, a code repository (i.e., history), and tests to cover the recurrent bugs and the main features.
... thus "very few organizations besides NASA".
A lot of people seem to miss the point on how the ideal lab condition doesn't carry over into real world organizations.
it's in my head
I find your signature incredibly relevant to your post.
For large sets, this will be our guide even unto death, for the LORD will work for each type of data it is applied to...
I don't really know why. Users will say "But it works, we don't want to change waaagh scary" while simultaneously reporting 237 bugs all of which are OMG critical.
Because if you did it wrong the first time, there's no chance that you're going to do it better the second time. You'll end up leaving out crucial functionality or something.
If you don't know how to clean up a codebase in-place by rewriting a little at a time, then you aren't skilled enough to do a rewrite from scratch.
"First they came for the slanderers and i said nothing."