Linux Getting Extensive x86 Assembly Code Refresh

Posted by Soulskill on Monday April 13, 2015 @11:31AM from the code-run-through-de-spaghettifier dept.

jones_supa writes: A massive x86 assembly code spring cleaning has been done in a pull request that is slated to end up in Linux 4.1. The developers have tested the code on many different x86 boxes, but there's a risk of regressions as the code is exposed to many more systems in the days and weeks ahead. That being said, the list of improvements is excellent: over 100 separate cleanups, restructuring changes, speedups and fixes in the x86 system call, IRQ, trap and other entry code, part of a heroic effort to deobfuscate decade-old spaghetti assembly code and its C code dependencies.

9 of 209 comments

Re:Should be micro kernel by caseih · 2015-04-13 12:05 · Score: 3, Interesting

Just because Minix has only 100 lines of assembly doesn't say anything about Darwin, even if Darwin has microkernel components, so the association you draw there is a bit fallacious. Unless it's changed recently, Darwin does have microkernel (Mach, in fact) underpinnings, but a complete FreeBSD subsystem is grafted onto that. So if anything, Darwin is a hybrid kernel. Like most real systems out there, it's not a pure microkernel system.
Re:Should be micro kernel by OzPeter · 2015-04-13 12:13 · Score: 3, Interesting

I've never seen a true microkernel that has the performance of a monolithic kernel. Nobody wants to buy a new computer and drag it down to a crawl.
Did you ever use OS-9 from Microware? (Not to be confused with Mac OS 9 from Apple.)
Back in the day I ran OS-9 on a Tandy CoCo and had a fully multi-user, preemptive multitasking system* running on an 8-bit, sub-2 MHz 6809E CPU. Later on I worked with a variety of industrial computers running OS-9 on 68K-based systems, and they worked just fine.
* I'll grant you that I only ever fired up the graphical desktop once, just to see if it worked. After that I stayed at the command line.

--
I am Slashdot. Are you Slashdot as well?
If It Ain't Broke, Don't Fix It! by Anonymous Coward · 2015-04-13 13:26 · Score: 1, Interesting

Why modify tested working code? What is this other than an excellent opportunity to inject malware into multiple Linux distros?
Re:Cruft by Anonymous Coward · 2015-04-13 13:32 · Score: 3, Interesting

Proof? I can't find such posts in Mark's blog.
Re:Should be micro kernel by m.dillon · 2015-04-13 15:17 · Score: 5, Interesting

Nobody does message passing for basic operations. I actually tried to asynchronize DragonFly's system calls once but it was a disaster. Too much overhead.
On a modern Intel CPU a system call takes around 60 ns. If you add a message-passing layer with an optimized path to avoid thread switching, that increases to around 200-300 ns. If you actually have to switch threads, it increases to around 1.2 µs. If you have to switch threads AND save/restore the FPU state, you are talking about ~2-3 µs. If you have to pass messages across CPUs, the IPI (inter-processor interrupt) overhead can be significant: several microseconds just for that, plus cache mastership changes.
And all of those times assume shared memory for the message contents. They're strictly the switch and management overhead.
So, basically, no operating system that is intended to run efficiently can use message-passing for basic operations. Message-passing can only be used in two situations:
(1) When you have to switch threads anyway. That is, if two processes or two threads are messaging each other. Another good example is when you schedule an interrupt thread but cannot immediately switch to it (preempt the current thread). If the current thread cannot be preempted, then the interrupt thread can be scheduled normally without imposing too much overhead vs the alternative.
(2) When the operation can be batched. In DragonFly we successfully use message-passing for network packets and attain very significant CPU localization benefits from it. It works because packets are batched on fast interfaces anyway. By retaining the batching all the way through the protocol stack, we can effectively use message passing and spread the overhead across many packets. The improvement we get from CPU localization, particularly not having to acquire or release locks in the protocol paths, then trumps the messaging overhead.
#2 also works well for data processing pipelines.
-Matt
Re:Should be micro kernel by 0123456 · 2015-04-13 15:28 · Score: 3, Interesting

You'd presumably need to add new CPU functionality to allow fast context switches. If I remember correctly, a 20 MHz Transputer took about one microsecond to switch threads, because that was one of the primary design goals. Of course, that led to them building a stack-based CPU where almost nothing had to be saved on a context switch...
Re:It's only a modest refresh by Anonymous Coward · 2015-04-13 16:58 · Score: 5, Interesting

It is a modern thing, primarily for two reasons:
(1) As you mentioned, optimization trumped cleanliness. It's not just that a given coder couldn't be bothered to spend his time writing it more cleanly: often the choice was between a guy who can write clean code and a guy who can write very messy but highly optimal code, and the latter would win in the marketplaces (the software, hardware, and programming-job ones). Writing optimal assembly code and organizing a modern, large, clean codebase in an HLL don't have a ton of skill overlap, all things considered.
(2) As you rewind the clock on programming history, keep in mind that further back there were simply far fewer programmers in existence, and far fewer working on any given codebase, by orders of magnitude. When you look back far enough, you see major companies launching major projects with a total programming staff of like 1-3 guys. Most of the code written in those older decades was the result of heroic one-man efforts. Why bother making your code readable for others when there are unlikely to be many of them, and they're all likely to be crazy like you anyways?
Re:Speedups? by jones_supa · 2015-04-13 17:38 · Score: 5, Interesting

There's also a cool tool called CLOC, which gives a nice report on a source tree: how many lines of code it contains and which languages they are written in.
Re:Can we be sure there are no exploits? by TheRaven64 · 2015-04-13 23:13 · Score: 3, Interesting

The first instruction in the Intel Architecture Reference (Part 2: Instruction Set Reference) is AAA, which is named after the noise made by people forced to read x86 assembly. It is short for 'ASCII Adjust After Addition' (yes, that should be AAAA, but that would be too consistent for Intel). This instruction exists to convert the result of a binary addition of two unpacked BCD digits into the result of the corresponding BCD addition.
Or, to put it in simpler terms: Anyone who thinks x86 assembly is not that difficult to understand is certifiably batshit insane.

--
I am TheRaven on Soylent News