Slashdot Mirror


Linux Getting Extensive x86 Assembly Code Refresh

jones_supa writes: A massive x86 assembly code spring cleaning has been done in a pull request that is to end up in Linux 4.1. The developers have tried testing the code on many different x86 boxes, but there's a risk of regressions when exposing the code to many more systems in the days and weeks ahead. That being said, the list of improvements is excellent. There are over 100 separate cleanups, restructuring changes, speedups and fixes in the x86 system call, IRQ, trap and other entry code, part of a heroic effort to deobfuscate decade-old spaghetti assembly code and its C code dependencies.

27 of 209 comments

  1. Debt by Ducho_CWB · · Score: 5, Insightful

    Technical Debt haunts you.

    1. Re:Debt by Darinbob · · Score: 4, Insightful

      Yes, if it weren't for the idea that I could change jobs if I needed to, I'd have been full of hopeless dread at most places I've worked. The sad thing is, in some places the majority of technical debt is created in the first year of the company's existence, during the hurry-up-already startup phase.

  2. Re:Can we be sure there are no exploits? by Lunix Nutcase · · Score: 5, Insightful

    The only way to truly understand C code is to read the disassembly. Otherwise you are only assuming what the compiler is emitting.

  3. Re:obfuscation mk2 by ThePhilips · · Score: 3, Funny

    Not replaced, you dummy.

    Elevated to a new level.

    --
    All hope abandon ye who enter here.
  4. Re:Should be micro kernel by OzPeter · · Score: 4, Funny

    Sometimes theory doesn't live up to reality.

    Yes, I've hurd that before.

    --
    I am Slashdot. Are you Slashdot as well?
  5. Re:Should be micro kernel by caseih · · Score: 3, Interesting

    Just because Minix has only 100 lines of assembly doesn't mean anything about Darwin, even if Darwin has microkernel components, so your association there is a bit fallacious. Unless it's changed recently, Darwin does have microkernel (Mach, in fact) underpinnings, but a complete FreeBSD subsystem is grafted onto that. So if anything Darwin is a hybrid kernel. Like most real systems out there, it's not a complete microkernel system.

  6. Re:Should be micro kernel by OzPeter · · Score: 3, Interesting

    I've never seen a true microkernel that has the performance of a monolithic kernel. Nobody wants to buy a new computer and drag it down to a crawl.

    Did you ever use OS-9 from Microware? (not to be confused with OS 9 from Apple)

    Back in the day I ran OS-9 on a Tandy Co-Co and had a fully multi-user, pre-emptive multitasking system* running on a 6809E, 8 bit, sub 2MHz CPU. Later on I worked with a variety of industrial computers running OS-9 on 68K based systems and they worked just fine.

    * I will give you that I only ever fired up the graphical desktop all of once just to see if it worked. After that I stayed in the command line.

    --
    I am Slashdot. Are you Slashdot as well?
  7. Re:Cruft by Anonymous Coward · · Score: 3, Interesting

    Proof? I can't find such posts in Mark's blog.

  8. Re:If It Ain't Broke, Don't Fix It! by Anonymous Coward · · Score: 3, Insightful

    if they want to audit those stale old things for correctness and performance, and make them more readable for future generations, and
    do the testing and review to make sure they haven't fucked them up

    then good for them. i mean really good for them.

    any code - especially the kernel - isn't a concrete artifact, it's a process. an organism.

    healthy organisms eliminate their wastes

    and a 2% performance bump in system call overhead isn't anything to sneeze at

  9. Re:Speedups? by Guy Harris · · Score: 3, Informative

    > There are over 100 separate ... speedups

    The last time I looked, which was quite a few years ago TBH, the BSDs have, IIRC, less than 100 lines of x86 assembly, in the bootstrap.

    From relatively-recent FreeBSD:

    $ find sys/i386 -name '*.[sS]' -print | xargs wc -l
    208 sys/i386/acpica/acpi_wakecode.S
    40 sys/i386/bios/smapi_bios.S
    396 sys/i386/i386/apic_vector.s
    78 sys/i386/i386/atpic_vector.s
    160 sys/i386/i386/bioscall.s
    470 sys/i386/i386/exception.s
    900 sys/i386/i386/locore.s
    279 sys/i386/i386/mpboot.s
    831 sys/i386/i386/support.s
    538 sys/i386/i386/swtch.s
    179 sys/i386/i386/vm86bios.s
    37 sys/i386/linux/linux_locore.s
    127 sys/i386/linux/linux_support.s
    32 sys/i386/svr4/svr4_locore.s
    202 sys/i386/xbox/pic16l.s
    494 sys/i386/xen/exception.s
    361 sys/i386/xen/locore.s
    5332 total
    $ find sys/amd64 -name '*.[sS]' -print | xargs wc -l
    282 sys/amd64/acpica/acpi_wakecode.S
    326 sys/amd64/amd64/apic_vector.S
    73 sys/amd64/amd64/atpic_vector.S
    541 sys/amd64/amd64/cpu_switch.S
    906 sys/amd64/amd64/exception.S
    88 sys/amd64/amd64/locore.S
    236 sys/amd64/amd64/mpboot.S
    56 sys/amd64/amd64/sigtramp.S
    732 sys/amd64/amd64/support.S
    75 sys/amd64/ia32/ia32_exception.S
    161 sys/amd64/ia32/ia32_sigtramp.S
    38 sys/amd64/linux32/linux32_locore.s
    124 sys/amd64/linux32/linux32_support.s
    246 sys/amd64/vmm/intel/vmx_support.S
    42 sys/amd64/vmm/vmm_support.S
    3926 total

    It's about 45,000 lines in Linux 3.19's arch/x86. A fair bit of that is crypto code, presumably either generally hand-optimized or using various new instructions to do various crypto calculations.
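
    For readers without find and xargs to hand, the same kind of count can be sketched in Python. This is only an illustration of the measurement above: the function name count_asm_lines is made up for the example, and it assumes "assembly file" simply means a name ending in .s or .S, as in the find pattern.

```python
import os


def count_asm_lines(root):
    """Count lines in files ending in .s or .S under root.

    Returns (counts, total), where counts maps each file path to its
    line count. The name and the suffix rule are illustrative choices.
    """
    counts = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith((".s", ".S")):
                path = os.path.join(dirpath, name)
                with open(path, "rb") as fh:
                    # Count newline bytes, which is what `wc -l` reports.
                    counts[path] = fh.read().count(b"\n")
    return counts, sum(counts.values())
```

    Calling count_asm_lines("sys/amd64") from the top of a FreeBSD source checkout would correspond to the second command above; no totals are claimed here, since they depend on the tree being measured.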

  10. Re:opportunity for backdoors? by EzInKy · · Score: 3, Insightful

    Who can sight read assembly anymore?

    Everybody who is interested in "How Things Work" can read assembly code. Those who depend on hopes and prayers do not.

    --
    Time is what keeps everything from happening all at once.
  11. It's only a modest refresh by m.dillon · · Score: 4, Insightful

    It's not a major refresh, only a modest one, and it doesn't really fix the readability issues (which would require a complete rewrite). Linux assembly is a mostly unreadable, badly formatted, macro-happy mess. The assembly in the BSDs is much more elegant and minimalistic.

    -Matt

    1. Re:It's only a modest refresh by Anonymous Coward · · Score: 5, Interesting

      It is a modern thing, primarily for two reasons:

      (1) As you mentioned, optimization trumped cleanliness. It's not just that a given coder couldn't be bothered wasting his time writing it cleaner: it's that often the choice was between a guy who can write clean code and a guy who can write very messy but highly-optimal code, and the latter would win in the marketplaces (the software, hardware, and programming-job ones). Writing optimal assembly code and organizing a modern large clean codebase in a HLL don't have a ton of skill overlap, all things considered.

      (2) As you rewind the clock on programming history, keep in mind that further back there were simply far fewer total programmers in existence, and far fewer working on any given codebase, by orders of magnitude. When you look back far enough, you see major companies launching major projects with a total programming staff of like 1-3 guys. Most of the code ever written in the older decades was the result of heroic one-man efforts. Why bother optimizing for others being able to read your code when there are unlikely to be many of them, and they're all likely to be crazy like you anyways?

    2. Re:It's only a modest refresh by Steve Hamlin · · Score: 3, Informative

      +5 Informative / Insightful for parent

      For background, the parent comment is Matthew Dillon, compiler and kernel hacker on Amiga/BSD/Linux and founder of DragonFly BSD (fork of FreeBSD).

      http://en.wikipedia.org/wiki/D...

  12. Re:Should be micro kernel by m.dillon · · Score: 5, Interesting

    Nobody does message passing for basic operations. I actually tried to asynchronize DragonFly's system calls once but it was a disaster. Too much overhead.

    On a modern Intel cpu a system call runs around 60 ns. If you add a message-passing layer with an optimized path to avoid thread switching, that will increase to around 200-300 ns. If you actually have to switch threads it increases to around 1.2 µs. If you actually have to switch threads AND save/restore the FPU state, now you are talking about ~2-3 µs. If you have to message pass across cpus then the IPI overhead can be significant... several microseconds just for that, plus cache mastership changes.

    And all of those times assume shared memory for the message contents. They're strictly the switch and management overhead.

    So, basically, no operating system that is intended to run efficiently can use message-passing for basic operations. Message-passing can only be used in two situations:

    (1) When you have to switch threads anyway. That is, if two processes or two threads are messaging each other. Another good example is when you schedule an interrupt thread but cannot immediately switch to it (preempt current thread). If the current thread cannot be preempted then the interrupt thread can be scheduled normally without imposing too much overhead vs the alternative.

    (2) When the operation can be batched. In DragonFly we successfully use message-passing for network packets and attain very significant cpu localization benefits from it. It works because packets are batched on fast interfaces anyway. By retaining the batching all the way through the protocol stack we can effectively use message passing and spread the overhead across many packets. The improvement we get from cpu localization, particularly not having to acquire or release locks in the protocol paths, then trumps the messaging overhead.

    #2 also works well for data processing pipelines.
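
    The batching argument is easy to check with arithmetic. The sketch below uses only the rough figures quoted in this comment (about 60 ns for a plain system call, about 1.2 microseconds when a message pass has to switch threads). The function names, and the idea of comparing the amortized cost against the 60 ns figure, are assumptions made for illustration, not measurements from DragonFly.

```python
# Rough per-operation costs quoted in the comment, in nanoseconds.
DIRECT_CALL_NS = 60        # plain system call
THREAD_SWITCH_NS = 1200    # message pass that has to switch threads


def amortized_overhead_ns(message_cost_ns, batch_size):
    """Messaging overhead per item when one message carries batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return message_cost_ns / batch_size


def break_even_batch(message_cost_ns, direct_cost_ns):
    """Smallest batch size whose per-item messaging cost is <= direct_cost_ns."""
    # Integer ceiling division without importing math.
    return -(-message_cost_ns // direct_cost_ns)


print(break_even_batch(THREAD_SWITCH_NS, DIRECT_CALL_NS))   # 20
print(amortized_overhead_ns(THREAD_SWITCH_NS, 20))          # 60.0
```

    Under these assumed numbers, a batch of 20 packets per message brings the switching overhead per packet down to the cost of a single direct call, which is the sense in which batching spreads the overhead across many packets.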

    -Matt

  13. Re:Should be micro kernel by 0123456 · · Score: 3, Interesting

    You'd presumably need to add new CPU functionality to allow fast context switches. If I remember correctly, a 20MHz Transputer took about one microsecond to switch threads, because that was one of the primary design goals. Of course, that led to them building a stack-based CPU where almost nothing had to be saved on a context switch...

  14. Re:Can we be sure there are no exploits? by Rigel47 · · Score: 3, Insightful

    I do enjoy how we tech people less-than-quietly display our scorn of all the varying fields in IT that aren't our own. Because, hah, stupid Javascript developer, couldn't explain northbridge from south! Or what DMA is! He probably doesn't know a single assembly instruction! Clearly an inferior being.

    Javascript guy meanwhile regards the C guy as a primitive, bearded man from the hills who labors all day on some stupid library that is ten layers of stack between that mortar and pestle and the awesome browser-art he's creating.

    Systems administrators wish everyone would run off and die because they are all irritating, stupid whiners.

    DBAs are just smug because nobody else understands their schemas and, hey, this is where it all happens.

    Networking would rather be back below the hold. No, the network isn't broken; your app is buggy or the stupid website you're trying to load is just slow.

    Help desk guys meanwhile consider themselves the cocks of the walk because, generally, their camaraderie and opportunity to interact with more regular people means their souls haven't been totally crushed.

  15. Re:Should be micro kernel by Guy Harris · · Score: 4, Insightful

    I'm sure you're right, though they have something to do with microkernels. There was a Linus interview from a few years back explaining his preference for the monolithic approach, and he explained that modules were introduced to give most of the benefits of the microkernel, without the drawbacks.

    I'd have to see that interview to believe that's exactly what he said. In this essay by him, he says

    With the 2.0 kernel Linux really grew up a lot. This was the point that we added loadable kernel modules. This obviously improved modularity by making an explicit structure for writing modules. Programmers could work on different modules without risk of interference. I could keep control over what was written into the kernel proper. So once again managing people and managing code led to the same design decision. To keep the number of people working on Linux coordinated, we needed something like kernel modules. But from a design point of view, it was also the right thing to do.

    but doesn't at all tie that to microkernels.

    Loadable kernel modules in UN*Xes date back at least to SunOS 4.1.3 and AIX 3.0 in the early 1990s. I'm not sure they were introduced to compete with microkernels.

  16. So that is how it happens by PenguinJeff · · Score: 4, Funny

    The improved assembly code is what allows the Terminator to be so efficient a killing machine.

  17. Re:Cruft by benjymouse · · Score: 4, Informative

    For some time now, Mark Russinovich at Microsoft has been talking about just how bad the Windows kernel was/is in his blog.

    I think you are confused. It was not Mark Russinovich, but rather Linus Torvalds, and he was talking about the Linux kernel, not the Windows kernel.

    "I mean, sometimes it's a bit sad that we are definitely not the streamlined, small, hyper-efficient kernel that I envisioned 15 years ago...The kernel is huge and bloated, and our icache footprint is scary. I mean, there is no question about that. And whenever we add a new feature, it only gets worse."

    Glad I could help.

    --
    Reading slashdot one-liner: (irm http://rss.slashdot.org/Slashdot/slashdot).rdf.item | fl title,desc*
  18. Re:Speedups? by jones_supa · · Score: 5, Interesting

    There's also a cool tool called CLOC which gives a nice report about a source tree including the lines of code and in which languages they are written.

  19. Re:Wouldn't a re-write be more fruitful? by Hognoxious · · Score: 4, Insightful

    I don't really know why.

    Users will say "But it works, we don't want to change waaagh scary" while simultaneously reporting 237 bugs all of which are OMG critical. Management will assume that it's cheaper, because existing stuff is already there so it's wasteful not to use it.

    Now it's true that once a load of crufty business rules have built up with 17 levels of nested conditionals it can be risky to try and replicate it for fear of missing some obscure case that's bound to occur at an inconvenient time for a key customer. There's no documentation, of course. Or if there is it's the source code, six revisions behind, pasted into a word document with three screenshots taken as BMPs so the whole thing is 1.5G. This alone can make you say "sod it".

    I can't find the correct phrase but maybe it's just a false analogy with physical things. Like reusing wood from an old shed to build a deck possibly is cheaper.

    --
    Confucius say, "Find worm in apple - bad. Find half a worm - worse."
  20. Re:Can we be sure there are no exploits? by Anonymous Coward · · Score: 3, Funny

    Joke's on you. I'm building a VM language in ASM.js and VM bytecode. I get to deal with memory management and multiprocessing issues that are unique to ASM.js. Since the code compiles to opcode which can then be output as "native" ASM.js (which is then mostly compiled to machine code by browsers) or it can run blazing fast on the C VM implementation with its (native) JiT optimizations. Since the platform is meant to be an extremely portable high level and systems level language I'm currently implementing the VM and compiler in its own bytecode, with the goal of also outputting as x86, ARM, ASM.js, JavaVM bytecode, and Perl6 bytecode.

    That means I have to deal with OS level things like spin locks, memory paging, etc. However, it also gives me the chance to radically change the security model such that all code is sandboxed, run in a VM as bytecode and only compiled and cached as "native" to the machine / platform / browser if trusted.

    Networking abstraction happens to be the entire point of the project. Instead of everything being a file, as in UNIX, on my system everything is a network connection. Even local "native" connections can be seamlessly wrapped as TCP or UDP connections at the OS level, so that IPC is transparent and computing can become distributed without modifying the code. Yes, that means your business can harness the CPU of its visitors to scale availability along with traffic.

    Silly single platform devs, still not harnessing every tier of development like plebeians. The future is distributed, and you'll get on my level eventually.

  21. Re:Can we be sure there are no exploits? by Anonymous Coward · · Score: 3, Funny

    That was the single most hipster post I've read on /. all year.

  22. Re:If It Ain't Broke, Don't Fix It! by Gavagai80 · · Score: 3, Insightful

    You're operating on the mistaken assumption that code that works now will always work and never need to be modified. You can't leave anything but the simplest things alone forever, because changes to the context/world will eventually require changes to it. If it's spaghetti code, that's going to be causing future bugs that are going to be non-obvious and difficult to discover.

    --
    This space intentionally left blank
  23. Re:If It Ain't Broke, Don't Fix It! by Troed · · Score: 3, Insightful

    I have been "in Software" for ~25 years. I also hold a degree as a Software Engineer.

    People who obsess about rewriting old code just because it's old tend to forget that in that old code are many bug fixes for edge cases found over the years. It was well documented and part of my education to know and understand that rewriting often caused those same bugs to surface again.

    Best practice is to run both the old and new software in tandem for a while and verify the results. In reality no organization besides NASA will do that.
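
    Running old and new code in tandem can be sketched as a small comparison harness. Everything named below (run_in_tandem, legacy_percent, rewritten_percent) is hypothetical and invented for the example; the "rewrite" deliberately drops an edge-case fix to show how a resurfaced bug would be caught.

```python
def outcome(func, args):
    """Run func(*args) and capture either its result or the exception type."""
    try:
        return ("ok", func(*args))
    except Exception as exc:
        return ("error", type(exc).__name__)


def run_in_tandem(old_impl, new_impl, inputs):
    """Feed the same inputs to both implementations and collect disagreements."""
    mismatches = []
    for args in inputs:
        expected = outcome(old_impl, args)
        actual = outcome(new_impl, args)
        if expected != actual:
            mismatches.append((args, expected, actual))
    return mismatches


def legacy_percent(part, whole):
    # Old code carrying an accumulated edge-case fix: no division by zero.
    if whole == 0:
        return 0.0
    return 100.0 * part / whole


def rewritten_percent(part, whole):
    # Hypothetical clean rewrite that lost the edge-case fix.
    return 100.0 * part / whole


print(run_in_tandem(legacy_percent, rewritten_percent, [(1, 4), (1, 0)]))
# [((1, 0), ('ok', 0.0), ('error', 'ZeroDivisionError'))]
```

    The harness only finds differences on inputs it is actually given, so in practice the inputs would be real traffic over a period of time, which is what makes the approach expensive.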

  24. Re:Can we be sure there are no exploits? by TheRaven64 · · Score: 3, Interesting

    The first instruction in the Intel Architecture Reference (Part 2: Instruction Set Reference) is AAA, which is named after the noise made by people forced to read x86 assembly. It is short for 'ASCII Adjust After Addition' (yes, that should be AAAA, but that would be too consistent for Intel). This instruction exists to convert the result of binary addition into the result of the corresponding BCD addition.
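
    For anyone curious what that adjustment amounts to, here is a minimal emulation in Python. It follows the behavior as documented in Intel's instruction set reference as far as I read it (if the low nibble of AL is above 9 or the auxiliary carry flag is set, add 0x106 to AX, then mask AL to its low nibble). Modelling the registers as plain integers and the helper names are assumptions made for illustration.

```python
def add_al(a, b):
    """8-bit ADD: returns (AL, AF), where AF is the carry out of bit 3."""
    al = (a + b) & 0xFF
    af = ((a & 0x0F) + (b & 0x0F)) > 0x0F
    return al, af


def aaa(al, ah, af):
    """Emulate AAA on the given AL, AH and AF; returns (AL, AH, CF)."""
    if (al & 0x0F) > 9 or af:
        ax = (((ah << 8) | al) + 0x106) & 0xFFFF
        al, ah = ax & 0xFF, ax >> 8
        carry = True
    else:
        carry = False
    # AL always ends up holding a single decimal digit.
    return al & 0x0F, ah, carry


def add_unpacked_bcd_digits(a, b):
    """Add two decimal digits the x86 way: binary ADD followed by AAA."""
    al, af = add_al(a, b)
    al, ah, _carry = aaa(al, 0, af)
    return ah * 10 + al   # AH holds the tens digit, AL the ones digit


print(add_unpacked_bcd_digits(9, 8))   # 17
print(add_unpacked_bcd_digits(5, 7))   # 12
```

    In the 9 + 8 case the binary sum is 0x11, whose low nibble looks like a valid digit; it is the auxiliary carry flag that tells AAA a correction is still needed.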

    Or, to put it in simpler terms: Anyone who thinks x86 assembly is not that difficult to understand is certifiably batshit insane.

    --
    I am TheRaven on Soylent News