Linux Getting Extensive x86 Assembly Code Refresh
jones_supa writes: A massive x86 assembly code spring cleaning has been done in a pull request that is to end up in Linux 4.1. The developers have tried testing the code on many different x86 boxes, but there's risk of regression when exposing the code to many more systems in the days and weeks ahead. That being said, the list of improvements is excellent. There are over 100 separate cleanups, restructuring changes, speedups and fixes in the x86 system call, IRQ, trap and other entry code, part of a heroic effort to deobfuscate a decade old spaghetti assembly code and its C code dependencies.
Technical Debt haunts you.
I wonder how much cruft there is in Windows or OSX.
We live in interesting times these days. With a changeset so big, and involving assembly code that isn't as easy to understand as C code, how can we really be sure that no exploits have been introduced? How extensively have these changes been reviewed to ensure there are no exploits or potential exploits being sneaked in?
I have a feeling that this is going to mean a *big* hump for everyone who uses real-time Linux. It could be months before we're able to upgrade. Adeos tinkers with that code, and Xenomai works on Adeos patched kernels.
What does the presence of assembly code have at all to do with being a microkernel? Microkernels can be written entirely in assembly and have been.
XNU contains thousands upon thousands of lines of assembly code for both x86 and ARM. Being a microkernel does not preclude the presence of assembly code
Linux has been obsolete since introducing assembly code. Minix touches the hardware in just 100 lines of code and Mac OS X is a microkernel as well.
It is 2015 and not 1985
Better not use Minix then, because it also has assembly.
"Set a man a fire, he'll be warm for the rest of the night. Set a man afire, he'll be warm for the rest of his life."
All C code ends up as assembler, anyway. It just gets deleted, most of the time, before you see it.
"To those who are overly cautious, everything is impossible. "
I've never seen a true microkernel that has the performance of a monolithic kernel. Nobody wants to buy a new computer and drag it down to a crawl. The people who want microkernel features are getting them with no issue. What's the point? It's not like my system has dumped core and full-on bombed out since about 2006. It's not like I have problems finding the features that are supported by microkernel OSs. It's not even like most "microkernel" operating systems perform all of their system calls through pipe or socket interfaces in reality. So, what's the point?
AC is correct. Great in theory, but in *reality* some programmer just bites the bullet and makes this stuff all solid in a monolithic kernel, and the result is a faster OS. I'll take that practice over the microkernel theories any day.
OS-X is not a microkernel - has never been. Even when NEXTSTEP was based on Mach 3.0, that too was not microkernel. Apple doesn't use any of the rules of microkernels - user space drivers, et al - in OS-X or iOS
Not replaced, you dummy.
Elevated to a new level.
All hope abandon ye who enter here.
Sometimes theory doesn't live up to reality.
Yes, I've hurd that before.
I am Slashdot. Are you Slashdot as well?
Just because Minix has only 100 lines of assembly doesn't mean anything about Darwin, even if Darwin has microkernel components, so your association there is a bit fallacious. Unless it's changed recently, Darwin does have microkernel (Mach, in fact) underpinnings, but a complete FreeBSD subsystem is grafted onto that. So if anything Darwin is a hybrid kernel. Like most real systems out there, it's not a complete microkernel system.
r u redi to rumble?
I've never seen a true microkernel that has the performance of a monolithic kernel. Nobody wants to buy a new computer and drag it down to a crawl.
Did you ever use OS-9 from Microware? (not to be confused with OS 9 from Apple)
Back in the day I ran OS-9 on a Tandy Co-Co and had a fully multi-user, pre-emptive multitasking system* running on a 6809E, 8 bit, sub 2MHz CPU. Later on I worked with a variety of industrial computers running OS-9 on 68K based systems and they worked just fine.
* I will give you that I only ever fired up the graphical desktop all of once just to see if it worked. After that I stayed in the command line.
I am Slashdot. Are you Slashdot as well?
Yep, XNU is a hybrid kernel. It has microkernel parts and monolithic kernel parts.
To add: your claim that XNU does not follow any microkernel rules is simply false. XNU uses microkernel-style message passing.
> There's a risk of regression when exposing the code to many more systems
The risk of regression is due to refactoring, not due to testing. Ironic, given that the post cites de-obfuscation as a reason for doing this. Or perhaps our submitter just got an MBA and is learning to think and speak in management-ese.
Micro kernel is not necessarily better. There was a period of time when everyone felt that way, but the fad wore off quickly when no one managed to actually create a micro kernel that ran rings around the competition. Yes, micro kernel is great for adaptability and debugging but in practice the actual consumers don't care about those features.
It's like software layers. You'll see some groups that are utterly adamant about keeping things strictly in layers, yet there are often very noticeable barriers between the layers that are inefficient both in run time and in developer productivity. For network stacks, I don't think anyone ever creates a pure layered architecture as envisioned by the OSI model.
To add: your claim that XNU does not follow any microkernel rules is simply false. XNU uses microkernel-style message passing.
XNU has system calls to allow messages to be sent between processes, including sending large amounts of data by page flipping.
It just doesn't happen to use that to implement very much of the UNIX API; it's not used to implement file I/O (that goes through ordinary BSD system calls to in-kernel file systems that are called through a BSD-style VFS) or network I/O (that goes through ordinary BSD system calls to in-kernel networking stacks that are called through a BSD-style kernel socket layer) or much of the process/thread management or VM code (that goes through ordinary system calls that end up calling Mach task, thread, and VM management calls).
It is used for communication between user processes, and for some kernel-user communication, but that's the same sort of use that happens in systems with Boring Old Monolithic Kernels.
OS-X is not a microkernel - has never been. Even when NEXTSTEP was based on Mach 3.0, that too was not microkernel. Apple doesn't use any of the rules of microkernels - user space drivers, et al - in OS-X or iOS
As I understand it, NeXT / OSX started with a micro-kernel philosophy and then introduced some monolithic kernel concepts to address the performance bottleneck of messaging between true micro modules.
Meanwhile Linux starts as a monolithic kernel, but introduced (un)loadable modules to address maintainability and extendability.
So if we described it as a continuum with 'pure microkernel' being a '1' and pure monolithic kernel being a '10', then OSX would be something like a '3' and Linux would be a '7'.
If it acquires resources on instantiation like a duck, then its a shared_ptr<Duck>
Why modify tested working code? What is this other than an excellent opportunity to inject malware into multiple Linux distros?
The chicklet keyboard was horrible. There were a few 3rd party replacement keyboards that worked pretty well.
Zoid.com
As I understand it, NeXT / OSX started with a micro-kernel philosophy and then introduced some monolithic kernel concepts to address the performance bottleneck of messaging between true micro modules.
Meanwhile Linux starts as a monolithic kernel, but introduced (un)loadable modules to address maintainability and extendability.
So if we described it as a continuum with 'pure microkernel' being a '1' and pure monolithic kernel being a '10', then OSX would be something like a '3' and Linux would be a '7'.
Loadable kernel modules have nothing to do with microkernels. A truly micro microkernel wouldn't need loadable kernel modules because all the loadable functionality would run in userland; plenty of monolithic kernels have loadable kernel modules.
And OS X is a lot further from "pure microkernel" than 3. The "monolithic kernel concepts" include "running the networking stack in the kernel, standard BSD-style", "running the file systems in the kernel, standard BSD-style", and "implementing most process and VM management operations with standard system calls in the kernel".
> There are over 100 separate ... speedups
The last time I looked, which was quite a few years ago TBH, the BSDs have, IIRC, less than 100 lines of x86 assembly, in the bootstrap.
From relatively-recent FreeBSD:
It's about 45,000 lines in Linux 3.19's arch/x86. A fair bit of that is crypto code, presumably either generally hand-optimized or using various new instructions to do various crypto calculations.
Who can sight read assembly anymore?
Everybody who is interested in "How Things Work" can read assembly code. Those who depend on hopes and prayers do not.
Time is what keeps everything from happening all at once.
Who can sight read assembly anymore?
Anyone who reasonably calls themselves a programmer.
"First they came for the slanderers and i said nothing."
If you consider that the C compiler itself uses assembly to make the basic operations work in the libraries, and that all C code is built on assembly libraries, then it makes the whole argument kind of silly, doesn't it?
ALL the kernel code is assembly on Linux and BSD, some of it is just raw assembly, and other bits of it are assembly encoded in "C".
Then couple that with the fact that most languages and VMs are written in C ... as well as all the libraries that these things depend on to actually get something else done ... well then pretty much everything is reduced to assembly ...
Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
That depends. Are you more interested in how something happens, or are you more interested in preventing something from happening? I think that is the factor that differentiates technicians from scientists.
Time is what keeps everything from happening all at once.
According to their own site, the kernel is not a microkernel.
http://www.microware.com/index...
It's not a major refresh, only a modest one, and it doesn't really fix the readability issues (which would require a complete rewrite). Linux assembly is a mostly unreadable, badly formatted, macro-happy mess. The assembly in the BSDs is much more elegant and minimalistic.
-Matt
...do you know what the machine is doing with the codes that you type? Abstractions necessarily lead to assumptions.
Time is what keeps everything from happening all at once.
Ingo Molnár has interesting comments on distributions, basically he wants them more modular.
Nobody does message passing for basic operations. I actually tried to asynchronize DragonFly's system calls once but it was a disaster. Too much overhead.
On a modern Intel CPU a system call runs around 60 ns. If you add a message-passing layer with an optimized path to avoid thread switching, that will increase to around 200-300 ns. If you actually have to switch threads it increases to around 1.2 µs. If you actually have to switch threads AND save/restore the FPU state, now you are talking about ~2-3 µs. If you have to message-pass across CPUs then the IPI overhead can be significant... several microseconds just for that, plus cache mastership changes.
And all of those times assume shared memory for the message contents. They're strictly the switch and management overhead.
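To put those figures side by side, here is a small arithmetic sketch in Python. The numbers are the ones quoted above; taking the midpoint of each quoted range is my own simplification, and the names `SYSCALL_NS` and `slowdown` are made up for illustration.

```python
# Costs quoted in the comment above, in nanoseconds.
# Midpoints are used where a range was given (an assumption, not a measurement).
SYSCALL_NS = 60

COSTS_NS = {
    "message passing, no thread switch": 250,   # midpoint of 200-300 ns
    "message passing + thread switch": 1200,    # ~1.2 us
    "thread switch + FPU save/restore": 2500,   # midpoint of 2-3 us
}


def slowdown(cost_ns, baseline_ns=SYSCALL_NS):
    """Return how many times slower a path is than a plain system call."""
    return cost_ns / baseline_ns


for name, cost in COSTS_NS.items():
    print(f"{name}: {cost} ns, roughly {slowdown(cost):.0f}x a plain system call")
```

Even the optimized path comes out at several times the cost of a direct system call, before any message contents are copied.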
So, basically, no operating system that is intended to run efficiently can use message-passing for basic operations. Message-passing can only be used in two situations:
(1) When you have to switch threads anyway. That is, if two processes or two threads are messaging each other. Another good example is when you schedule an interrupt thread but cannot immediately switch to it (preempt current thread). If the current thread cannot be preempted then the interrupt thread can be scheduled normally without imposing too much overhead vs the alternative.
(2) When the operation can be batched. In DragonFly we successfully use message-passing for network packets and attain very significant cpu localization benefits from it. It works because packets are batched on fast interfaces anyway. By retaining the batching all the way through the protocol stack we can effectively use message passing and spread the overhead across many packets. The improvement we get from cpu localization, particularly not having to acquire or release locks in the protocol paths, then trumps the messaging overhead.
#2 also works well for data processing pipelines.
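The batching argument in (2) is plain amortization, which can be sketched as below. The 1200 ns figure is the thread-switch cost quoted earlier; the batch size of 32 is purely an assumption for illustration, not a DragonFly number, and `per_packet_overhead_ns` is a hypothetical helper.

```python
def per_packet_overhead_ns(message_overhead_ns, batch_size):
    """Messaging overhead charged to each packet when one message carries a batch."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return message_overhead_ns / batch_size


# One message per packet pays the full cost every time.
print(per_packet_overhead_ns(1200, 1))
# One message per batch of 32 packets (assumed size) spreads the same cost out.
print(per_packet_overhead_ns(1200, 32))
```

With a large enough batch the per-packet share drops below the cost of a plain system call, which is when savings from CPU localization can start to outweigh the messaging overhead.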
-Matt
OS-X is not a microkernel - has never been. Even when NEXTSTEP was based on Mach 3.0, that too was not microkernel. Apple doesn't use any of the rules of microkernels - user space drivers, et al - in OS-X or iOS
Screw you Mac OS X guys with your bloated microkernel! I use classic Mac OS with a nanokernel.
Captcha: miserly
part of a heroic effort to deobfuscate a decade old spaghetti assembly code
Which is why the 4.1 release should be named as "Tomato Source."
I implemented Transputer drivers for OS9 running on a 68k in a VME rack back in the day.
It was a clean, simple and well designed OS.
I should use this sig to advertise my book ISBN-13 : 978-1501515132.
You'd presumably need to add new CPU functionality to allow fast context switches. If I remember correctly, a 20MHz Transputer took about one microsecond to switch threads, because that was one of the primary design goals. Of course, that led to them building a stack-based CPU where almost nothing had to be saved on a context switch...
It was a monolithic kernel. One of the interesting features was that device drivers were modules, and there was a small device node module which would say stuff like "use module 'serial driver', call it tty4 at IRQ 2 and address 0x454040". The kernel would deal with all IRQs in the hardware and then run the IRQ callback function in the proper module. That allowed user level device drivers back in the early 1980s.
Another cool feature was each software module had a CRC so it could detect bad binaries. There were ways to whitelist and blacklist based on CRC values.
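A rough idea of how a per-module CRC check with whitelist/blacklist could look, sketched in Python. This is not OS-9's actual mechanism or CRC algorithm: `zlib.crc32` is just a stand-in, and `module_crc` and `may_load` are hypothetical names.

```python
import zlib


def module_crc(image: bytes) -> int:
    """CRC-32 of a module image (a stand-in for whatever CRC the OS really used)."""
    return zlib.crc32(image) & 0xFFFFFFFF


def may_load(image: bytes, expected_crc: int, blacklist=frozenset()) -> bool:
    """Allow loading only if the image matches its recorded CRC and is not blacklisted."""
    crc = module_crc(image)
    return crc == expected_crc and crc not in blacklist


image = b"example module image"
recorded = module_crc(image)

# Flip one bit to simulate a damaged binary.
damaged = bytes([image[0] ^ 0x01]) + image[1:]

print(may_load(image, recorded))                        # intact image
print(may_load(damaged, recorded))                      # corrupted image
print(may_load(image, recorded, blacklist={recorded}))  # intact but blacklisted
```

A CRC only catches accidental corruption; it is not a defence against deliberate tampering.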
Loadable kernel modules have nothing to do with microkernels. A truly micro microkernel wouldn't need loadable kernel modules because all the loadable functionality would run in userland; plenty of monolithic kernels have loadable kernel modules.
I'm sure you're right, though they have something to do with microkernels. There was a Linus interview from a few years back explaining his preference for the monolithic approach, and he explained that modules were introduced to give most of the benefits of the microkernel, without the drawbacks.
If it acquires resources on instantiation like a duck, then its a shared_ptr<Duck>
I'm sure you're right, though they have something to do with microkernels. There was a Linus interview from a few years back explaining his preference for the monolithic approach, and he explained that modules were introduced to give most of the benefits of the microkernel, without the drawbacks.
I'd have to see that interview to believe that's exactly what he said. In this essay by him, he says
but doesn't at all tie that to microkernels.
Loadable kernel modules in UN*Xes date back at least to SunOS 4.1.3 and AIX 3.0 in the early 1990's. I'm not sure they were introduced to compete with microkernels.
The improved assembly code is what allows the Terminator to be so efficient a killing machine.
I can't find the article now. It was years ago. Perhaps I misunderstood it. But I think it meant something like:
If it acquires resources on instantiation like a duck, then its a shared_ptr<Duck>
I can't find the article now. It was years ago. Perhaps I misunderstood it. But I think it meant something like:
That's more like "mechanisms X or Y both allow Z to be accomplished"; the only thing that says X and Y have to do with one another is that they both allow Z to be accomplished, which isn't that much.
though they have something to do with microkernels
Which isn't that much.
Great, can we agree now that not much is something and not nothing?
In other news Thylacines and Jackals have nothing to do with each other, except they both look like canids and fill similar ecological niches. Apples and oranges . . .
If it acquires resources on instantiation like a duck, then its a shared_ptr<Duck>
There's also a cool tool called CLOC which gives a nice report about a source tree including the lines of code and in which languages they are written.
Micro kernel is not necessarily better.
No, micro kernels are plain worse. The biggest problem with micro kernels is the synchronisation problem you get with distributed state. Imagine a file system that is split up in different tasks, instead of one monolithic blob. Now, one task makes a change, like removing a file. Before the other tasks can make changes to the filesystem, they first need to synchronize to get the latest state. This becomes either terribly inefficient, or a huge mess, and most likely both at the same time.
Minix, for example, solved this particular problem by letting the entire filesystem be supported by a single task, but that results in terrible scaling performance when adding more users/processes.
>> ALL the kernel code is assembly on Linux and BSD, some of it is just raw assembly,
Yes, but the compiler is a few orders of magnitude more reliable than a human when generating assembly
Ah the days when umacs was the editor of choice, a large HD was a full 10MB and backing up required 70 floppies.
"The likes of Facebook and WhatsApp are free to those whose privacy is of zero value."
I don't really know why.
Users will say "But it works, we don't want to change waaagh scary" while simultaneously reporting 237 bugs all of which are OMG critical. Management will assume that it's cheaper, because existing stuff is already there so it's wasteful not to use it.
Now it's true that once a load of crufty business rules have built up with 17 levels of nested conditionals it can be risky to try and replicate it for fear of missing some obscure case that's bound to occur at an inconvenient time for a key customer. There's no documentation, of course. Or if there is it's the source code, six revisions behind, pasted into a word document with three screenshots taken as BMPs so the whole thing is 1.5G. This alone can make you say "sod it".
I can't find the correct phrase but maybe it's just a false analogy with physical things. Like reusing wood from an old shed to build a deck possibly is cheaper.
Confucius say, "Find worm in apple - bad. Find half a worm - worse."
It's like software layers. You'll see some groups that are utterly adamant about keeping things strictly in layers, yet there are often very noticeable barriers between the layers that are inefficient both in run time and in developer productivity
Everyone builds things in layers. At a minimum, you have the layer between the CPU and assembly language, between the kernel and userland, and between storage and the 'file' abstraction. There are plenty of other layers because they are helpful.
If the layers get in the way instead of helping, it means that the layers were designed poorly.
"First they came for the slanderers and i said nothing."
though they have something to do with microkernels
Which isn't that much.
Great, can we agree now that not much is something and not nothing?
Sure, if we'll also agree that "[introducing] (un)loadable modules" to a monolithic kernel "to address maintainability and extendability" does not in the least make that kernel any closer to a microkernel (because, in fact, it doesn't).
In other news Thylacines and Jackals have nothing to do with each other, except they both look like canids and fill similar ecological niches. Apples and oranges . . .
In other other news, Felis catus and Loxodonta africana have nothing to do with each other, except that they have four legs and bear live young.
Srsly, "both are kernels" and "both let you load and unload stuff" isn't much of an ecological niche. True microkernels (not "hybrid kernels" like the NT kernel or XNU) and monolithic kernels (with or without loadable modules) are sufficiently different from one another that "you can add or remove stuff at run time" isn't much in the way of commonality.
Well yes, normally you'd keep related functionality in the same task. In many RTOS environments everything's in a task anyway, it's the normal way to do things. There isn't necessarily much synchronization of state, as each task maintains its own state (you don't need the file system's latest state in order to interact with it).
The distinction between kernel and application is blurred in many embedded systems. Similarly, the difference between a microkernel and a monolithic kernel can get blurred as a mono kernel may have kernel threads, or a micro kernel may still have a core kernel that controls access to basic hardware functions. Though you rarely see people use monolithic kernels in smaller embedded systems, there's significant overhead there, but monolithic starts being used when the system gets larger or there's less need for real time.
I wouldn't say minix is a good example here, performance wise, as it's intended primarily to be an educational tool.
From the operating systems that have survived out in the wild today (general, widespread use):
I see your point that this addresses a shortcoming, without moving towards a microkernel philosophy.
If it acquires resources on instantiation like a duck, then its a shared_ptr<Duck>
I've never seen a RISC processor that can match the performance of the best CISC processors. You know, nevermind the fact that tons of money has been poured into CISC processors making them faster and faster.
I hate to be pedantic (I'm lying, I love to be pedantic), but for what measure of performance? For outright single-threaded MIPS/FLOPS and FLOPS/W in certain situations, they do win. For other measures of performance, not so much.
SJW n. One who posts facts.
I wouldn't say minix is a good example here, performance wise, as it's intended primarily to be an educational tool.
Still, it's worthwhile to examine the problems, and think about how to solve them. Minix has a clear and serious problem with the single threaded filesystem, and its lack of performance. If you want to scale it up to a general purpose computer system (not a small embedded, single purpose design), you're either going to have to a) keep it as a single task but make it smarter, or b) split it up in multiple tasks. Keeping a filesystem as a single task is problematic. For instance, a request comes in that requires waiting for physical hardware. While the task is waiting, other requests come in that can be served from the cache, but since the task is waiting for the hardware, these requests will have to wait, wasting performance. Trying to fix that by building giant state machines makes a huge mess out of the design. On the other hand, trying to improve performance with multiple, independent, threads basically turns the filesystem into a distributed filesystem, with all the synchronisation issues.
In a monolithic kernel, none of these problems exist. Every application request that enters the filesystem layer automatically continues in its own independent thread. When it hits an area that requires synchronisation, it briefly acquires a lock (usually without contention), does the work, and releases the lock. This is a much simpler design, with higher performance.
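As a user-space analogy of that pattern (every request runs in its own thread and takes a short lock around the shared state), here is a minimal Python sketch. It is not kernel code; the `Directory` class and `demo` function are invented for illustration.

```python
import threading


class Directory:
    """Shared in-memory state guarded by one short-held lock."""

    def __init__(self, names):
        self._names = set(names)
        self._lock = threading.Lock()

    def unlink(self, name):
        # Acquire the lock briefly, do the work, release it.
        with self._lock:
            if name in self._names:
                self._names.remove(name)
                return True
            return False

    def exists(self, name):
        with self._lock:
            return name in self._names


def demo(callers=8):
    directory = Directory(["a.txt"])
    results = []

    def request():
        # Each "application request" runs in its own thread.
        results.append(directory.unlink("a.txt"))

    threads = [threading.Thread(target=request) for _ in range(callers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, directory.exists("a.txt")


results, still_there = demo()
print(results.count(True), "caller removed the file;", results.count(False), "found it already gone")
```

Every thread sees the same single copy of the state, so there is nothing to synchronise beyond the lock itself.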
Worst of all is when they embark on a rewrite and give up half way through. I was involved in a project to port a C++/ActiveX based system to .NET forms. They ported most of the major views but left a lot of the minor stuff from the old codebase lying around and wrote bridges to host it in the new framework. So they doubled the code, half of it became bitrotten and hidden by the new code and bloated out the runtime. Great project.
Is apple powered by scientology?
The major problem is not the performance of the CPU.
To pass a message across a protected barrier, it means that you can't use two references to the same data. You must actually make a copy of the data. Everybody thinks that the time spent making the copy is a problem with microkernels that needs to be addressed. But the real problem is not the time to make a copy. The real problem is that you now have two different copies of the same data. And when you modify one copy of the data, to update some of the state, the other copy remains in the old state, which is generally not a good idea, unless you have carefully made a design that can tolerate such differences. Or, you would have to build in frequent synchronisation points where all the new state information is distributed to all the tasks. This means poor performance, because the tasks will spend a lot of time waiting for other tasks to catch up.
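The two-copies problem is easy to reproduce in miniature. In this Python sketch, `copy.deepcopy` stands in for copying a message across a protection boundary; that mapping is an analogy I am assuming, not how any particular microkernel does it.

```python
import copy


def demo():
    state = {"files": ["a.txt", "b.txt"]}

    shared = state                  # same address space: a second reference
    message = copy.deepcopy(state)  # across the barrier: an independent copy

    # The owner of the state removes a file.
    state["files"].remove("a.txt")

    return shared["files"], message["files"]


shared_view, copied_view = demo()
print("shared reference sees:", shared_view)
print("copied message sees:  ", copied_view)
```

The holder of the copy keeps believing `a.txt` exists until somebody sends it an update, which is exactly the synchronisation traffic described above.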
I'm waiting for Intel to integrate systemd in the next core update.
Worst of all is when they embark on a rewrite and give up half way through.
I saw something similar happen at my employer. A newer employee was sent on a mostly solo project to rewrite some of the core of our product, to make it easier to make some planned enhancements. Things didn't immediately work perfectly, and some of the founding employees fought the changes. Since they were necessary for the next release, and there wasn't another option for the features that we were required to make available, we ended up with two parallel sections of code that did basically the same thing but in very different ways. That was what we put up with for 4 or 5 *years* before the politics of the situation allowed us to integrate the changes back into a single codepath.
Before that, if you didn't touch that code constantly, it was usually unclear which path was being actively used and which was bitrotted to uselessness.
It is pitch black. You are likely to be eaten by a grue.
Way to miss the point.
Besides, 'assembler' is a part of the toolchain. The language(s) are called 'assembly'.
CLI paste? paste.pr0.tips!
yes. 32-valve, too.
CLI paste? paste.pr0.tips!
Management pressure forces even good developers to produce. Sometimes, against your better judgement, you have to go for the quick fix to meet a deadline (or not exceed a deadline by too much). Or, developers make a proof-of-concept or prototype, and management says "ship that".
Once management has its grubby hands on an existing codebase, it ignores the accrued technical debt, pooh-poohs developers' warnings to rewrite some stuff that was thrown together in the heat of battle, and never funds general background cleanup tasks.
Management does not believe in bit-rot.
Building codes generally allow homeowners to install a second layer of asphalt shingles over an existing layer of same. But once you're considering a 3rd layer, building codes generally require the entire roof to be stripped and redone.
Software should be the same way.
Wouldn't a total re-write be the right thing to do instead?
Yes, if you can get the proper requirements. (This does not apply to the current article, since I assume that the requirements for these syscalls, etc. are well described.)
On most business systems, especially one that is written over the course of a few months, the requirements are just as spaghetti as the code, so rewriting the system from scratch might also mean rewriting the requirements from scratch, which is a monumental task if it already has customers with different configurations.
On a more humorous note, I find it funny that this is today's article on The Daily WTF: Seven Minutes In Heaven
I find your signature incredibly relevant to your post.
For large sets, this will be our guide even unto death, for the LORD will work for each type of data it is applied to...
I don't really know why. Users will say "But it works, we don't want to change waaagh scary" while simultaneously reporting 237 bugs all of which are OMG critical.
Because if you did it wrong the first time, there's no chance that you're going to do it better the second time. You'll end up leaving out crucial functionality or something.
If you don't know how to clean up a codebase in-place by rewriting a little at a time, then you aren't skilled enough to do a rewrite from scratch.
"First they came for the slanderers and i said nothing."
Because it's always cheaper and faster to kludge the next feature on to make the next shipment or put out the current fire than it is to rewrite. If they would take a step back and look at where they want to be in a year or two, they'd realize that the time invested would pay for itself over and over. But most places don't have that kind of foresight, or the manpower to do that while also dealing with all the short-term crises arising from the current codebase.
In a monolithic kernel, none of these problems exist. Every application request that enters the filesystem layer automatically continues in its own independent thread. When it hits an area that requires synchronisation, it briefly acquires a lock (usually without contention), does the work, and releases the lock. This is a much simpler design, with higher performance.
Any particular reason you couldn't do the same thing in a microkernel? I'm envisioning some form of IPC primitive that automatically spawns a lightweight thread to handle each incoming message, which isn't too different from the monolithic kernel approach apart from not having a fixed 1:1 correspondence between the client and server contexts. You would be able to use your shared data structures and locks just as you would in a monolithic kernel, at least within the filesystem code. For anything else, of course, you'd need to use IPC.
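Something like the following Python sketch is what I have in mind: a server that pulls messages off an inbox and hands each one to a fresh thread, with the handlers sharing state under an ordinary lock. It is a user-space toy, not a real IPC primitive; `FileServer`, its inbox protocol and `demo` are all invented for illustration.

```python
import queue
import threading


class FileServer:
    """Toy server task: one lightweight thread per incoming message."""

    def __init__(self, names):
        self.inbox = queue.Queue()
        self._files = set(names)
        self._lock = threading.Lock()

    def _handle(self, request, reply_box):
        op, name = request
        # Handlers share state inside the server, guarded by a plain lock.
        with self._lock:
            if op == "unlink":
                result = name in self._files
                self._files.discard(name)
            else:  # "exists"
                result = name in self._files
        reply_box.put(result)

    def serve(self, count):
        workers = []
        for _ in range(count):
            request, reply_box = self.inbox.get()
            worker = threading.Thread(target=self._handle, args=(request, reply_box))
            worker.start()
            workers.append(worker)
        for worker in workers:
            worker.join()


def demo():
    server = FileServer(["a.txt", "b.txt"])
    reply_boxes = []
    for request in [("unlink", "a.txt"), ("unlink", "a.txt"), ("exists", "b.txt")]:
        box = queue.Queue()
        server.inbox.put((request, box))  # the "IPC send"
        reply_boxes.append(box)
    server.serve(len(reply_boxes))
    return [box.get() for box in reply_boxes]


print(demo())
```

Exactly one of the two unlink requests succeeds, whichever handler gets the lock first. Clients still pay for the message hop in and out; only the server's internals look like the monolithic case.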
"The state is that great fiction by which everyone tries to live at the expense of everyone else." - Bastiat
Why are you assuming the person fixing it is the one who originally wrote it?
Confucius say, "Find worm in apple - bad. Find half a worm - worse."
I'm not. If a different person is fixing it, then the second paragraph applies.
"First they came for the slanderers and i said nothing."
The main reason CISC is faster today is probably more related to the capital investment needed in production. Intel just has so much more.
This was basically what I was trying to say. More capital investment typically means better outcomes.
WARNING! This girl exceeds the MAXIMUM SAFE standards established by the FDA for BRATTINESS
I wasn't intending on saying that CISC was superior to RISC... what I was more saying is that there has been more money put into CISC processors, and so they develop faster.
It's just a simple fact of money == better access to stuff to make more money.
WARNING! This girl exceeds the MAXIMUM SAFE standards established by the FDA for BRATTINESS
That's cheating.
And inside, a modern x86 processor is actually a giant hardware emulation of x86 instructions with a RISC/VLIW core... You call it cheating, and I call it optimizing.
The second you try a cool trick like migrating a thread to another machine...
But this would happen with a macrokernel as well... you can't just magically make networking overhead disappear...
WARNING! This girl exceeds the MAXIMUM SAFE standards established by the FDA for BRATTINESS
I've never seen a RISC processor that can match the performance of the best CISC processors. You know, nevermind the fact that tons of money has been poured into CISC processors making them faster and faster.
Sometimes, it's just a matter of where the attention has been placed.
That's b'cos active development of RISC processors stopped in the last decade, most of them sunk by the hype around the Itanic. Otherwise, the last time you had the big 3 RISC processors - POWER2, PA-RISC and Alpha 21264, they were heavily ahead of the Pentium. Except that most people don't run Dhrystones or SPEC## or other such benchmarks - they run real software, which just wasn't available for most of the above CPUs.
Two things happened since the demise of the PA-RISC, Alpha and MIPS III & IV: first, Intel continued to shrink their CPUs faster, thereby increasing their individual performance and closing the gap w/ RISC CPUs. But the biggest coup for Intel was Windows NT becoming the underlying OS for all Windows OSs in Windows 2000 and then XP. While Windows 9x wasn't SMP capable, suddenly, Windows XP was. So Intel could take an optimal CPU core, toss in as many as made sense - 2, 4, 8, whatever - and run them against the fastest of RISC workstations. That had 2 big advantages over RISC - first, it continued to run native Wintel software, regardless of whether they were multithreaded or not, or optimized for multi-core or not. The second was that at a given price point, Intel could toss several cores to not just match, but even exceed the performance of a RISC workstation. Once that happened, the reason to prefer RISC at all went away. RISC could have gotten the same attention that the x64 got, but unless Microsoft took the initiative in porting the bulk of their applications to, say, the Alpha, that wasn't gonna move. Compaq saw that and pulled the plug on that platform.
HURD never looked at seriously good micro-kernels. Or else, they'd have forked Minix3, and used that as a basis for their platform. Instead, they went through 4 different micro-kernels before settling on the most archaic of them all - Mach 3, which is the least micro-kernel by the most current definitions of the concept