High Performance Linux Kernel Project — LinuxDNA
Thaidog submits word of a high-performance Linux kernel project called "LinuxDNA," writing "I am heading up a project to get a current kernel version to compile with the Intel ICC compiler and we have finally had success in creating a kernel! All the instructions to compile the kernel are there (geared towards Gentoo, but obviously it can work on any Linux) and it is relatively easy for anyone with the skills to compile a kernel to get it working. We see this as a great project for high performance clusters, gaming and scientific computing. The hopes are to maintain a kernel source alongside the current kernel ... the mirror has 2.6.22 on it currently, because there are a few changes after .22 that make compiling a little harder for the average Joe (but not impossible). Here is our first story in Linux Journal."
Why don't they try to make ICC fully GCC compatible, so we can recompile EVERYTHING with ICC and get an 8-9% to 40% performance gain?
IMHO This is a great development, for one important reason.
Portability of the kernel.
GCC is a great compiler, but relying on it excessively is a bad thing for the quality of kernel code. The wider the range of compilers used, the more portable and robust the code should become.
I know there will be the usual torrent of its-just-not-open-enough rants, but my reasoning has nothing to do with that, it is simply healthy for the kernel to be compilable across more compilers.
It also could have interesting implications with respect to the current GCC licensing 'changes' enforcing GPL on the new plugin structures, etc.
GCC is a wonderful compiler; however, it has in the past had problems with political rather than technical motivations, and moves like this could help protect against those in the future (some of us still remember the gcc->pgcc->egcs->gcc debacle).
Of course no discussion of compilers should happen without also mentioning LLVM, another valuable project.
hum... not that impressive ...
Ingo A. Kubblin is quoted as saying:
is that 8-9% overall speedup of applications, or just kernel tasks?
The real "Libtards" are the Libertarians!
I would imagine that it means for the kernel. We would then need to factor in how much time user applications spend in the kernel. Anything that is I/O-intensive is kernel-intensive. Anything that is malloc-intensive may be kernel-intensive if you're using a VM-based memory pool rather than a pre-allocated one.
I'm also wondering how this would compare to using Cilk++ and #defining the few keywords it has to the standard keywords when using vanilla GCC or ICC.
Perhaps there should be a table showing the relative performance of the different kernel subsystems under different compilation methods.
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
Looking at Amdahl's law (golden oldie here) how much time does a PC spend on kernel tasks these days?
You see, I'm a consultant and am paid by the hour.
FTA: "the mirror has 2.6.22 on it currently, because there are a few changes after .22 that make compiling a little harder for the average Joe (but not impossible). Here is our first story in Linux Journal."
..because the average Joe compiles his Linux 2.6.22 kernel with the Intel C compiler. On Gentoo Linux! His neighbour, Sixpack Fred, on the other hand, can compile his latest kernel with the Intel compiler. On a C64. From 7 feet away. While humming all the instruments in Ride of the Valkyries.
So GCC is slow compared to the Intel compiler?
I completely agree. I ran into this when I was working as a software architect on a project that had been around for a while. The contracts required compiler compatibility instead of standards compatibility. It made updates to the dev environment much more complicated. The contracts should have specified standards, but their writers didn't know any better-- the customer had no need to stick to a compiler product/version. It also makes your code more dependent upon the compiler's quirks. I would mod you up if I had the points.
A few years ago someone figured out that Intel's compiler was engaged in dirty tricks: it inserted code to cause poor performance on hardware that did not have an Intel CPUID.
http://techreport.com/discussions.x/8547
But perhaps they have cleaned this up before the 10.0 release:
http://blogs.zdnet.com/Ou/?p=518
steveha
lf(1): it's like ls(1) but sorts filenames by extension, tersely
They should think about moving to a Java kernel. They could just bootstrap one of the new, clever "Just-In-Time" Virtual Machines at powerup.
These JVMs are able to dynamically optimize the running code in real-time, far beyond what could be achieved by C or C++ compilers, without any performance degradation.
A Java kernel would likely run at least 50 times faster than the very best hand coded assembler - and since the language is completely type-safe and doesn't implement dangerous legacy language features such as pointers or multiple-inheritance, it would be unlikely to ever crash.
I've always wondered if anyone has spent time trying to develop optimizations for the kernel that kick in when specific instruction sets are detected?
http://blindscribblings.com - Tasty pop-culture in conceptual fashion.
Not much.
http://www.google.com/codesearch?q=SSE2+package%3Akernel.org
But do you really want to?
Wouldn't it be better to fix GCC so it has the same optimisations?
Looking around, without leaving my chair, I see a commercial firewall/VPN/router box running Linux. I see a commercial wireless access point running Linux. I see a commercial PBX running Linux. And I see two different embedded boards running Linux (admittedly our private design).
Yup...I guess you're right. No one uses Linux for anything important.
Other than every supercomputer on the planet worth talking about, that is...
"My life's work has been to prompt others... and be forgotten." --Cyrano de Bergerac
If your program is malloc-intensive and you care about performance, you may as well just use a memory pool in userland. It is very bad practice to depend upon specific platform optimisations when deciding which optimisations not to perform on your code. Then you move to another operating system like FreeBSD or Solaris and find your assumptions were wrong and you must now implement that optimisation anyway.
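For what it's worth, here is a minimal sketch of the kind of userland pool I mean, written in Python only to keep it short. The class name BlockPool and its methods are made up for illustration. The idea is one upfront allocation carved into fixed-size blocks, with a free list so that alloc/free never go back to the system allocator.

```python
class BlockPool:
    """Toy fixed-size block pool: one upfront buffer, blocks handed out by offset."""

    def __init__(self, block_size, block_count):
        if block_size <= 0 or block_count <= 0:
            raise ValueError("block_size and block_count must be positive")
        self.block_size = block_size
        # The only large allocation; everything else is bookkeeping.
        self.buffer = bytearray(block_size * block_count)
        # Free list used as a stack, reversed so the lowest offset goes out first.
        self._free = [i * block_size for i in range(block_count - 1, -1, -1)]
        self._in_use = set()

    def alloc(self):
        """Return the offset of a free block, or raise MemoryError if none is left."""
        if not self._free:
            raise MemoryError("pool exhausted")
        offset = self._free.pop()
        self._in_use.add(offset)
        return offset

    def free(self, offset):
        """Give a block back to the pool; rejects unknown or already-freed offsets."""
        if offset not in self._in_use:
            raise ValueError("offset was not allocated from this pool")
        self._in_use.remove(offset)
        self._free.append(offset)

    def view(self, offset):
        """Writable view of one allocated block inside the shared buffer."""
        if offset not in self._in_use:
            raise ValueError("offset was not allocated from this pool")
        return memoryview(self.buffer)[offset:offset + self.block_size]


if __name__ == "__main__":
    pool = BlockPool(block_size=64, block_count=4)
    a = pool.alloc()
    b = pool.alloc()
    pool.view(a)[:5] = b"hello"   # write into the first block
    pool.free(a)
    c = pool.alloc()              # reuses the block that was just freed
    print(a, b, c)
```

Because the pool lives entirely in your own process, it behaves the same whether the platform's malloc is fast or slow, which is the whole point.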
Sam ty sig.
Get back to us when the US missile defense system has actually destroyed a foreign missile (assuming that Slashdot is still around then).
It's not designed to do that. It's designed to suck up as much money as possible whilst simply threatening to down a missile.
Once I was a four stone apology. Now I am two separate gorillas.
LLVM has enabled proprietary forks of GCC, effectively.
For example, Adobe has recently released a C -> FlashVM compiler. It leverages GCC's great front end. Some of the people most excited by Alchemy were the folks working on open Flash engines (and projects like haXe). Unfortunately, Alchemy uses LLVM to couple GCC's front end with a proprietary code generation backend.
So it looks like we're headed back to the bad old days where everything had its own proprietary and incompatible compiler. :(
We were not impressed.
-- Erich
Slashdot reader since 1997
We would then need to factor in how much time user applications spend in the kernel. Anything that is I/O-intensive is kernel-intensive.
What do you mean? I don't think icc will speed up my hard drive.
I might be completely wrong but:
RMS felt that making it easy to produce plugins for GCC would be a very bad idea since closed source could exploit this. We really want GCC improvements to be free software so his hesitation has some merits.
Exactly how this relates to LLVM I dunno..
This kernel is so ancient that any possible performance gains are outweighed by the new kernels' performance, bug fixes, and improved driver support. Plus, why would someone want to toss away their freedom by using a non-free compiler? Also, does the Intel compiler work with AMD processors?
There is so much against this that it is useless. Until Intel open-sources the compiler, and it works with up-to-date kernels and on all x86 and x86_64 compatible hardware (I'm not sure if this is a problem), I'm not interested.
As a certified and accredited software engineer, I think it's time for Linux to be re-written in Javascript. The competition between Chrome, Firefox, IE and Safari has resulted in incredibly fast Javascript interpreters, and if Axl Torvalds mandates a switch to JS, the kernel could automatically take advantage of these improvements. After all, the OS and the web are becoming one, and within 10 years all applications will be in the cloud, delivered via the raintubes.
This is very relevant to my interests. We'd tried a while back to compile a Linux kernel with ICC and had too many issues to list. We do a lot of work with fluid dynamics and it's ALL CPU based - any increase in speed would be appreciated. With the economy the way it is, and a lot of companies shelving projects, budgeting for new clusters isn't on the list of priorities.
There are actually quite a few ARM processors that do. See Jazelle.
Java is not a "systems language", meaning you don't write operating systems and systems level code in it for very good reasons.
Funny cause Sun already did that like 13 years ago.
One of them being, name me a processor that can run Java bytecode natively.
The ARM9E.
It won't speed up the hard drive, but it should reduce the latency of a context switch (something like 21 microseconds, isn't it?) and it should also reduce the latency involved in going through the various layers of the kernel.
Yes, this isn't much in comparison to the speed of the drive, but that's not the point. I didn't say it would speed it up by a lot, merely that it would speed up.
I don't know what the latency is within the kernel in the VFS layer or within the different filesystems (ignoring mechanical delays whether from reading the data or any metadata needed due to the FS algorithm), but I can be certain it won't be zero. I can also be certain that much of this latency won't be synchronized with the disk spinning, so it's not going to vanish in a spout of parallelism. Although I can't see any reason why this would be impossible if the FS and hard drive were designed in tandem. That's not the way it's usually done, though.
The practical upshot is that using ICC and getting the 8% savings in the kernel might give you a 0.0008% improvement in performance (assuming no savings via the drive cache). Not a whole lot, certainly not enough to show on any but the most sensitive of disk I/O performance gauges, but it's still a saving.
If the drive has an on-board RAM cache large enough to eliminate consideration of the mechanical components, then I/O savings would return to the more normal 8%.
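To make the arithmetic behind that figure explicit, here is a small sketch of Amdahl's law applied to a kernel-only speedup. The function names are mine, and the 0.01% kernel share is simply the value that reproduces the 0.0008% figure; it is an assumption, not a measurement.

```python
def overall_saving(kernel_fraction, kernel_saving):
    """Fraction of total run time saved when only the kernel portion gets faster.

    kernel_fraction: share of run time spent in the kernel (0..1)
    kernel_saving:   share of that kernel time removed (0.08 for "8% faster")
    """
    if not 0.0 <= kernel_fraction <= 1.0:
        raise ValueError("kernel_fraction must be between 0 and 1")
    if not 0.0 <= kernel_saving < 1.0:
        raise ValueError("kernel_saving must be in [0, 1)")
    return kernel_fraction * kernel_saving


def overall_speedup(kernel_fraction, kernel_saving):
    """Amdahl's law: old run time divided by new run time for the whole workload."""
    return 1.0 / (1.0 - overall_saving(kernel_fraction, kernel_saving))


if __name__ == "__main__":
    # Assumed kernel shares, from "almost none" to "everything is kernel time".
    for share in (0.0001, 0.05, 0.5, 1.0):
        saved = overall_saving(share, 0.08)
        print(f"kernel share {share:.2%}: total time saved {saved:.4%}")
```

With a kernel share of 0.01% the saving comes out at 0.0008% of total time, and it only reaches the full 8% when the workload spends all of its time in the kernel.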
These JVMs are able to dynamically optimize the running code in real-time, far beyond what could be achieved by C or C++ compilers, without any performance degradation.
They can indeed do that for a great many algorithms. I recently worked on a huge number crunching application. There's a little throughput/sec display. It's amazing to see it keep getting faster and faster as the JIT compiler keeps reordering the instructions.
Regarding your ASM joke, HP's project Dynamo, 10 years ago, showed that the same could be done with ASM, beating any hand-crafted ASM you could ever dream of.
*That said* the JVM does not provide anything to work with low-level DMA/IRQ/IO/etc. So good luck writing a fast kernel in Java.
A compliant JVM is indeed completely immune to buffer overrun.
Oh, and Java does of course support multiple inheritance. In OO-Analysis and OO-Design implementation is completely secondary to modelization. Java supports multiple interface inheritance and hence Java supports OO's multiple inheritance concept in the cleanest, most abstracted, way possible. You're getting confused, like 99.9% of people out there discovering OO, by mistaking "concrete inheritance" (aka "implementation inheritance") for inheritance. There are entire frameworks, like Swing (used to power the Real World [TM]), that are entirely modelled using interfaces. There are also very fine OO languages that simply do not support "implementation inheritance", which should give you food for thought. Remember that in OO implementation is a detail and you'll be fine in the future.
As a sidenote, implementation inheritance in Java (i.e. using the "extends" keyword on a *class*) can be seen as a quick way to use delegation. A quick and *dirty* way, for it shall make your program harder to maintain and harder to test.
This is because 99.9% of people out there don't know what "abstraction" means.
that really is hilarious - you know, I've known some people who would actually believe this too
Why is this funny? The overhead of separate virtual memory spaces (~20%) is roughly the cost of a JVM (~30%). A Java kernel is only funny because of network effects, existing C programs, that make it impossible in practice.
Last year Rob Landley was working on getting the Tiny C Compiler to build the kernel unmodified (again by adding gccisms to tcc) - here's an OLS video of Landley talking about changing tcc to compile the kernel. Alas, from what I gather this effort has stalled for now.
It is unlikely that you will see the kernel adopting anything that makes the build process much more complicated. Operating system glue layers (e.g. abstractions in code for drivers that are supposed to run on other platforms) are already frowned upon in drivers. Any new dependencies on tools like autoconf or cmake would most likely be rejected with a "what are we gaining?" complaint. My understanding is that patches that convert gccisms to their C99 equivalents are generally accepted but people are not willing to maintain glue for other compilers because it makes the default case painful. That's their choice - there are always other OSes like NetBSD that can be "compiler portable".
That's a common misconception: malloc is not a kernel call but a userland function.
Malloc is implemented in libc by managing a single large area of memory (the heap). When the heap is full, malloc() increases the heap size with a system call such as sbrk(). On my system (64bit), the heap is increased in blocks of 128KB.
For large data sizes (>128KB) malloc does not use the heap and directly allocates the memory using the system call mmap().
For example, for an application allocating up to 100MB, the overall number of calls to sbrk() is no more than 100MB/128KB = 800, regardless of the number of calls to malloc()/free(), which can be in the millions. The kernel calls are totally negligible.
There is a nice article here: http://www.linuxjournal.com/article/6390
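The arithmetic above can be checked with a toy model. This is not glibc's allocator: it never reuses freed memory, and the 128KB growth step and mmap threshold are just the numbers quoted in this comment, so treat the function and parameter names as hypothetical.

```python
KIB = 1024
MIB = 1024 * KIB


def count_kernel_calls(request_sizes, heap_step=128 * KIB, mmap_threshold=128 * KIB):
    """Count heap extensions (sbrk-style) and mmap calls for a run of malloc sizes.

    Toy model: small requests bump a pointer inside the heap, and the heap grows
    in heap_step chunks; requests above mmap_threshold bypass the heap entirely.
    """
    heap_end = 0    # bytes obtained from the kernel for the heap so far
    heap_top = 0    # bytes handed out to the program so far
    sbrk_calls = 0
    mmap_calls = 0
    for size in request_sizes:
        if size > mmap_threshold:
            mmap_calls += 1          # large block: one mmap() per request
            continue
        heap_top += size
        while heap_top > heap_end:   # heap is full: ask the kernel for one more step
            heap_end += heap_step
            sbrk_calls += 1
    return sbrk_calls, mmap_calls


if __name__ == "__main__":
    # 1,638,400 allocations of 64 bytes add up to exactly 100MB.
    sizes = [64] * (100 * MIB // 64)
    print(count_kernel_calls(sizes))
```

Under these assumptions, more than 1.6 million small allocations cost only 800 heap extensions, which matches the 100MB/128KB estimate.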
Actually, there was that one that ran Windows...... so not _every_ supercomputer.
-]Phreak Out[-
The plain fact that the project started in 2004 and by now they can't state anything more precise doesn't make it appear very promising. The time would probably have been better invested in improving GCC.
There are a few spots of the kernel that do use hand crafted SSE assembly (a quick glance says RAID calculation is one area (also here) and a particular crypto routine is another) but it is quite rare. Up until SSE4, SSE was really targeted at multimedia applications that contained a lot of floating point arithmetic. Generally floating point is avoided within the kernel, so the maintenance pain of crafting an SSE optimised routine along with a generic C version would not be worth it. Seemingly when you go to write assembly these days you have to account for the following:
Also don't conclude that just because the kernel doesn't contain many handcrafted SSE routines that no SSE instructions will ever be emitted by the compiler (assuming you've told it that you have a CPU that can cope with them). I believe very new versions of GCC (4.3 and above) have just gained the ability to emit SSE4 instructions.
A Java kernel would likely run at least 50 times faster than the very best hand coded assembler
I'm going to have to agree with you here. However with all the major browser producers concentrating on JavaScript speed recently, I'd say it's much better to use JavaScript instead of plain Java. Think about it, JavaScript is where the speediness is. Also, since almost every browser supports it, you could just boot the kernel using any browser. This could potentially get the kernel out of the hands of that bunch of self-righteous, elitist Linux hackers who are currently totally disconnected from users like you and I.~
8 of 13 people found this answer helpful. Did you?
So the general answer is no it will not be faster. This is because as a final step (the so called stage3) it compiles itself with itself. This assumes icc isn't malicious (yes I know - Trusting Trust and Countering Trusting Trust etc).
What a complete load of garbage:
Don't get me wrong, I love Java and I'm a full time Java developer but here's why what you just said is a complete load of Tosh:
1. JIT compilation is just doing a similar job to a C/C++ compiler, only at run time. It's certainly not any better than the excellent ICC compiler, and for an operating system, which you would want to boot quickly, would you really want it compiling half the OS at boot up?
2. Most of Java's IO operations are just JNI wrappers for native C system calls anyway. Because Java has to be compatible with the APIs for all the different operating systems, they make some nasty compromises in order to present a common interface. That means that Java IO is almost always a lot slower than native IO.
3. It's just a terrible idea! Java is a wonderful language but not for operating systems.
Apparently so.
AVR32 can also run java bytecode natively.
Man, if the world stood by and waited for Democrats to actually create solutions, we'd be in some kind of massive global financial meltdown or something...
Oh, wait...
Wow one out of 500.. awesome..
No, it doesn't. It just supports a few of the commonly used bytecodes and has instructions to optimize the common scenarios. It most certainly doesn't have full-fledged native Java bytecode support.
Hey, why doesn't anyone fix the notorious issues in the kernel first? Before playing around with some fancy new compiler... The kernel performance has been broken for months, and nobody has fixed it yet. Here, when was this: last October! Last January! And it's still broken! http://it.slashdot.org/article.pl?sid=09/01/15/049201 http://linux.slashdot.org/article.pl?sid=08/10/27/1212214
And that's a problem, and the reason this really isn't all that exciting.
Computer memory is just fancy paper, CPUs just fancy pens with fancy erasers; the 'net is just a fancy backyard fence.
Sometimes a thing needs to be said more than once or twice.
In this case, maybe it needs to be said several thousand times.
Actually ARM has its own compiler, RVCT, which consistently produces way faster and smaller code than GCC for Symbian. See http://arm.com/products/DevTools/index.html for more info.
Your disk controller and filesystem drivers do most of their stuff in the kernel. Pumping megabytes of data to your hard drive requires these kernel subsystems (VFS, ...) to do a lot of work.
The ARM926EJ-S processor also includes ARM Jazelle(TM) technology which enables the direct execution of Java bytecodes in hardware.
I guess ARM is lying about their own products?
ICC is hands down the best C++ compiler for x86 and x64 from a performance perspective. GCC isn't even in the running on that front. All GCC has going for it is that it's "free"
Given that we get a few percent benefit from using ICC over GCC after heavy tuning, I'd say that GCC gets the job done pretty well, given that it is a general purpose, multiplatform compiler. And one thing ICC totally sucks at is compiling speed - ICC takes two to three times longer than GCC to compile and link code. When I need to test a feature today, you can guess which compiler I reach for, especially when GCC can take over an hour to complete some of the code bases here.
Cheers,
Toby Haynes
Anything I post is strictly my own thoughts and doesn't necessarily have anything to do with the opinions of IBM.
instruct Gentoo users, why don't you just write an ebuild? After all, you did build this on Gentoo, why not add it into portage? There's not even an overlay!
I understand those words but not their meaning together.
ROMANES EUNT DOMUS
The HPC and gaming communities probably won't care much about this, aside from the tweakers who spend $500 to overclock a $200 CPU to perform like a $400 CPU. The vast majority of workloads spend very little time in the kernel. The glaring exception to this is the network stack, where you can have a lot of rather CPU-intensive packet-mangling, routing, firewalling, IPSEC tunneling, and header processing done entirely in kernelspace. Ever try saturating a 10 Gbit ethernet interface? If you don't do some careful tuning, it quickly becomes CPU-bound, and much of that tuning is application-specific, so it's not much help when the applications generating the traffic are running remotely and beyond your control. Now try saturating two of them. Or more. This is one of the reasons why most people don't use Linux for high-end routing, but if it were possible, a lot of people would be very thrilled to move from things like IOS to something where they can use less exotic management tools that don't require such expensive training and support, and where there's much less vendor lock-in.
There's no failure quite as dissatisfying as a complete and total solution to the wrong problem.
It's their compiler, they are damn well allowed to do what they want - call me when AMD pours that kind of resource into having their own compiler.
This sort of "a company can do anything it wants with its own products" comment appears almost every time someone mentions anti-competitive behavior, and then people explain that no, a company should not be allowed to leverage a monopoly position to further entrench itself. Should the government allow practices like, e.g., dumping, the market would be dominated by a very small number of mega-corporations, ruining the economy.
Seriously, are you a troll?
50 times faster than the best hand coded assembler? I can't tell if you're being sarcastic or you really do think that it's possible to make any compiled code run better than the "best" assembler.
If a JVM can produce the machine code, so can a human produce it by hand coding in assembly.
Oh brother. Comparing virtual memory spaces to JVMs. Only on slashdot.
XML is like violence. If it doesn't solve the problem, use more.
Sure... given that the compiled code basically gets broken into a set of instructions that are encoded using an executable format, the same way that assembly code gets broken into a set of instructions that are encoded using an executable format.
Seriously, most non-trivial programs are better optimized by the compiler. Humans miss stuff, and often don't optimize very well anyway.
Yeah, non-trivial programs. But no one said you have to write the whole non-trivial program in assembly. Sure, a human can't optimize code better than a compiler without a great deal more effort.
However a human can focus on the areas (time critical inner loops) that need optimization and do it when necessary. And there are still things a determined human can do better than a compiler. Things like avoiding branches with flag / conditional move tricks, vectorizing complex floating point operations and using a few instruction tricks to get a loop into a cache line are all things a human can still do better.
In general, you should let the compiler have a go first though and see if you can do better if it's really needed. Of course, if you're some kind of "enterprise" coder, it's not needed. However, if you're writing something that is really performance critical (games, especially on portable platforms, or certain kinds of scientific computing), you should get your hands dirty if performance isn't quite up to snuff.
Note you don't have to use assembly so much anymore: much of the low level functionality is available through special machine level intrinsics (especially in the case of SSE) on modern compilers.