Linux Kernel Surpasses 10 Million Lines of Code
javipas writes "A simple analysis of the most updated version (a Git checkout) of the Linux kernel reveals that the number of lines of all its source code surpasses 10 million, but attention: this number includes blank lines, comments, and text files. With a deeper analysis thanks to the SLOCCount tool, you can get the real number of pure code lines: 6.399.191, with 96.4% of them developed in C, and 3.3% using assembler. The number grows clearly with each new version of the kernel, that seems to be launched each 90 days approximately."
That the line count increases with each new version unless you are starting from scratch?
--
Oh Well, Bad Karma and all . . .
Beer is proof that God loves us and wants us to be happy.
And how many of these lines are for core functions (memory management, scheduler, etc.) and how many for drivers (USB, filesystems)?
AND???
In other news, trees tend to grow up unless they tend to grow down or sideways. Sharks tend to eat anything they can, unless they are not hungry.
Anonymous will beat me to FP for sure, unless they don't.
NO SIG
Too bad 9,999,999 lines of that code were ripped off from SCO.
Lines of code is not a good metric for performance. I'm in a software engineering class listening to how to use metrics on code.
*cough*assembly*cough*
"assembler" is the tool, not the language.
"When life gives you lemons, don't make lemonade. Make life take the lemons back!" -- Cave Johnson
I wonder how many lines of code Windows has
Because we'd all like to know how many man-months something as big as the Linux kernel should take to implement. And laugh at the huge price tag sloccount will put on it.
“Common sense is not so common.” — Voltaire
I'm a developer and was wondering what kind of testing is done to verify the code. Do they use unit testing? Regression testing?
I'm just curious because keeping 6+ million lines of code almost completely bug free is pretty amazing.
Yeah but you can customize the Linux kernel. If you don't want features, just don't compile them in.
It's easy, there's even a gui interface.
Good luck compiling a custom NT kernel. :)
Mod me down, my New Earth Global Warmingist friends!
It's significantly easier to hide a malicious backdoor inside a huge software project than a small one. Linux has already had a near miss back in 2003, when the CVS repository was compromised. Considering how many mission-critical applications run under Linux, there's a huge financial incentive to hide a backdoor somewhere in those 10 million lines.
Now, where do we find a birthday cake with ten million candles?
15. The Residents - Not Available
If Obama is missing that record, I'd be glad to lend him my copy.
Momentarily, the need for the construction of new light will no longer exist.
96,4% of them developed in C, and 3,3% using assembler
That leaves .3% that is unaccounted for. What was it written in?
Damn_registrars has no butt-hole. Damn_registrars has no use for a butt-hole.
May I suggest that large parts of this shouldn't be in the kernel at all? That there should be independent sub-systems so that in the event of a crash or panic, the entire OS doesn't come tumbling down?
So that badly written drivers (especially graphic card drivers) don't affect the stability of the entire system?
May I suggest that flame-wars are good and the EMACS is also bloated?
(And lots of other folks have already talked about the bad metric that lines of code is...)
I wank in the shower.
This raises the question - will Linus run out of magic powder?
Take the cheese to sickbay, the doctor should see it as soon as possible - B'Elanna Torres, "Learning Curve"
It's called a lameness filter because it's pretty lame. Try pasting the definition of a word from reference.com, or the lyrics to a longish song. Or a joke that relies on caps to be funny.
The lame mess filter won't let you.
Free Martian Whores!
Since that many lines = approx. 125,000 pages, which = approx. 0.0175 terabytes, and... a LOC is approx. 18 TB, I'd say they have a ways to go...
I wonder what the breakdown is of the almost 4 million lines that were omitted in the count, for blank lines, comments, etc.? I've always said that commenting your code is a very good thing to do, so it would be interesting to see what the percentage of this is comments, as opposed to blank lines (which isn't a bad thing for readability).
Attention all planets of the Solar Federation! We have assumed control! - Neil Peart
You cared. At least enough to add a comment...
Contentment is the greatest wealth
- Sukhavagga Dhammapada
Contentment is the goal behind all goals.
You don't want to know.
Interestingly only a year ago the i386 and x86_64 trees merged into one, greatly reducing the SLOC count at the time.
Basically, this story is "Linux kernel surpasses 10 million lines of code! Just kidding."
Funny that the summary calls attention to the fact that the number of lines includes comments and whitespace without any mention of how worthless lines of code is as a metric. Someone could easily go in and add or remove newlines wherever they wanted and, without changing a bit of code, make it 50 million or 50 thousand.
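A minimal Python sketch of that point, assuming a made-up C function and hypothetical helper names (`count_lines`, `tokens`): the same tokens laid out two ways give different raw line counts.

```python
# Two layouts of the same (hypothetical) C function: identical tokens, different newlines.
COMPACT = "int add(int a, int b) { return a + b; }\n"
EXPANDED = """int
add(int a,
    int b)
{
    return a + b;
}
"""


def count_lines(source):
    """Raw physical line count, as a newline count (what `wc -l` reports)."""
    return source.count("\n")


def tokens(source):
    """Crude token stream: pad punctuation with spaces, then split on whitespace."""
    for ch in "(){},;":
        source = source.replace(ch, f" {ch} ")
    return source.split()


if __name__ == "__main__":
    print(count_lines(COMPACT), count_lines(EXPANDED))
    print(tokens(COMPACT) == tokens(EXPANDED))
```

The tokenizer here is deliberately naive; it is only good enough to show that the two layouts contain the same code.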
Whale
Remember, the 10M lines is just the kernel in Linux, not an entire distro (ie: kernel + GNU stuff + X + apps + all the other stuff), so a total count of Windows LOC would be comparing apples and oranges.
IE: How many LOC are in NTOSKRNL + Drivers would be a better comparison.
And what would be better, a kernel that you could simply include or not include certain modules without the need for compilation, making the kernel truly modular, and hot-swapping them in or out based on your needs. That would make the kernel much more powerful and also useful for "normal" users/admins who might not want to mess with compiling. But, I'm sure my argument will be slapped at by some leave-things-be get-off-my-lawn fanboy who hates the idea of scary new features like true/better modularity.
Save a tree. Let the actual devs do compiling unless someone really actually wants to see the code.
Promote true freedom - support standards and interoperability.
Sorry everyone, that was me! Silly push %ebp ... Apologies to all...
It's quite a frequent occurrence in non-English-speaking countries. It annoys the heck out of me, and is clearly illogical, but hey - it happens.
This article summary is not very informative. The very least they could do is tell us which ten million lines of code Linux has surpassed.
Post may contain irony: discontinue use if experiencing mood swings, nausea or elevated blood pressure.
The better metric would be how many Libraries of Congress the kernel is.
Perhaps better would be number of times the size of the Unix System 6 kernel.
That's the one that the University of Waterloo printed as a textbook, half of a two book set. (The other book was the OS course text using it as the example.) They printed it at 50 lines per page column and added (lots of) whitespace and adjusted comments so routines fell on nice page boundaries. Even padded this way it came out to a total of ten thousand lines (of which I think 2 thousand were still in assembly code). Just right for one person to maintain full-time by the then-current rule-of-thumb.
So the linux kernel is a thousand times the size of that (whitespace-padded) version of the Unix kernel.
Bantam Dominique roosters crow a four-note song. Once you've heard it as "Happy BIRTHday" you can't NOT hear it that way
i believe a more appropriate measure of the 'bloat' (i.e. useless functions) or the size of any software package is through function point analysis--
http://en.wikipedia.org/wiki/Function_point
http://www.softwaremetrics.com/fpafund.html
the lines of code metric has long been considered an inadequate measure of software cost, complexity, or size - here is an article on why:
http://www.creativyst.com/Doc/Articles/Mgt/LOCMonster/LOCMonster.htm
but LOC is without question one of the easiest measurement (aside from total package size in bytes, which is nearly as uninformative)
let's see now. 10 years ago the battle cry of linux over Windows is that it's leaner. leaner being meaner and faster. now with all that unnecessary bloat code, what's linux's battle cry now?
I think that what you are suggesting is already standard fare for the Linux kernel.
Typically, the kernel and all modules are precompiled. Then, modules are swapped in and out as needed.
Ok, I'm not offended by your post.
Illogical? Just as arbitrary as the other way round.
This only proves that the Linux Kernel is in need of a significant refactoring effort. The capacity for any single developer to understand or even read a significant portion of this code is NIL. As a result, the opportunity to reduce duplication of effort is quickly diminishing, and the ability of new users to contribute anything other than additional bloat is similarly diminishing. And while the core of the kernel may be "small", and much of this code is dealing with special cases for specific hardware, because of the size of the code involved it is increasingly difficult to identify what is substantial and what are merely stylistic differences between two drivers. Increasing LOC counts is a sure sign of under-analysis and over-reliance on the availability of cheap labor. You can pick any arbitrary number of lines of code (less than say 20k) and pick that as the number of lines the kernel should occupy. As an individual line may define a new abstraction, LOC represent a potential for a geometric increase in complexity. So either these 6-10 million lines of code represent some truly staggering level of irreducible complexity (most unlikely), or are merely the result of not refactoring the code sufficiently (most likely). This really is a milestone in gratuitous complexification that should be mourned, not hailed.
You could try:
DIVIDE SLOC BY 1000 GIVING KLOC.
Not everything that can be measured matters; Not everything that matters can be measured.
If you're actually serious, (sarcasm is kind of hard to detect in plain text): man modprobe. Since Linux 2.0.
10 million lines is all well and fine, but more importantly what's the fuck count up to now?... Yeah, yeah,okay I know. run --> grep -r 'fuck' /usr/src/linux* and count it up for myself... Sheesh!
Bitkeeper, yes. Git, yes. But CVS?
Comments are also code.
If you only count as code what can be fed to the machine, you should only count the compiled binary. Source code is meant to be read by *humans*, so comments do count.
Singularity: a belief in the "God" idea with the "demiurge" relation inverted.
In addition there is also ksplice, to swap the actual kernel too.
IranAir Flight 655 never forget!
Comments are also code.
If you only count as code what can be fed to the machine, you should look at the size of the compiled binary. Source code is meant to be read by *humans*, so comments do count. That's why the GPL requires them to be left in the files (the "preferred form" to edit), otherwise it wouldn't be source code.
Singularity: a belief in the "God" idea with the "demiurge" relation inverted.
Let's not forget that the NT kernel runs on ~4 arches while Linux runs on ~16; that has to make it bigger.
IranAir Flight 655 never forget!
After a certain level, modularity is harmful.
It makes little sense, for example, to make Linux so modular it can be run on your mobile phone and Roadrunner supercomputer without recompilation.
Using commas in a number is consistent with separating longer things into chunks, where using a full stop in decimals is consistent with separating two distinct parts.
(I'm not b0ttle - but I thought I'd comment)
I'm well out of school.
And my first thought when I read it was "I wonder how that compares to Vista and OSX."
Although, it would be for the reason of comparing the difference between an open kernel, closed kernel and semi-open kernel. My thought was: how does its stance on openness affect the LoC metric?
I feel it would be a totally pointless observation, but would be insightful. Does closed source push stricter coding standards? Or does Open Source? Who knows...?
How can anyone keep up with this volume of code? As a simple vb coder myself I can't imagine wrapping your mind around this much code.
I downloaded the latest 2.6.27.2 tarball, untarred it, removed all except the "x86" folder from under the "arch" folder and ran this in the source root:
find . -type f -exec grep -v "^$\|^\*\|^#" {} \; | wc -l
to exclude blank lines, lines starting with "#" for comments and lines starting with "*", again for comments. I realize that this excludes the "#include" statements but their number should be negligible in the overall count.
The result is 6,022,957.
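For comparison, a rough Python sketch of the same idea that also separates blank lines from comment lines. Everything in it is an assumption for illustration: `classify_lines` and `SAMPLE` are made-up names, this is not how SLOCCount works internally, and unlike the grep pattern above (which only matches "*" and "#" in the first column) it strips indentation first and keeps preprocessor lines as code.

```python
def classify_lines(source):
    """Count blank, comment, and code lines in C-like source text.

    Simplifications (assumptions of this sketch): a line counts as a comment
    if, after stripping indentation, it starts with // or /*, or lies inside
    an open /* ... */ block (blank lines inside a block included). A line
    with code followed by a trailing comment counts as code. Comment markers
    inside string literals are not handled.
    """
    counts = {"blank": 0, "comment": 0, "code": 0}
    in_block = False
    for raw in source.splitlines():
        line = raw.strip()
        if in_block:
            counts["comment"] += 1
            if "*/" in line:
                in_block = False
        elif not line:
            counts["blank"] += 1
        elif line.startswith("//"):
            counts["comment"] += 1
        elif line.startswith("/*"):
            counts["comment"] += 1
            # Stay in comment mode unless the block closes on this same line.
            if "*/" not in line[2:]:
                in_block = True
        else:
            counts["code"] += 1
    return counts


# Hypothetical sample input, not taken from the kernel tree.
SAMPLE = """#include <stdio.h>

/*
 * Print a greeting.
 */
int main(void)
{
    // say hello
    printf("hello\\n");  /* trailing comment */
    return 0;
}
"""

if __name__ == "__main__":
    print(classify_lines(SAMPLE))
```

Run over a whole tree, a classifier like this would also give the blank/comment breakdown asked about earlier in the thread, within the limits of the simplifications noted in the docstring.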
And how is this different from functionality of modprobe tool that does exactly that?
Actually IMHO it's better to split up big numbers with a space because you can use big, floating-point numbers in coordinates.
ex:
6 399 191, 96.4, 3.3
The Windows kernel might have a lot less, after all, it doesn't support nearly as many devices or architectures.
Climate Progress - Hell and High Water
Windows XP's OS code was about 6 million LOC (lines of code), while the whole system had over 40 million LOC. How many LOC does Windows Vista have in its operating system... maybe 6-10 million, while the whole system might have over 40-60 million LOC?
This starts to make the case for an OS based on a microkernel, since a monolithic OS is HUGE to maintain...
"Round numbers are always false."
-- Samuel Johnson
I was referring to two things here, not having to compile modules, actual real modularity on the binary level for true "modules", and also anything that's not a Linux module, meaning settings or whatnot that have to be compiled into the kernel, instead of being switches and modules that you can throw in and out of the kernel. I don't think it's *all* modular, is what I'm saying, so any increase in that helps by making the kernel easier to work on because you can have definite targets and functionality, as well as making it easier to swap stuff in and out.
In other words, actual driver modularity! So users can actually download and install drivers from off the intarwebz without having to compile them and Linux can actually, I dunno, be usable for 99% of users! Brilliant!
Promote true freedom - support standards and interoperability.
Yes, because not having to require normal users (99% of users) to compile is outrageous! Then Linux might actually get adopted by the masses!
:P
Wouldn't want that, especially if you're just using Linux to "be different".
Promote true freedom - support standards and interoperability.
apparently there was a bk to cvs gateway of some sort
Climate Progress - Hell and High Water
For a long time Linux was a macrokernel: all drivers and so on were included in the kernel itself and were always loaded in memory. After 2.2 it became modular (in both cases it was a monolith), so you could compile drivers and the like as modules. Those modules were stored on disk and not loaded into memory when the OS was loaded at startup, which could lead to very small memory usage, like only a few megs. When drivers are needed, they are read from disk into memory and used. The bad thing was that every module was compiled for a specific kernel version and the integration was very tight, so if you compiled the kernel, you needed to compile all the modules too. If one module or any other driver or OS part crashed, it brought the whole OS and software system down. So you needed to be sure that all the code in the OS was checked and bug-free. Now consider that Linux has over 6 million lines of code (and how much of that is in drivers, the most likely place for a crash to originate!), so the OS is HUGE.
But I think what you want is an OS with a microkernel structure, where the kernel itself is very small and all the other OS parts are moved out of the kernel into userland as OS servers. From there you could swap and compile the OS servers as if they were normal applications, without needing to compile the kernel or the other OS servers at the same time. And if one OS server (driver, filesystem or network protocol) was compromised, you could just restart or replace that one without doing so for the whole OS or the whole software system, because a bug in a driver would affect only that device driver and could not bring the OS or the whole software system down.
But you can always start wars over which is the better OS structure, a monolithic kernel or a microkernel-based OS. Microsoft has used a microkernel structure in Windows NT and continues to use a microkernel-structured OS in MinWin and Singularity. So the OS should be very stable, but we know that a driver can bring the OS or the whole software system down very easily, even though it should not be possible (in theory only?).
Since that many lines = approx. 125,000 pages, which = approx. 0.0175 terabytes
and "approx. 0.0175 terabytes" == approx 18 GB which explains why the kernel source's tarball weighs 48 MB. Yup, if you wanted to know how big the kernel source is you just had to look for it ;-).
Besides, I really wonder how you got to that figure, considering how code has little in common with classically formatted text anyways. Not to mention how 125,000 pages == 18 GB, I mean do you have 150,000 characters per page (2,000 lines per page?)?
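The arithmetic behind that objection can be checked with a short Python sketch. It uses only the figures quoted above (10 million lines, 125,000 pages, 0.0175 terabytes) and assumes decimal units (1 TB = 10^12 bytes); the variable names are made up for illustration.

```python
LINES = 10_000_000            # headline line count from the story
PAGES = 125_000               # page estimate quoted above
TOTAL_BYTES = 17_500_000_000  # 0.0175 TB, assuming 1 TB = 10**12 bytes

lines_per_page = LINES / PAGES        # lines implied per page
bytes_per_page = TOTAL_BYTES / PAGES  # bytes implied per page
bytes_per_line = TOTAL_BYTES / LINES  # bytes implied per line

if __name__ == "__main__":
    print(lines_per_page, bytes_per_page, bytes_per_line)
```

So the quoted figures imply 80 lines per page but about 140,000 bytes per page, or 1,750 bytes per line, which is the mismatch being pointed out.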
You just got troll'd!
They must be getting paid by the line.... no, wait...
oh, they outsourced it to India by the line... no, wait...
They love writing code.
Yeah, that's it. Makes sense now.
Whoever they is.
deleting the extra space after periods so i can stay relevant, yeah.
It isn't mismatched, it is just reversed from the US way of doing it. Mismatched would be "6.399.191, with 96.4%" or something.
However. I still think it is weird and cannot get used to it, despite preferring metric and EU data order.
Climate Progress - Hell and High Water
This starts to make the case for an OS based on a microkernel, since a monolithic OS is HUGE to maintain...
Except that I seriously doubt this number is LOC for the kernel itself - this undoubtedly includes drivers and other loadable modules. The number of LOC for the kernel itself, excluding loadable modules, is much less. Linux cannot seriously be called "monolithic".
-- Ed Carp, N7EKG erc@pobox.com PGP KeyID: 0x0BD32C9B What I'm up to: http://intuitives.mine.nu
No I'm not talking about that and I shouldn't have been so vague. Whether or not you have to compile modules or instead have binary compatibility so you can just plug them in has nothing to do with kernel space vs. user space and such. If you want stuff to stay in kernel space and make users put in their passwords to install stuff, so be it, but make stable APIs/ABIs so that good modularity exists and users can install drivers or configure their kernel without having to compile, something that should be done once by the developers unless other users/devs want to see their code and want to compile stuff themselves. Modularity means that devs can more easily target certain areas and divide and conquer work, and it means having more points that have stable APIs and ABIs for greater flexibility and just, an easier Linux experience. What if I don't want to compile a patch, what if I just want to throw it in there, and I don't want to have to rely and wait on a distro repository compiler to do it for me but instead get it directly from the source? I'd rather get my kernel updates from the source, as well as all my other software, instead of being walled in unless I go compile something myself.
;) But seriously, recompilation is really inefficient, so it shouldn't be required by an OS, you could do better things with your CPU like the @Home projects. :P
Not to mention all the electricity it'd save from having only a few computers in the world doing the work to compile. And...um...they use trees for firewood, so, save a tree! Because all good movements should end up saving trees.
Promote true freedom - support standards and interoperability.
Christ.. that's just silly. Whatever happened to people being efficient?
How much legacy garbage is still floating around in there?
---- Booth was a patriot ----
Why? I have not compiled my own kernel for about 2 years now without any problem.
Ubuntu provides a nice generic kernel which is suitable for 99% of desktop workloads.
Personally, I prefer downwards of women, but if only upwards are available, that's fine, too.
Oh because you think that 'most user friendly OS' - OSX doesn't need a recompile for iPhone vs laptop use ?
... well that's part of the job description.
And did only having one architecture hinder Windows from taking over the world (yes I know it supports 2 variants of the same now) ?
Besides 99% of linux _desktop_ users do not _need_ to compile anything. I certainly have never had to do so for regular use.
Now for an admin
Linux cannot seriously be called "monolithic".
Well it can't be called a micro-kernel. And the notion of a "hybrid-kernel" is a joke. It's squarely in monolithic town.
the real number of pure code lines: 6.399.191, with 96.4% of them developed in C, and 3.3% using assembler.
Personally I thought the news was that no one knows what 0.3% of the linux kernel is written in. THAT'S news! (I'm betting it's BASIC).
10. Embossed, signed paper Certification of Live Birth -- Not Released
Are you trying to imply that he was born dead, and is some kind of zombie hitherto unknown to man? That would certainly explain why he hasn't released his medical records.
Of course, not being a complete raving loony would also be a fair reason for not releasing your medical records. Has your expectation of the privacy celebrities should be afforded sunk so low?
Back in the day when Windows NT first reached over 3.5 million lines of code this was used by Linux fanbois as proof positive that NT was a bad operating system. So I guess this means that Linux is now three times worse.
It's COBOL, that crap is still just everywhere.
In Capitalist America, bank robs you!
"If we wish to count lines of code, we should not regard them as lines produced but as lines spent." --Edsger Dijkstra
So, in fact, it's not 10 million lines of code at all. It's just 10 million lines. Wooooo.
I would have guessed it to be all the Whitespace in the C and assembly. That's a language that can really brainfuck you. A real gem, a perl I tell you ;)
Why is this news? Single commercial applications can have more than 10 million if they're complex enough. I would EXPECT the Linux kernel to have this many lines of code (or more) given what it does and how long it's been in development.
Homonyms are fun!
You're driving your car, but they're riding their bikes there.
1) It is hard. Most programmers can't beat the compiler's auto-optimisations.
2) A lot of low-level hackery to optimise your code often makes it unportable and unmaintainable.
3) Hardware is massively cheaper than programmers' hours. This is the whole point of languages like Python.
4) Most of the RAM bloat in modern systems is for the fancy GUI. Reducing the RAM usage in that case normally means increasing the CPU load. RAM is easier to scale than CPU.
========
CINC, 4th Penguin Legion
Bloatware.
+++OK ATH
I care because up at the top of this website is the tagline "Stuff that matters."
This is the biggest non-headline I've ever seen.
DRM: Terminator crops for your mind!
/* 3k lines of workaround for 8 lines of code. WTF were they thinking? */
//This might work.
//Blocks undocumented interface used only by WordPerfect.
//Passes test. Ship it. I'm done. <Allchin>
Help stamp out iliturcy.
Personally I thought the news was that no one knows what 0.3% of the linux kernel is written in.
Most likely the missing 0.3% is makefiles and miscellaneous scripts.
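For scale, a small Python sketch of what that remainder amounts to, using only the summary's figures (6,399,191 lines, 96.4% C, 3.3% assembly). The percentages are rounded in the summary, so the line counts are approximate, and the variable names are made up for illustration.

```python
TOTAL_SLOC = 6_399_191  # SLOCCount total quoted in the summary
C_SHARE = 96.4          # percent, as quoted
ASM_SHARE = 3.3         # percent, as quoted

# Round to one decimal to hide floating-point noise in the subtraction.
other_share = round(100 - C_SHARE - ASM_SHARE, 1)

c_lines = round(TOTAL_SLOC * C_SHARE / 100)
asm_lines = round(TOTAL_SLOC * ASM_SHARE / 100)
other_lines = round(TOTAL_SLOC * other_share / 100)

if __name__ == "__main__":
    print(other_share, c_lines, asm_lines, other_lines)
```

That leaves roughly 19,000 lines for everything that is neither C nor assembly.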
Is get a computer to do stuff like the first world does, with a third world power infrastructure.
Hint: They get their electricity from carbon fuels. We don't want them to build out their power infrastructure because we're fond of our Gulf Stream.
Capiche?
Help stamp out iliturcy.
The PHB problem is that they must have metrics to measure so they can list their datapoints in Excel and turn it into a nice Powerpoint slide.
I am SO glad Linux is invulnerable to this sort of attack.
Help stamp out iliturcy.
As we seek the common market it's best not to mention this whole "compiling" thing. Sure, people can do it, but they don't have to and it has the market appeal of Vista. It's best to leave references of it to howto pages and geek sites where optimization is key and proficiency is assumed. Slashdot is not quite that, yet.
Help stamp out iliturcy.
The parts of the kernel that could be converted to APL would reduce its size by 90%.
OTOH, APL bests perl in "write only" language contests.
Help stamp out iliturcy.
Ok, so those who can't handle the truth suppress the opinions of those who can handle the truth and a better technology. I guess the lemmings jump off the monolithic cliff together... I'll go the other way... all the best with your choice...
Hint: They get their electricity from carbon fuels.
We also get most of our power from carbon fuels...
Not to mention that there are plenty of extremely low-power computers that have respectable performance numbers...
To be fair, you don't need to compile a custom NT kernel, you only need to write new drivers, of which the VxD portion will be loaded into the kernel space. You would only need to update the NT kernel if you wanted to change core things like how the virtual memory system works, but then you would have to change the Windows API... I think it's best if that task is left to Microsoft (who, despite the mocking from /., have brought out a very decent kernel... the NT kernel has not been reported to crash (not the drivers, the kernel!) for a very long time now).
After extensive experience of looking after Windows, I've decided that all NT kernels are custom kernels - you just don't know in what manner it's been customised, or by whom.
"It doesn't cost enough, and it makes too much sense."
Isn't most of it just drivers?
In which case it's hardly exciting, as it could triple in size and the actual kernel features would be exactly the same.
For every expert, there is an equal and opposite expert. - Arthur C. Clarke
> Linux has already had a near miss back in 2003
I am not sure you could call it a "near miss". It was detected automatically during a routine check (you would not say an aircraft "nearly crashed" if a fuel pipe leaked during a pressure test in the maintenance hangar?).
the proof that Linux is a bloated piece of shit
The Commodore 64 Kernal* was 8K and lived in ROM... and we LIKED it!**
*Yes, Commodore really spelled it that way.
**We liked it because the C64 booted in about 2 seconds! :)
Serving your airship needs since 1995.
I did not say that Windows NT microkernel was about 6 million LoC, but the Windows NT OS. Windows NT OS use microkernel, so the OS is the kernel + OS servers in userland. The Windows NT microkernel might have 50 000 - 200 000 LoC while the whole Windows XP Software System has over 40 million LoC
50 000 - 200 000 Microkernel LoC
~6 million OS LoC
40 million Software System LoC
The Linux OS now has over 6 million lines, including all drivers etc. The kernel itself, without any drivers etc., might be a lot smaller. But a whole Linux OS distribution like Fedora has over 200 million LoC
~6 million OS(monolith kernel) LoC
150 - 250 Million Software System LoC
And Linux is very seriously a monolith, but it is modular and not a macrokernel.
I want my apps to run in a sandbox, where they can't break jack.. no matter how malicious they are. I want the box to be absurdly robust. And I want it to work as smooth as silk, no weird jerkiness, no pinwheel that keeps spinning for no apparent reason.. no slow response to the keyboard-- where 8 seconds go by without so much as the slightest feedback.
Get me a system that isn't a dog, and I'll pay for the 32gigs of ram. It will be money well spent.
Storm