Linux Kernel Surpasses 10 Million Lines of Code

Posted by timothy on Wednesday October 22, 2008 @05:32AM from the nice-round-figures dept.

javipas writes "A simple analysis of the latest version (a Git checkout) of the Linux kernel source reveals that its total line count surpasses 10 million, but be aware: this number includes blank lines, comments, and text files. A deeper analysis with the SLOCCount tool gives the real number of lines of code: 6,399,191, with 96.4% of them written in C and 3.3% in assembler. The number grows noticeably with each new kernel version, and new versions seem to be released roughly every 90 days."
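What separates the 10 million figure from the 6.4 million one is mostly the counting rule. Below is a minimal sketch that assumes a simplified rule close to what SLOCCount documents as a physical SLOC: a line counts only if it holds at least one character that is neither whitespace nor part of a comment. It handles `//` and `/* */` comments in C, but unlike a real tool it ignores comment markers inside string or character literals. The function name `classify_c_lines` is hypothetical.

```python
def classify_c_lines(source: str) -> dict[str, int]:
    """Classify each physical line of C source as blank, comment-only or code.

    Simplification: comment markers inside string or character literals
    are not recognised, so a line like  puts("/*");  would be misread.
    """
    counts = {"physical": 0, "blank": 0, "comment": 0, "code": 0}
    in_block = False  # True while inside a /* ... */ comment spanning lines
    for line in source.splitlines():
        counts["physical"] += 1
        if not in_block and not line.strip():
            counts["blank"] += 1
            continue
        has_code = False
        i = 0
        while i < len(line):
            if in_block:
                end = line.find("*/", i)
                if end == -1:
                    break  # the comment continues onto the next line
                in_block = False
                i = end + 2
            elif line.startswith("//", i):
                break  # the rest of the line is a comment
            elif line.startswith("/*", i):
                in_block = True
                i += 2
            else:
                if not line[i].isspace():
                    has_code = True
                i += 1
        counts["code" if has_code else "comment"] += 1
    return counts


if __name__ == "__main__":
    sample = (
        "/* header\n"
        " * comment */\n"
        "#include <stdio.h>\n"
        "\n"
        "int main(void) {  // entry point\n"
        "    return 0;\n"
        "}\n"
    )
    print(classify_c_lines(sample))
    # {'physical': 7, 'blank': 1, 'comment': 2, 'code': 4}
```

A naive `wc -l`-style count reports all 7 lines of the sample; only 4 of them survive as code under this rule, which is the same kind of gap as 10 million versus 6.4 million.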

21 of 432 comments

Lines of Code by Flyin+Fungi · 2008-10-22 05:36 · Score: 1, Insightful

Lines of code is not a good metric for performance. I'm in a software engineering class right now, listening to a lecture on how to apply metrics to code.
1. Re:Lines of Code by Anonymous Coward · 2008-10-22 05:47 · Score: 1, Insightful
  
  The amount of code doesn't always correlate with the size of the final binary. You have to consider a slew of things when looking at the Linux kernel. First of all, there is a lot of architecture-specific code in there, since Linux runs on everything from ARM chips to SPARC machines. You also have to consider the built-in drivers, which ship with the source but usually aren't compiled into the kernel binary unless you're running an embedded or specialized system. Anyone who has ever configured a kernel build has seen the giant number of options a person can add or remove. The kernel source getting larger just reflects more improvements and support for a wider range of machines. The final binary of a typical kernel has grown over the years, but not at the rate of the lines of code, so I wouldn't call Linux bloated because of the sheer size of the code base.
2. Re:Lines of Code by hondo77 · 2008-10-22 06:11 · Score: 5, Insightful
  
  Why? Are you still using an 80s-era Mac as your primary computer?
  
  --
  I live ze unknown. I love ze unknown. I am ze unknown.
3. Re:Lines of Code by QRDeNameland · 2008-10-22 06:14 · Score: 5, Insightful
  
  If 1 Line of Code = 1 Library of Congress, you should acquaint yourself with the Enter key.
  
  --
  Momentarily, the need for the construction of new light will no longer exist.
4. Re:Lines of Code by Anonymous Coward · 2008-10-22 06:21 · Score: 1, Insightful
  
  If you were a kernel developer, you'd know that yes, people are consolidating code all the time, to reduce LoC.
  Dropping old drivers happens, too, but at a much more sedate pace, since unlike Winblows, Linux still supports 20-year-old devices.
  But the thing is modular, so chances are you are never compiling more than half those LoCs, and usually a lot less than that...
5. Re:Lines of Code by Anonymous Coward · 2008-10-22 08:07 · Score: 1, Insightful
  
  maybe somebody should be working to pare it down some?
  Personally, I'd much rather have a functional OS that, for instance, has drivers for whatever I connect to it.
6. Re:Lines of Code by Anonymous Coward · 2008-10-22 09:04 · Score: 1, Insightful
  
  I'm probably going to be marked a troll but...
  when did efficiency become outdated? Not every system is for the home PC either.
7. Re:Lines of Code by Just+Some+Guy · 2008-10-23 03:47 · Score: 2, Insightful
  
  No, but a modern PC running Windows uses 1000 times more RAM than GEOS on a Commodore 64, yet doesn't really do anything extra. The OS needs to go on a diet.
  GEOS supported thousands of printers, hundreds of hard drive adapters, hundreds of video cards, streaming network video, 3d gaming, virtual memory, several CPU vendors, hundreds of mice, and all that in 20KB of memory? Impressive!
  Less sarcastic answer: modern computers do a whole awful lot more than GEOS did.
  
  --
  Dewey, what part of this looks like authorities should be involved?
What did sloccount say the kernel was worth? by OrangeTide · 2008-10-22 05:38 · Score: 2, Insightful

Because we'd all like to know how many man-months something as big as the Linux kernel should take to implement. And laugh at the huge price tag SLOCCount will put on it.

--
“Common sense is not so common.” — Voltaire
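For the curious, SLOCCount's price tag comes from the basic COCOMO model applied to the physical SLOC total. Below is a minimal sketch. It assumes the organic-mode constants and the default salary ($56,286/year) and overhead (2.4) that SLOCCount's reports print; check SLOCCount's documentation before trusting the exact figures. `cocomo_estimate` is a hypothetical name.

```python
# Assumed constants: basic COCOMO, organic mode, plus the default salary
# and overhead shown in SLOCCount reports. Verify against SLOCCount's docs.
EFFORT_FACTOR = 2.4        # person-months = 2.4 * KSLOC ** 1.05
EFFORT_EXPONENT = 1.05
SCHEDULE_FACTOR = 2.5      # months = 2.5 * person-months ** 0.38
SCHEDULE_EXPONENT = 0.38
ANNUAL_SALARY_USD = 56_286
OVERHEAD = 2.4


def cocomo_estimate(sloc: int) -> dict[str, float]:
    """Basic COCOMO effort, schedule and cost for a count of physical SLOC."""
    ksloc = sloc / 1000
    effort_pm = EFFORT_FACTOR * ksloc ** EFFORT_EXPONENT
    schedule_months = SCHEDULE_FACTOR * effort_pm ** SCHEDULE_EXPONENT
    cost_usd = effort_pm / 12 * ANNUAL_SALARY_USD * OVERHEAD
    return {
        "effort_person_months": effort_pm,
        "schedule_months": schedule_months,
        "average_developers": effort_pm / schedule_months,
        "cost_usd": cost_usd,
    }


if __name__ == "__main__":
    # 6,399,191 is the SLOCCount total quoted in the story.
    for name, value in cocomo_estimate(6_399_191).items():
        print(f"{name}: {value:,.0f}")
```

Because effort grows slightly faster than linearly with size (exponent 1.05), the estimate for millions of lines gets large quickly, which is part of why the numbers invite laughter.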
Re:Micro-kernel vs massive kernel? by pembo13 · 2008-10-22 05:56 · Score: 2, Insightful

I think they are including modules as well. And there are a growing number of userland drivers as well. So you can't come to a conclusion without knowing the size of the parts outside the kernel.

--
"Thanks for all the money you paid to us. We've used it to buy off ISO among other things" -Microsoft
Lines of code as a metric by qoncept · 2008-10-22 06:01 · Score: 4, Insightful

Funny that the summary calls attention to the fact that the line count includes comments and whitespace without any mention of how worthless lines of code are as a metric. Someone could easily go in and add or remove newlines wherever they wanted and, without changing a bit of code, make it 50 million or 50 thousand.

--
Whale
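The newline point is easy to demonstrate. Below is a minimal sketch: the same hypothetical C function is written on one line and then spread over nine. The physical line count changes a lot, while a crude statement count (one per semicolon) stays the same. Counting semicolons is only a rough stand-in for a logical-SLOC metric, since it miscounts `for (;;)` headers and semicolons inside strings or comments.

```python
def physical_lines(source: str) -> int:
    """Number of physical lines, as a naive LOC counter would report."""
    return len(source.splitlines())


def semicolons(source: str) -> int:
    """Crude statement count: one per ';' (wrong for for(;;) and literals)."""
    return source.count(";")


# The same hypothetical function, formatted two ways.
COMPACT = "int add(int a, int b) { int s = a + b; return s; }\n"
SPREAD = """int
add(int a,
    int b)
{
    int s =
        a + b;

    return s;
}
"""

if __name__ == "__main__":
    for label, src in (("compact", COMPACT), ("spread", SPREAD)):
        print(f"{label}: lines={physical_lines(src)}, semicolons={semicolons(src)}")
    # compact: lines=1, semicolons=2
    # spread: lines=9, semicolons=2
```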
Re:What about the other .3% ? by glavenoid · 2008-10-22 06:02 · Score: 3, Insightful

Makefiles, build scripts, etc., perhaps?

--
I, for one, am looking forward to the inevitable /. beta rollout fallout.
Re:Line Count Not Always a Good Thing? by Microlith · 2008-10-22 06:08 · Score: 4, Insightful

While Linux is huge, for a backdoor to be successful it would need to hit a huge number of systems. The majority of the kernel at this point tends to be drivers, not all of which are used in a given kernel.
For it to be even remotely worthwhile, it'd have to be placed into something that was both heavily used AND given little attention. These two positions are almost mutually exclusive.
Can anyone think of a place that would fall into these two categories? Even the more seemingly obscure parts of the kernel get attention fairly often and malicious changes wouldn't go unnoticed for long.
Re:Reply from actual kernel developer please . . . by earlymon · 2008-10-22 06:34 · Score: 5, Insightful

I'm a developer and was wondering what kind of testing is done to verify the code.
Guinea pigs. Millions of us.

--
Pathological kinda promises Path + Logical - but instead, you get stuck with pathetic.
gratuitous complexification by cthulhuology · 2008-10-22 06:38 · Score: 2, Insightful

This only proves that the Linux kernel is in need of a significant refactoring effort. The capacity of any single developer to understand or even read a significant portion of this code is nil. As a result, the opportunity to reduce duplication of effort is quickly diminishing, and so is the ability of new contributors to add anything other than more bloat. And while the core of the kernel may be "small", and much of this code deals with special cases for specific hardware, the sheer size of the code makes it increasingly difficult to tell what is substantial and what is merely a stylistic difference between two drivers. Increasing LOC counts are a sure sign of under-analysis and over-reliance on the availability of cheap labor. You could pick any arbitrary number of lines of code (less than, say, 20k) as the number of lines the kernel should occupy. Since an individual line may define a new abstraction, LOC represent a potential for a geometric increase in complexity. So either these 6 to 10 million lines of code represent some truly staggering level of irreducible complexity (most unlikely), or they are merely the result of not refactoring the code sufficiently (most likely). This really is a milestone in gratuitous complexification that should be mourned, not hailed.
1. Re:gratuitous complexification by Anonymous Coward · 2008-10-22 07:56 · Score: 1, Insightful
  
  This almost makes one think that you take LOC as an indicator of complexity, which is simply ridiculous. If you consider that the majority of that code tends to be in drivers and architecture code, the complexity argument goes out the window. Specific drivers and architectures are only of interest to certain people, and the vast majority of users are never going to interact with anything outside of their narrow window. While it is true that no one really understands the kernel in its entirety, it has been that way for well over a decade or so, yet things still seem to be making forward progress somehow. There is more to be said for having a strict hierarchy of subsystem maintainers and the associated trust metric for merging up, but this is the fundamental methodology that permits the system to scale so effectively.
  The one thing your rant also excludes is that there is no need for someone to understand the entire system, even at the core level. The kernel is much more of a social atmosphere built around trust and interpersonal interaction. When various VM issues are encountered, all of the usual folks working in that area are CCed and left to work it out, etc, etc. It is much more an issue of knowing who to defer to in order to see results than it is someone at the top having a hand in everything. Subsystem maintainers are usually in their positions because they understand their problem space, and work in it on a daily basis. Trying to work around them or displacing that undermines the entire process.
  The barrier to entry has gone up over time, but drivers (where people usually start) are still a well-documented area, and one with a lot of resources and help to get going. It is also arguable that writing a driver for the current kernel is orders of magnitude simpler than it was in the pre-2.6 days, before the introduction of the driver model, when interfacing was much more ad hoc. While there is a lot of inherent complexity in the driver model, the vast majority of it is stuff that a driver developer simply doesn't need to care about.
  Rather than whining about LOC, perhaps you can point to a few specific cases of why you believe the current system is error prone, since it seems to be working just fine for the rest of us.
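The routing described in the reply above (VM issues go to the people who work on VM) amounts to a lookup from a file path to the people responsible for it. Today's kernel tree keeps this information in its `MAINTAINERS` file, which `scripts/get_maintainer.pl` queries. The sketch below does not reproduce that file format or script. It is only a longest-prefix match over a hypothetical table with placeholder addresses, and it assumes the table contains the empty prefix as a catch-all.

```python
# Hypothetical routing table: path prefix -> addresses to CC.
# The addresses are placeholders, not real kernel contacts.
ROUTES: dict[str, list[str]] = {
    "": ["fallback-maintainer@example.org"],  # matches every path
    "mm/": ["vm-hacker@example.org", "vm-list@example.org"],
    "drivers/": ["driver-core@example.org"],
    "drivers/usb/": ["usb-maintainer@example.org", "usb-list@example.org"],
}


def who_to_cc(path: str, routes: dict[str, list[str]] = ROUTES) -> list[str]:
    """Return the CC list for the most specific prefix that matches `path`."""
    # The empty prefix always matches, so max() never sees an empty sequence.
    best = max((prefix for prefix in routes if path.startswith(prefix)), key=len)
    return routes[best]


if __name__ == "__main__":
    for path in ("drivers/usb/core/hub.c", "mm/vmscan.c", "fs/ext3/inode.c"):
        print(path, "->", ", ".join(who_to_cc(path)))
```

Longest-prefix matching lets a subsystem maintainer own a subtree while a more specific entry (here `drivers/usb/`) overrides it, so nobody at the top has to be consulted for everything.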
"Actual" code? by TuringTest · 2008-10-22 06:51 · Score: 4, Insightful

Comments are also code.
If you only count as code what can be fed to the machine, you should look at the size of the compiled binary. Source code is meant to be read by *humans*, so comments do count. That's why the GPL requires them to be left in the files (the "preferred form" for editing); otherwise it wouldn't be source code.

--
Singularity: a belief in the "God" idea with the "demiurge" relation inverted.
1. Re:"Actual" code? by bonch · 2008-10-22 09:46 · Score: 2, Insightful
  
  I don't really care much about theoretical programming paradigms. "Code" refers to the instruction statements written in a programming language for a compiler to interpret, not the comments written off to the side that the compiler ignores.
Re:Isn't that normal? by Abreu · 2008-10-22 07:25 · Score: 2, Insightful

...still, we should think about adding Asimov's three laws before we reach such an event horizon, no?

--
No sig for the moment.
Re:Isn't that normal? by RAMMS+EIN · 2008-10-22 07:49 · Score: 2, Insightful

``Unfortunately, as you approach the limit, the performance must drop as you've now abstracted so far that your code becomes essentially a virtual machine on which your data runs.''
I don't see that. Not all abstraction makes things slower. In many cases, abstraction lets you write code at a higher level, while still compiling down to the code you would have written if working at a lower level.

--
Please correct me if I got my facts wrong.
Re:Function Point Analysis by DrVxD · 2008-10-22 08:29 · Score: 2, Insightful

i believe a more appropriate measure of the 'bloat' (i.e. useless functions) or the size of any software package is through function point analysis
I recall many years ago, a PHB (this is long enough ago that nobody called them that yet) was talking about developer productivity metrics; he announced that the powers that be were considering either KLoC or Function Points. The guy sitting next to me said "I have no idea what function points are, but they've got to be better than KLoC". The remark made one of those wonderful whooshing sounds as it sailed straight over the PHB's head...

LOC is without question one of the easiest measurements (aside from total package size in bytes, which is nearly as uninformative)
+1 - Fundamental Law Of Physics.
LoC's only redeeming feature as a metric of anything is that it's (relatively) easy to measure. Of course, there's the debate about "do we count comments", "do we count whitespace", "how do we count curly braces" - so it turns out it's actually NOT all that easy to measure. But don't let a PHB hear you speaking such heresy...

--
Not everything that can be measured matters; Not everything that matters can be measured.
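The "do we count comments, whitespace, curly braces" debate can be made concrete. Below is a minimal sketch of a hypothetical `count_lines` function whose keyword flags select the counting policy. To stay short, it only recognizes whole-line `//` comments and lines consisting solely of a brace. A real counter would also have to handle `/* */` blocks and trailing comments.

```python
SAMPLE = (
    "// Return the larger of two ints.\n"
    "int max(int a, int b)\n"
    "{\n"
    "\n"
    "    return a > b ? a : b;\n"
    "}\n"
)


def count_lines(source: str, *, blank: bool, comments: bool, lone_braces: bool) -> int:
    """Count lines of C-like source under an explicit counting policy.

    Only whole-line // comments and lines holding nothing but a brace are
    recognised; /* */ blocks and trailing comments are out of scope here.
    """
    total = 0
    for line in source.splitlines():
        text = line.strip()
        if not text:
            total += int(blank)
        elif text.startswith("//"):
            total += int(comments)
        elif text in ("{", "}", "};"):
            total += int(lone_braces)
        else:
            total += 1  # ordinary code line: always counted
    return total


if __name__ == "__main__":
    policies = {
        "everything": dict(blank=True, comments=True, lone_braces=True),
        "no blanks or comments": dict(blank=False, comments=False, lone_braces=True),
        "no braces either": dict(blank=False, comments=False, lone_braces=False),
    }
    for name, flags in policies.items():
        print(f"{name}: {count_lines(SAMPLE, **flags)}")
    # everything: 6
    # no blanks or comments: 4
    # no braces either: 2
```

The same six-line function measures anywhere from 2 to 6 lines depending on the policy, so any LoC figure is only comparable with another one counted under the same rules.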