Linux Kernel Surpasses 10 Million Lines of Code
javipas writes "A simple analysis of the most updated version (a Git checkout) of the Linux kernel reveals that the number of lines of all its source code surpasses 10 million, but attention: this number includes blank lines, comments, and text files. With a deeper analysis thanks to the SLOCCount tool, you can get the real number of pure code lines: 6.399.191, with 96.4% of them developed in C, and 3.3% using assembler. The number grows clearly with each new version of the kernel, that seems to be launched each 90 days approximately."
It's significantly easier to hide a malicious backdoor inside a huge software project than a small one. Linux has already had a near miss back in 2003, when the CVS repository was compromised. Considering how many mission-critical applications run under Linux, there's a huge financial incentive to hide a backdoor somewhere in those 10 million lines.
Yes, but it can go down with optimizations and refactoring (finding duplicated code and pushing it into a function or macro, for example) and with eliminating dead code. Ideally, code size should be asymptotic to an optimal size. As you approach the optimal size, more and more of what you need to do is already available to you. As you approach the limit, the amount of special-case logic and hardcoding approaches zero, and the amount of data-driven logic approaches 100%. Unfortunately, as you approach the limit, the performance must drop as you've now abstracted so far that your code becomes essentially a virtual machine on which your data runs. Simulating a computer is always going to be slower than actually using the real computer directly. In most cases, this is considered "acceptable" because your virtual machine is simply too advanced for any physical hardware to support at this time. (There is also the consideration of code changes, but as you approach the limit, your changes will largely be to the data and not to the codebase. At the limit, you will change the codebase only when changing the hardware, so if you could hardwire the code, it would not impact maintenance at all. All the maintenance you could want to do would be at the data level, given this level of abstraction.)
Linux is clearly nowhere near the point of being that abstract, although some components are probably getting close. It would be interesting to see, even if it could only be done by simulation, what would happen if you moved Linux' VMM into an enlarged MMU, or what would happen if an intelligent hard drive supported Linux' current filesystem selection and parts of the VFS layer. Not as software running on a CPU, but as actual hard-wired logic. Software is just a simulation of wiring, so you can logically always reverse the process. Given that Linux has a decent chunk of the server market, and the server market is less concerned with cost as it is with high performance, high reliability and minimal physical space, it is possible (unlikely but possible) that there will eventually be lines of servers that use chips specially designed to accelerate Linux by this method.
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)