What Actually Makes Up "Linux"?
David A. Wheeler sent in a link to his extensive analysis of the true size of Linux. There's an amazing amount of information in here, and although it focuses on Red Hat 7.1, it still has tons of interesting details about the code that makes up the distribution. Breakdowns include languages, licenses, cost estimates, and stats that in no way clear up the legendary GNU/Linux debate that will undoubtedly be engraved on tombstones somewhere.
Several hundred utilities.
And three hundred and fifty thousand annoying slashdot trolls.
--
"Remember when the U.S. had a drug problem, and then we declared a War On Drugs, and now you can't buy drugs anymore?"
Windows is made up of the following:
You can moderate this down, but I challenge you to find proof that this situation is otherwise.
> And that's why the system is GNU/Linux, and not 'Linux', which merely refers to the kernel.
A lot of the code they're listing as "Linux" code isn't GNU code at all. Some is released under a BSD-style license (e.g. Apache). Some is released under the Artistic license (e.g. Perl). Calling the system GNU/Linux simply because it has some GNU tools on it is like me calling my Windows box Netscape Windows because I have an old version of Navigator on it, or GNU/Windows because I have GNU apps on it.
I think the reason people are more apt to describe Linux as GNU/Linux is not that it uses GNU apps, but that it is released under the GNU General Public License.
What's being asked here seems to me to be simply: "We know that a kernel isn't an operating system. So what is 'linux'?"
The difference is the GNU System and the utilities that were built up beside the Linux kernel to support it. The difference between Linux the kernel and Linux the system that we all know and love is the GNU System.
And that's why the system is GNU/Linux, and not 'Linux', which merely refers to the kernel.
-- Truth goes out the door when rumor comes innuendo. -- Groucho Marx
This was a thorough, well-written document right up until the point where he mentioned Bill Gates. Well, not Bill Gates himself, but the immortalised words from his "Open Letter to Hobbyists".
In particular, the bit about documentation. What Linux lacks these days is decent documentation in a lot of areas, particularly things like devfs (which the author even admits is now poorly documented; the instructions that are available are out of date).
Coming from a BSD background (no, this isn't an excuse for a platform war - just hear me out), I see documentation as just as important as the code itself. This sometimes means that certain features arrive in BSD a generation behind Linux, but when they arrive, the documentation is top notch: correct spelling and grammar, notes on what bugs are present, examples of correct usage (especially relevant when documenting programming functions whose incorrect usage may have a security impact), and so on. Overall, it's an issue of documentation quality.
The author of the paper may scoff in the direction of Bill Gates, noting the ability of the Linux community to create and maintain an operating system, but in the process he has brought the whole paper down by exposing the one thing that Linux, as a "disparate sources, one distribution" operating system, can never have, and that Microsoft products and, from my perspective, the BSD operating systems do have: documentation that exists in a single form and is written in a style consistent across the entire operating system. (This is not the case with Linux. Some things use man pages, others use "info", others use text files, others use HTML documentation. Heaven knows how a new Linux user (advocacy is about attracting new users, right?) is supposed to navigate this mess without considerable pain and/or persistence.)
And before you let the flames begin, have a poke around, say, the manual page listings on the NetBSD/OpenBSD/FreeBSD websites and compare them to the ones you see on Red Hat and so on.
Linux is what is keeping me from meeting women. : (
...i know this, and still, I find myself compulsively rebuilding my kernel.
Know what I like about atheists? I've yet to meet one that believes God is on their side.
2437470 source lines of code for the Linux kernel. Doesn't that worry some people out there? We have a monolithic kernel almost two and a half million lines long. I think that by 2.6 the kernel is going to collapse under its own weight unless the designers decide to reorganize it in a fundamental way. Maybe it's time for a Linux-Hurd fusion project that will turn Linux into a true microkernel.
Give me six lines written by the hand of the most honest of men, and I will find something in them to hang him.
That advance certainly didn't come in 1991, because the Amiga's clipboard.device could already do that earlier (in 1990 at the latest, when 2.0 was released, and probably much earlier, in the 1.x days of the 1980s, but I'm not 100% sure). And this sort of thing wasn't really what the Amiga was famous for, so (I am speculating) the idea may have been stolen from the Mac.
Linux has faster filesystems. But Linux and NT both still suck at that, so I guess I should mention something more substantial:
An area where Linux is way ahead of Windows would be extensibility.
For example, Linux and Windows running on x86 both have a severe problem: code can be executed on the stack. If you run a network service that has a buffer overflow bug, bad people on the Internet can write their own code and execute it on your machine. So some people decided that wasn't such a good idea and released kernel patches that make the stack non-executable, so this infiltration technique doesn't work.
This actually reveals two ways that Linux is further ahead than Windows.
That doesn't put Linux just a few years ahead of Windows. It puts Linux a whole generation ahead of Windows, and even my beloved (but no longer maintained) AmigaOS. Freeness itself is a huge feature. (Alas, it's about all that Linux really has. But it's a biggie!)
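On Linux today you can check whether a process's stack is executable by reading the permission flags of its [stack] mapping in /proc/self/maps. A minimal sketch, assuming the standard maps line layout (address range, permissions, offset, device, inode, path); stack_is_executable is a hypothetical helper, not a library function.

```python
import os


def stack_is_executable(maps_text: str) -> bool:
    """Return True if the [stack] line in /proc/<pid>/maps text has the 'x' flag.

    Assumes the standard layout, e.g.
    7ffd8a1c2000-7ffd8a1e3000 rw-p 00000000 00:00 0    [stack]
    where the second field holds the r/w/x/p permission flags.
    """
    for line in maps_text.splitlines():
        fields = line.split()
        if len(fields) >= 6 and fields[-1] == "[stack]":
            perms = fields[1]       # e.g. "rw-p" or "rwxp"
            return perms[2] == "x"  # third flag is the execute permission
    raise ValueError("no [stack] mapping found")


if __name__ == "__main__":
    if os.path.exists("/proc/self/maps"):
        with open("/proc/self/maps") as f:
            print("stack executable:", stack_is_executable(f.read()))
    else:
        print("/proc/self/maps is not available on this system")
```

This only reports the mapping's permissions; whether the CPU actually refuses to execute from a non-executable page depends on hardware support such as the x86 NX bit.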
---
As copyright owner of this comment, I authorize everyone to defeat any technological measure which limits access to it.
Okay, now take a set of 15,000 web pages under a web server on Windows. Replace all "A" tags that refer to "url1" with "A" tags that refer to "url2" in all 15,000 pages.
How easy is this on Windows? I can do it with one command line on Linux (and any other *nix, for that matter). Yes, I have to know regular expressions (REs) to do it. Yes, it took me several hours to learn the RE syntax and several weeks of using REs to make them second nature, but now I can do tasks like this in a matter of minutes. On ANY Windows system this would take several weeks.
"Usability" is a slippery term. Also, while Microsoft products do meet a certain minimum level of usability, there is an equal amount of crappy third-party software out there that is every bit as "unusable" as the hobbyist stuff for Linux.
And just how "usable" is, for example, MS Office? Sure enough, trained monkeys can do the basics, but I would bet you 2:1 that 90% of Word users only achieve 40% code coverage of Word. In other words, if you start digging into everything, Word is every bit as obtuse and difficult as the state-of-the-art glass-teletype editors of the 1970s. More difficult, I would argue, because you could learn everything there was to know about those "unusable" editors in about two hours. Of course, you couldn't make a marketing brochure with those editors (unless you wanted to go out of business), but my point is that "usability" is pretty danged meaningless. "Suitability" is more to the point. Word is lousy if you want to do accounting.
Microsoft has actually held back the usability of systems substantially by keeping the PC the dominant platform. Most people do not need general-purpose computing devices. Home users need an "appliance" that does Web, e-mail, instant messaging, personal finance, word processing and maybe a spreadsheet. Business users need that plus presentation software, calendar/scheduling, etc. These devices could be "embedded" type devices (think the Palm metaphor) that are much easier to use than PCs. Why should ANYONE but the very few who need more have to know about clock speeds, RAM size, ISA/EISA/PCI, IRQs, USB, etc.?
The claim that Microsoft has advanced usability is absurd. They have been struggling against their own monopoly platform for over a decade, not because of their own failure, but because the platform's design is inappropriate for its present use.
I will certainly grant that one must know a lot more to make good use of Linux on PC than to make good use of Windows on a PC. But which is easier to use, a Palm Pilot or a Windows PC? A TiVo or a windows PC? A Nintendo or a Windows PC?
Usability my rotund fundament!
Oh yes, that one-line command to edit 15,000 web pages?
$ find /server/base/dir -type f -name '*.html' -exec perl -p -i.bak -e 's/url1/url2/gi' {} \;
I used to do it with a find within a find and a sed command, but the perl trick is a very nice shortcut, esp. since it edits the files in place and leaves backups behind!
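The perl one-liner rewrites every occurrence of url1 in each file, not just links. Here is a Python sketch of the job as originally posed (only A tags), under stated assumptions rather than as a drop-in replacement: retarget_links and retarget_tree are hypothetical names, href values are assumed to be quoted, a regex is a rough tool for HTML (an HTML parser is safer on messy markup), and unlike perl -i.bak it writes a .bak copy only for files it actually changes.

```python
import os
import re
import shutil
import sys


def retarget_links(html: str, old: str, new: str) -> str:
    """Rewrite quoted href values equal to `old` inside <a ...> tags to `new`.

    Tag, attribute and value are matched case-insensitively (like the /i in
    the perl one-liner); body text, other tags and longer URLs are left alone.
    """
    pattern = re.compile(
        r"(<a\b[^>]*?\bhref\s*=\s*)([\"'])" + re.escape(old) + r"\2",
        re.IGNORECASE,
    )
    # Keep the original prefix and quote character; insert `new` verbatim.
    return pattern.sub(lambda m: m.group(1) + m.group(2) + new + m.group(2), html)


def retarget_tree(root: str, old: str, new: str, backup_suffix: str = ".bak") -> int:
    """Apply retarget_links to every *.html file under `root`.

    A changed file is first copied to <name><backup_suffix>, roughly like
    perl's -i.bak. Returns the number of files that were changed.
    """
    changed = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".html"):
                continue
            path = os.path.join(dirpath, name)
            # newline="" preserves the file's existing line endings
            with open(path, encoding="utf-8", errors="surrogateescape", newline="") as f:
                text = f.read()
            updated = retarget_links(text, old, new)
            if updated != text:
                shutil.copy2(path, path + backup_suffix)
                with open(path, "w", encoding="utf-8", errors="surrogateescape", newline="") as f:
                    f.write(updated)
                changed += 1
    return changed


if __name__ == "__main__" and len(sys.argv) == 4:
    # usage (hypothetical script name): python retarget.py /server/base/dir url1 url2
    print(retarget_tree(sys.argv[1], sys.argv[2], sys.argv[3]), "files changed")
```

Unlike a blanket s/url1/url2/gi, this leaves url1 alone in body text, in other tags such as img src, and in longer URLs that merely start with url1.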
> Linux progressed farther in 10 years than Microsoft during that same time frame
I don't see how that's true at all. In both technology and the bottom line, Microsoft is *years* ahead. Technology: let me offer one example. Go to a web page (IE) with some kind of table with data in it. Copy the table. Paste it into Word. It actually becomes a Word table! Paste it into Excel. It actually places the data, and the formatting, into the cells! How far is Linux from that level of ease of use, that level of "object linking and embedding" across apps? Do you think the multiple desktop standards help or hinder this task?
And in terms of the bottom line, Linux companies are still trying to figure out how to make a buck. Red Hat just now moved into the positive column, and VA and others lay off people seemingly every week.
I'm a fan of Linux because I'm a hacker. I like the shell, I like the flexibility and customisability that come with having dozens of "glue" tools. But the fact is, hackers are the minority of computer users, and this is only going to be more and more true in the future. For the masses, ease of use is priority 1, and it seems, at least to me, that the "other" platform has a great lead in that arena.
---
python -c "x='python -c %sx=%s; print x%%(chr(34),repr(x),chr(34))%s'; print x%(chr(34),repr(x),chr(34))"
After reading the analysis, two things sprang out at me. The first is that a lot of the stuff on a Linux system is meant for development, rather than just using the system. The second is that lots of the stuff on the list clearly is "application" and not anyone's idea of an "operating system".
Specifically, in the top ten, we have:
Development Tools
Applications
(Also in the top 20 are libgcj, teTeX, postgresql, and xemacs. And we won't get into the issue of whether Mozilla (#2) should be considered part of the operating system.)
So my question is, what's the size of the non-development/non-application stuff? What's the size of the kernel plus the essential utilities (most of which are GNU, as RMS points out ad nauseam)?
Happy Premise #3: Even though I feel like I might ignite, I probably won't.
It would be interesting to see how many MB of space those "This is GPL" disclaimers take up.
... is the people. Seriously, Windows can't really say that because there is no real "Windows community". Mac people can talk about it, but they are still dependent on Apple for all their wants and needs. Linux, on the other hand, is written, used, and supported by the people themselves. Those figures, everything from the lines of code to the language percentages, just illustrate who and what we are as a community.
It's something I could go on and on forever about because it really is something special in a world dominated by the shadow of Gates and Jobs. "Those people" who work "over there" don't make this. We do! While all those numbers can start to quantify this, you can't really put a dollar value on it the same way you can't put a dollar value on freedom. Funny thing to be able to say that about a bunch of software...
"I may not have morals, but I have standards."
---
Linux, in its newest incarnation, is ~25 MB of tarred and gzip/bzip2-compressed source code written in C and covered by the GPL. Without gcc or some other compiler you can't even compile it. Without a shell you can't do much with it. All of those things come from GNU or other sources.
Linux is in its simplest form much like a Japanese car built with 87% United States parts.
On a personal note:
In the beginning there was Linus and the word was with Linus. Accept Linux into your heart and you shall have uptime eternal.
Kernal 3:16:
For Linus loved man so much that he gave his first begotten OS.
Ascii artist &
What we need to measure is LOD: Lines of Documentation. We measure that against SLOC (Source Lines of Code) and we would learn that Linux is, by any rational account, very poorly documented. And, compared to (more-or-less) intuitive full GUI environments, Linux really needs documentation. GOOD documentation.
Which might help explain another number that keeps cropping up: 5% of the OS market.
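Counting LOD properly would mean counting man pages, info files and HOWTOs, which is a much bigger job. As a crude, hypothetical proxy, the sketch below counts comment lines against code lines in C-style source. It assumes only lines that begin with a comment (or sit inside a /* ... */ block) are comment lines; a trailing comment after code is counted as code.

```python
from typing import Tuple


def count_lines(source: str) -> Tuple[int, int]:
    """Return (code_lines, comment_lines) for C-style source text.

    Crude rules: blank lines are skipped; a line that begins with // or /*,
    or lies inside a /* ... */ block, is a comment line; everything else,
    including code followed by a trailing comment, is a code line.
    """
    code = comments = 0
    in_block = False
    for raw in source.splitlines():
        line = raw.strip()
        if not line:
            continue
        if in_block:
            comments += 1
            if "*/" in line:
                in_block = False
        elif line.startswith("//"):
            comments += 1
        elif line.startswith("/*"):
            comments += 1
            in_block = "*/" not in line[2:]  # block continues if not closed here
        else:
            code += 1
    return code, comments


sample = """/* header comment
 * second line
 */
int main(void) {
    // say hello
    return 0;
}
"""
code, docs = count_lines(sample)
print(f"SLOC={code} comment lines={docs} ratio={docs / code:.2f}")
# prints: SLOC=3 comment lines=4 ratio=1.33
```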
> Using RedHat as a distro for this project isn't that good of an idea.... it's just an unrepresentative mass of programs and code! I can safely say that most Redhat users will never use about one-quarter of the programs in their distribution...
That's true for any of today's operating systems. No user uses all the code in Windows, either. Even real-time OS's have more code developed for them than is used by any given user. As a measure of effort, though, examining all the code makes sense.
> Since when is the number of lines of code proportional to the quality of the software? If Red Hat 7.1 has 30 million lines of code over 6.2's 17 million, does that mean the product is 76% better? Is the code getting more sloppy as more programmers get involved? I feel like counsel is leading the witness for the author to say 7.1 has "60% more effort" under the COCOMO model.
I never said it was "better", I said it included "60% more effort." Better is a value judgement. Effort is measured in person-years.
> The kernel shouldn't be two million lines of code. How much of that is drivers? And how much of the drivers are duplicated from one driver to another?
Section 3.2 specifically discusses this; 57% of the lines of code are drivers. Duplicate files are only counted once, but "partly duplicated" files are much harder to detect (and to discount when they happen); they certainly happen in the Linux kernel. However, the COCOMO model is based on real project data, and many other projects include cut-and-pasted code (for good or ill).
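Counting byte-identical files only once can be done by hashing file contents. The sketch below is one hypothetical way to do it, not taken from sloccount's source: it keys each file on a SHA-256 digest, counts non-blank lines as a simplified stand-in for SLOC (real SLOC also excludes comments), and, as the answer says, does nothing about files that are only partly duplicated.

```python
import hashlib
from typing import Dict


def unique_sloc(files: Dict[str, bytes]) -> int:
    """Count non-blank lines across files, counting byte-identical files once."""
    seen = set()
    total = 0
    for name in sorted(files):
        data = files[name]
        digest = hashlib.sha256(data).hexdigest()
        if digest in seen:
            continue  # exact duplicate of a file already counted
        seen.add(digest)
        text = data.decode("utf-8", errors="replace")
        total += sum(1 for line in text.splitlines() if line.strip())
    return total


# Hypothetical file contents, for illustration only
tree = {
    "drivers/net/a.c": b"int x;\n\nint y;\n",
    "drivers/net/b.c": b"int x;\n\nint y;\n",  # exact copy of a.c: skipped
    "drivers/net/c.c": b"int x;\nint z;\n",    # partial copy: still counted
}
print(unique_sloc(tree))  # prints 4
```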
> Ok, so this guy claims that Linux would cost a little over $1 billion (US) to develop. I wonder what the big deal is. I'm sure Microsoft has spent that much over the years on Office+Win9x+WinNT+Backoffice+etc ... The only thing incredible about this number is that most of that billion was completely unpaid, or at least underpayed.
But I believe that is a big deal. Gates' "Open Letter to Hobbyists" assumed that if people just shared code, no large project would be developed. GNU/Linux and other open source/free software systems show the assumption wrong, and this paper has the numbers to prove it. You can argue which is "better", of course, but the notion that it can't be done is no longer debatable.
> Are there estimate[s of] how much money in form of salaries were ever paid to programmers for the code and how much was in effect done not only voluntarily, but also completely on an unpaid basis?
Unfortunately not; it's not even clear how to find out. You would have to go back to individual patches submitted to every project, and few people identify in their patches "I was paid to do this."
> 2437470 source lines of code for the Linux kernel. Doesn't that worry some people out there? We have a monolithic kernel almost two and a half million lines long. I think that by 2.6 the kernel is going to collapse under its own weight unless the designers decide to reorganize it in a fundamental way.
It's the nature of a monolithic kernel, and in any case, most of that is in modules (which are individually much smaller and only loaded when needed). I see no evidence of a "collapse", though clearly there are competitors (like HURD) that might eventually replace it in the market.
> Quoting statistics/data going back to '95 is way out of date by todays standards, even '99 is now very old.
It may be old, but it helps give perspective. A simple SLOC number doesn't mean much to people, unless it's compared to something else.
> The cost formula includes a term (ksloc**1.05): i.e. thousands of source lines to the power of 1.05. This reflects the fact that the bigger a program becomes, the harder it is to add new lines, because the system you are adding too is more complex. He plugs the size of the entire code base of RH7.2 into this formula. This seems unreasonable to me - these are many almost independent packages.
No, I don't do that (for the reason you cite). Section 2.3 of the paper discusses this: "Each build directory had its effort estimation computed separately; the efforts of each were then totalled." Appendix A mentions that sloccount was given the "--multiproject" option, which implements this.
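To see why totalling per-directory estimates matters, here is a sketch using the basic COCOMO organic-mode formula, effort = 2.4 * KSLOC^1.05 person-months, assuming those are sloccount's default constants; the package sizes are made up for illustration. Because the exponent is above 1, the sum of per-package estimates is always smaller than one estimate computed on the combined size.

```python
from typing import Iterable

# Basic COCOMO, organic mode (assumed defaults): effort = A * KSLOC ** B person-months
A = 2.4
B = 1.05


def effort_person_months(sloc: int) -> float:
    """Estimated effort for a single project of `sloc` physical source lines."""
    return A * (sloc / 1000.0) ** B


def multiproject_effort(package_sizes: Iterable[int]) -> float:
    """Estimate each package separately, then total the efforts."""
    return sum(effort_person_months(s) for s in package_sizes)


# Hypothetical package sizes in SLOC, for illustration only
packages = [2_400_000, 1_800_000, 600_000, 45_000, 8_000]

combined = effort_person_months(sum(packages))
separate = multiproject_effort(packages)
print(f"one big project : {combined / 12:,.0f} person-years")
print(f"per-package sum : {separate / 12:,.0f} person-years")
```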
Anyway, I hope people found this study interesting. It sounds like several people did.
- David A. Wheeler (see my Secure Programming HOWTO)
How about solving this by creating a fanciful glyph (vaguely 'L' shaped) and allocating a point in the Unicode codespace to replace the name? There would no longer be a spoken name for /The Operating System Formerly Known as (GNU\/)?Linux/.
The Glyph could mean all things to all people. Everyone would be happy enough to resume productive activities.
-- Dr. Mike
There is nothing revolutionary there.
Frankly, show me one useful feature in the RH distribution that hasn't been done before.
...and you can't blame meteors for everything.
...is caffeine. Lots and lots of caffeine. I don't care if you're a programmer, a system administrator, or a homebrew hacker (in the old and true sense of the word). Without the readily available supply of that wonderful drug called caffeine, who would say that Linux would be even 1/4 the phenomenon that it is today? Hmm?
Am I the only person who cringes every time I read "x-windows?"
Or have they officially changed the name? (might as well...)
--
"You know you want me, baby." --Crow