Domain: dwheeler.com
Stories and comments across the archive that link to dwheeler.com.
Comments · 467
-
Product placement
MS was to computers what Big Tobacco was to sports. If you didn't get in the pyramid by now, it's too late, forget it. It's over - especially now that Greenspan has said his. Too much attention is being spent on the antics of a dead company.
Slashdot's product placement and trolling stepped up while European legislators were discussing software patents. Picayune articles, many of which consisted of rehashed softer versions of old FUD and misinformation, covered topics which have already been dealt with, again and again.
Since most novices do not understand the scope and severity of MS's problems and since any critique of MS, no matter the merit, gets written off as "MS-Bashing", it would be best to focus on the more successful areas of the IT sector. Here are a few examples:
Check the forums for tools that work - *BSD, Linux, QNX, Netware, eDirectory, LDAP, Kerberos, KDE, Gnome, Apache, MySQL, Postgresql, and so on
...
-
Re:With an 84% profit on each copy sold...
I have no idea what Linux, in general, costs if you factored in time, bandwidth, etc., but I would not be surprised to see it in the tens of millions.
Oh, much more than that! See this article for details...
Al.
-
Re:Should the GPL be used to legitimize theft?
Who is the publisher and who is the thief?
SCO is the publisher. IBM is the alleged thief.
Did the thief publish under such a license? Oh well.
I can understand arguing that the publisher, the copyright holder, who published under the GPL gets what he deserves, but are you arguing that the thief's publication is legitimate? If so, some rogue employee would have GPL'd Windows by now. (Also, the GPL does not allow this behavior; it must be licensed by the copyright holder.)
The primary cause of SCO's weird predicament really isn't the GPL.
But it is. Because what I'm hearing from this community is, "so what if IBM stole their code and inserted it somewhere in GNU/Linux; they mistakenly published it under the GPL, so they forfeit their code and copyright." This has nothing to do with trade secrets. This is a full on attempt to bring the GPL into a court case SCO wants to lose, because the win will have a chilling effect on businesses that want to use the GPL.
It's easy to [prevent] the GPL from gobbling up your IP [like] Pac-Man: just don't release anything under the GPL or, if you do, make damn sure that you know what it is that you're releasing.
You make it sound so easy. You forgot to add, "Ensure that no unscrupulous third-party has inserted your code anywhere in GNU/Linux." Unfortunately, GNU/Linux is 30 million lines of source code (Wheeler 1).
-
For stats, see "Why OSS/FS? Look at the Numbers!"
For statistics about open source software / Free Software, see my paper, "Why Open Source Software / Free Software? Look at the Numbers!", at http://www.dwheeler.com/oss_fs_why.html . It has a large collection of information you'll probably find useful.
-
Kingpins not enough. Guarded email, etc.
Attacking the kingpins will probably have a very nice short-term effect. But will it really help long term? I doubt it. Instead, there will be new kingpins in countries outside their control, perhaps in places where it's still legal to crack into other computers. Also, there will be a gradual increase in spam from the large number of other spammers. We need techniques that work long-term.
If you're interested in countering spam, please check these out:
- http://www.dwheeler.com/essays/stopspam.html - essay about techniques to stop spam
- http://www.dwheeler.com/guarded-email - a paper about Guarded Email, a challenge-response system that might really help.
-
Opt-out lists, guarded email
If you hate spam (I do), you might find these interesting:
http://www.dwheeler.com/essays/stopspam.html: An essay about stopping spam. Although I think opt-out lists are a poor solution, they can be made to work - but they have to be run by someone without a conflict of interest (not true here!), and in a way that doesn't increase spamming (e.g., just store hashes, not the email addresses themselves). Make the spammers pay for the opt-out list upkeep. Most importantly, it has to be supported by law, not by lame "self-regulation".
http://www.dwheeler.com/guarded-email: A paper about Guarded Email, a particular challenge-response approach. Unlike heuristic approaches, these approaches kill off the attack / counter-attack cycle we're stuck in.
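The "just store hashes, not the email addresses themselves" idea for an opt-out list can be sketched roughly as follows. This is only an illustration, not anything taken from the essay: the `OptOutList` class, the choice of SHA-256, and the lowercase normalization are all assumptions made for the example.

```python
import hashlib


def address_digest(address: str) -> str:
    """Return a SHA-256 hex digest of a normalized email address.

    Lowercasing the whole address is a simplifying assumption.
    """
    normalized = address.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class OptOutList:
    """Hypothetical opt-out registry that keeps digests, never raw addresses."""

    def __init__(self) -> None:
        self._digests: set[str] = set()

    def register(self, address: str) -> None:
        """Record that this address has opted out."""
        self._digests.add(address_digest(address))

    def filter_recipients(self, recipients: list[str]) -> list[str]:
        """Return only the recipients who have not opted out."""
        return [r for r in recipients if address_digest(r) not in self._digests]


registry = OptOutList()
registry.register("Alice@Example.com")
allowed = registry.filter_recipients(["alice@example.com", "bob@example.com"])
print(allowed)  # ['bob@example.com']
```

A mailer can check its own list against the registry, but the registry itself cannot simply be read out as a list of live addresses. A plain hash can still be tested against guessed addresses, so a real design would need more than this.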
Enjoy!
-
For countering spam, see Guarded Email
If you don't like spam, take a look at my guarded email protocol: http://www.dwheeler.com/guarded-email.
-
For a different approach, see Guarded Email
For a different approach to countering spammers, see "Guarded Email" at: http://www.dwheeler.com/guarded-email
The Habeas approach is interesting, but since they've patented it, they could easily make it the only game in town. In particular, I'm concerned that they might be able to tax any email sent/received! I'd prefer to see methods where there is no centralized authority. Decentralization removes the danger of a single point of failure (and the taxes that often come from one).
-
More importantly: Train the programmers!
It's certainly true that inappropriate tools sometimes contribute to security problems. But the more serious issue is that too many programmers don't know how to write secure code.
This problem is so serious that I give away a book explaining how to write secure programs in Linux and Unix. See my Secure Programming for Linux and Unix HOWTO.
It's certainly true that avoiding C/C++ eliminates some buffer overflow attacks, but note that there are things to watch out for in every language. I agree that there's an overuse of C/C++ in cases where they don't make sense, but switching to another language while failing to get the programmers trained won't solve the problem.
-
Another way?
Use source analyzers to find common mistakes. Here are a few:
Flawfinder
RATS
ITS4
Splint
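To give a feel for the kind of check such analyzers automate, here is a toy scanner that flags a few C library calls commonly associated with buffer overflows. It is a deliberately naive sketch, not how any of the tools listed above is implemented; the `RISKY_CALLS` list and the `scan_c_source` name are made up for the example.

```python
import re

# Illustrative subset of C functions often tied to buffer overflows.
RISKY_CALLS = ("gets", "strcpy", "strcat", "sprintf")

# Match a risky name used as a call, e.g. "strcpy(", but not "my_strcpy(".
CALL_PATTERN = re.compile(
    r"(?<![A-Za-z0-9_])(" + "|".join(RISKY_CALLS) + r")\s*\("
)


def scan_c_source(source: str) -> list[tuple[int, str]]:
    """Return (line_number, function_name) for each risky call found."""
    findings = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        for match in CALL_PATTERN.finditer(line):
            findings.append((line_number, match.group(1)))
    return findings


sample = """\
#include <string.h>
void copy(char *dst, const char *src) {
    strcpy(dst, src);
    my_strcpy(dst, src);
}
"""
findings = scan_c_source(sample)
print(findings)  # [(3, 'strcpy')]
```

A pattern match like this also fires on calls inside comments and string literals, and it knows nothing about whether a given call is actually safe. A hit is a reason to review the line by hand, nothing more.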
Also look at Splint's Links page for more on the topic.
-
Secure Programming HOWTO
-
Secure Programming for Linux and Unix HOWTO
There's a free book (and slides) already available if you want to learn how to write secure programs for Linux and Unix: the Secure Programming for Linux and Unix HOWTO. Take it, read it, use it. It's already included in many Linux distributions' documentation.
It is a good idea to get colleges to teach about writing secure programs. Currently, almost all programmers get out to the real world without knowing how to write secure programs, and they're writing the programs exposed to the entire Internet. That needs to change.
-
Open source software should be considered.
It just makes sense that, if you're a government organization acquiring software, you should consider your open source software options. One problem with some government organizations is that they write requests for proposals (RFPs), send them out, and presume that the only solutions available are those from the respondents. Since open source software / Free Software (OSS/FS) projects generally don't reply to RFPs, they're likely to be missed, even if they're perfect for the job. Hopefully, this law will at least make some people go to the web and examine their OSS/FS options.
For quantitative evidence showing that any software acquisition should consider its OSS/FS alternatives, see my paper Why Open Source Software / Free Software (OSS/FS)? Look at the Numbers!.
-
Secure Programming for Linux and Unix HOWTO
If you're interested in writing secure programs (instead of installing / configuring existing programs to be secure), take a look at my freely-available book: Secure Programming for Linux and Unix HOWTO.
-
Re:Price
just make it a link. tard.
-
For WRITING programs, see http://www.dwheeler.com
If you're writing programs that are supposed to be secure, take a peek at my freely-available book: Secure Programming for Linux and Unix HOWTO.
-
Re:Nice Start
Exactly what I think this is trying to finish.
Of course it confuses things by having been written before this list was published... time-travel can be so damn confusing sometimes :)
-
Re:a little short??
It does look like a good start, add a few more chapters and you will be halfway there...
Sorry, but I think this is about all I have to say. Secure Programming HOWTO should take care of the rest.
-
Writing secure programs...
If you're interested in writing secure applications for Linux/Unix systems, take a look at my free book, Secure Programming for Linux and Unix HOWTO, available at http://www.dwheeler.com/secure-programs.
-
There are already studies of Japanese Linux use.
There is a Japanese study, simply called the Linux white paper 2003, that studies current use of Linux in Japan. If you don't read Japanese, a summary of the material is available in Why OSS/FS? Look at the Numbers! in the market share section. Look for the point that starts with "A Japanese survey found widespread use and support for GNU/Linux; overall use of GNU/Linux jumped from 35.5% in 2001 to 64.3% in 2002 of Japanese corporations, and GNU/Linux was the most popular platform for small projects." Note that this is the percentage of corporations using it at all, not the number of total machines, but it certainly suggests interest by the Japanese corporate world. Various other statistics are quoted as well.
-
No, laws have yet to be seriously tried.
I agree with the Slate article that spam is killing email. However, the article claims that laws and legislation aren't working, and this is nonsense. The problem isn't that the laws aren't working... it's that laws have not yet been seriously tried. In a few states, the partial anti-spam laws are actually having an effect. But until the majority of countries make spam illegal with fines (including as a U.S. federal law and an EU law), spam will continue to make email difficult to use.
If it were clearly illegal to send unsolicited bulk email (spam) to anyone in the U.S. or Europe, and a hefty fine backed that up, it would force spammers to move to smaller countries. Those countries would then quickly get blacklisted: "Fix your laws, or you can't do business with us." There would still be spam, but it would be much, much rarer because it would be more dangerous. You could also fine companies that pay for spam - a few hefty payments would at least eliminate a lot of commercial spam.
A partial alternative would be to require (by law) automatable marking (say "ADV:" as the first characters in the subject line) and forbidding source forging. Again, could spammers disobey the law? Sure, murder still happens too. But by making it legally a crime, with real penalties, we certainly reduce the number of perpetrators.
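The point of an automatable marking is that a filter for it becomes trivial. A minimal sketch, assuming the marker is exactly "ADV:" at the start of the Subject header; the function name is hypothetical, and matching case-insensitively is a choice made for this example, not something any law specifies:

```python
from email import message_from_string
from email.message import Message

ADV_PREFIX = "ADV:"


def is_marked_advertisement(message: Message) -> bool:
    """True when the Subject header starts with the ADV: marker."""
    subject = message.get("Subject", "")
    # Leading whitespace and letter case are ignored (an assumption).
    return subject.lstrip().upper().startswith(ADV_PREFIX)


raw = "From: seller@example.com\nSubject: ADV: Cheap widgets\n\nBuy now.\n"
message = message_from_string(raw)
print(is_marked_advertisement(message))  # True
```

Mail that obeys the marking rule can then be rejected or filed automatically, and mail that evades it is, by construction, breaking the law.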
For more info, see http://www.dwheeler.com/essays/stopspam.html
-
Linux is widespread in Japan - here are some stats
There's actually quite a bit of Linux use in Japan. A Japanese paper called the Linux white paper 2003 found that overall use of GNU/Linux jumped from 35.5% in 2001 to 64.3% in 2002 by Japanese corporations, and GNU/Linux was the most popular platform for small projects. It also found that 49.3% of IT solution vendors support Linux in Japan, as well as a number of other interesting statistics.
If you don't read Japanese, you can find a summary of interesting results in Why Open Source Software / Free Software (OSS/FS)? Look at the Numbers!; look for the text starting with "A Japanese survey found".
-
Secure Programming for Linux and Unix HOWTO
If you want to learn how to write secure programs for Linux or Unix systems, read my freely-available book, Secure Programming for Linux and Unix HOWTO. You can get it from http://www.dwheeler.com/secure-programs.
-
Spam is theft, period.
Hmpf. Spam is theft, period. See my article at http://www.dwheeler.com/essays/stopspam.html.
What this shows is that if laws were passed that made spam illegal, we'd get a lot less spam. You don't have to prosecute every spam case to have an effect; just putting a few spammers in jail or fining them will change the behavior of many others. If spam isn't made illegal (the best solution), at least require all spam to be marked so that it can be easily filtered. And include all spam, not just commercial spam.
-
Here are a number of studies - see my paper.
There are a number of TCO studies in my paper, Why Open Source Software / Free Software (OSS/FS)? Look at the Numbers!, that you should look at.
The biggest issue is, in my mind, use common sense. Make sure you have a better understanding of your current situation (systems and how people use them). In most cases, don't make all the changes at once - plan to do things in stages, test things out before you depend on them, then deploy - and examine how that stage went so you can adjust your plan for the next stage. Maybe you start by replacing a few servers, for example. If you're replacing desktops, maybe you start with just a few systems, or you replace Microsoft Office while keeping Microsoft Windows on a few systems. There's much to be said for incremental changes.
-
Here are some links
-
Spam is still theft
This is an absurdly one-sided piece that seems to try to paint one spammer in the absolutely most positive light possible. I'm sure that there are many bank robbers and drug lords that use their money to support their families. The problem is not that they have families. The problem is that spammers are intentionally stealing resources from other people. See my essay at http://www.dwheeler.com/essays/stopspam.html.
Fundamentally, her process is to make other people pay for her business. That is unacceptable.
The notion that people can "opt-out" is absurd; trying to opt-out of many lists will add you to the "sucker" list, and there's no way for a recipient to know if they'll be opted out or in fact added more.
-
Secure Programming for Linux and Unix HOWTO
If you're trying to write secure applications, I suggest taking a look at my book Secure Programming for Linux and Unix HOWTO at http://www.dwheeler.com/secure-programs - it's free, just download and print. I just released the 29 October 2002 (version 3.000) edition.
-
Qmail not really open source software.
Qmail is not really an open source software/ free software program. See my paper at http://www.dwheeler.com/oss_fs_why.html for an explanation.
-
Hrmph. Spam is still theft. Still, might help.
Hrmph. Spam (unsolicited bulk email) is still theft, and the DMA is going to do all it can to ensure that the theft can continue (as long as the thieves are THEIR members).
Still, this might help in spite of them. A U.S.-wide law against forged "from" messages from commercial spam would at least dissuade some, especially if it had a stiff penalty. This would make it easier to set up my mailbox so that I raise the priority for people I've talked with before; with stiff penalties, they're less likely to forge friends' addresses.
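Raising the priority of people you've talked with before could look something like the sketch below. The `prioritize` function and the dictionary-based message format are invented for the illustration, and the whole thing only helps if the "From" address can be trusted, which is exactly why a penalty for forging it matters.

```python
from email.utils import parseaddr


def prioritize(messages: list[dict], known_senders: set[str]) -> list[dict]:
    """Stable sort: messages from known correspondents come first."""

    def rank(message: dict) -> int:
        # parseaddr splits "Name <addr>" into (name, addr).
        _, address = parseaddr(message["from"])
        return 0 if address.lower() in known_senders else 1

    return sorted(messages, key=rank)


inbox = [
    {"from": "Stranger <offers@example.net>", "subject": "Hello"},
    {"from": "Pat <pat@example.org>", "subject": "Lunch?"},
]
ordered = prioritize(inbox, known_senders={"pat@example.org"})
print([m["subject"] for m in ordered])  # ['Lunch?', 'Hello']
```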
This would be REALLY good if the federal law also required the "ADV" convention, and nailed down EXACTLY what it means. It's already in some state laws. If I could automatically reject the messages without having to read them all, that would steal my bandwidth and storage, but at least it wouldn't steal my time.
Yeah, not everyone obeys the law, there are offsite systems, etc. But it would be a first step, and some legal tools would make it a lot easier to employ technical ones. For example, there's no point in tracking down offenders if they've broken no law. Also, the evasion techniques make it much clearer that they ARE breaking the law. Finally, if nearly all email from some Asian countries is spam, then entire continents can blacklist them... and that would be a real wake-up call that would reduce spam. So, a few basic laws can really enable technological solutions, and even a feeble law might help.
I've written down a few comments and anti-spam techniques at http://www.dwheeler.com/essays/stopspam.html; some of you may find them interesting. I know many others are interested in stemming this outrageous flood of spam that is threatening to steal the ability to receive email.
-
Re:You answer your own question...
You are working backwards. Looking at it objectively, it is clear that a system where applications have to become superuser to perform certain tasks but can relinquish this authority is inferior to a system where superuser privileges are never given to a process.
Your position is understandable since this is how the Unix security model has worked for decades, even though better mechanisms have been proposed but rarely caught on. For example, look at POSIX 1.E, which is almost 2 decades old, for a better model for handling system permissions than the traditional Unix model. Recently there has been work done on POSIX 1.E capabilities for FreeBSD as well as on the Linux front. This is a good indication that more and more people are dissatisfied with the deficiencies of the Unix security model and its reliance on a "superuser" account for so many essential tasks.
Lastly, I don't think any Linux distro has ever been certified as POSIX compliant, although many feel that doing so wouldn't be difficult.
-
Re:Centralising security
Even if centralized security is stipulated to be a good thing, Passport makes no sense. Passport is a lot like Kerberos, in that you have password servers, and services which ask for tickets which you got from the password servers; and the whole thing runs out in the open, where anyone in the world with packet sniffing software can intercept the whole exchange. The problem with this is that Kerberos only makes sense when those who control the services also control the password servers. Kerberos at MIT makes sense, because the printers and file servers and wiring and power are all provided by MIT, so they definitely should control the password servers. On the Web, however, MS wants to turn this relationship upside-down: by virtue of controlling the password servers, they want to gain control of the Web. (Imagine if MS could cut off any Web-based business if they didn't toe the MS line).
There are other models of centralized or distributed security that make more sense for the Web. See David Wheeler's essay on email security based on secure DNS. It seems to me these ideas could be extended to provide authentication on the Web without extending more power to those who have no standing to wield it (I'm talking about MS).
-
Design, Development, Deployment "load marks"
From the Plimsoll Club history
Samuel Plimsoll, M.P.
(1824-1898)
Samuel Plimsoll brought about one of the greatest shipping revolutions ever known by shocking the British nation into making reforms which have saved the lives of countless seamen. By the mid-1800's, the overloading of English ships had become a national problem. Plimsoll took up as a crusade the plan of James Hall to require that vessels bear a load line marking indicating when they were overloaded, hence ensuring the safety of crew and cargo. His violent speeches aroused the House of Commons; his book, Our Seamen, shocked the people at large into clamorous indignation. His book also earned him the hatred of many shipowners who set in train a series of legal battles against Plimsoll. Through this adversity and personal loss, Plimsoll clung doggedly to his facts. He fought to the point of utter exhaustion until finally, in 1876, Parliament was forced to pass the Unseaworthy Ships Bill into law, requiring that vessels bear the load line freeboard marking. It was soon known as the "Plimsoll Mark" and was eventually adopted by all maritime nations of the world.
The risks, issues and solutions for providing a more secure operating and application environment have been known for decades. Those who do not already comprehend the issues and are willing to learn should take some time out to listen to some of the speeches at Dr. Dobbs Journal's Technetcast security archives, starting with Meeting Future Security Challenges by Dr. Blaine Burnam, Director, Georgia Tech Information Security Center (GTISC) and previously with the National Security Agency (NSA).
The "security rules" for Unix-based system and application development are well known, although not widely taught. See Secure Programming for Linux and Unix by David Wheeler. Although Microsoft's NT, 2000 and XP are not Unix based, a lot of the core "rules" above apply or have direct or indirect equivalents.
Because some developers ignore rules like those above, the design and implementation of some applications and servers are just too unsafe to use in the "open ocean" of the internet.
Numerous security experts have railed against Microsoft's lack of security, best summed up by Bruce Schneier, Founder and CTO of Counterpane Internet Security, Inc., who rightly stated:
Honestly, security experts don't pick on Microsoft because we have some fundamental dislike for the company. Indeed, Microsoft's poor products are one of the reasons we're in business. We pick on them because they've done more to harm Internet security than anyone else, because they repeatedly lie to the public about their products' security, and because they do everything they can to convince people that the problems lie anywhere but inside Microsoft. Microsoft treats security vulnerabilities as public relations problems. Until that changes, expect more of this kind of nonsense from Microsoft and its products. (Note to Gartner: The vulnerabilities will come, a couple of them a week, for years and years...until people stop looking for them. Waiting six months isn't going to make this OS safer.)
However, Microsoft's products are not alone in having vulnerabilities; this is a major issue for Linux/BSD and Unix as well as any other OS and vendor.
In a recent speech, Fixing Network Security by Hacking the Business Climate, Bruce Schneier claimed that for change to occur, the software industry must become liable for damages from "unsecure" software. Historically, however, this has not always been the case, since most businesses can insure against damages and pass the cost along to the consumer.
The Ford Pinto and more recently the Ford Explorer's tires are two examples of public and media pressure being more successful than just the threat of lawsuits. Even so, eventually, through public pressure, governments around the world have to step in and pass regulations that set up a minimum set of requirements an automobile has to meet to be deemed "road worthy". This includes crash testing as well as the inclusion of safety equipment on all models. The requirements are not constant and change to meet the expectations and demands of the public and lawmakers.
The onus is not only on the automotive industry itself but also on the users. Most countries require that all automobiles undergo regular inspection and maintain an up to date "Warrant of Fitness".
In the same way, if you want a secure IT infrastructure, eventually the software design, implementation and each deployment will have to undergo the same type of regulation and scrutiny.
For paid software distributions, this could range from just a tick list of security features and security tests to the other extreme of requiring the source code to be fully audited for government/secure deployments.
For users, this could range from running a program that checks that all the required software security updates/patches have been installed to the other extreme of requiring an audited deployment for government/secure deployments.
Users and vendors should be taking a more active approach, including lobbying government, to
1) set up a minimum set of expectations, in the design and implementation of internet "accessing" software ; and
2) ensure that all deployments are more securely implemented ; and/or
3) remove inherently unsecure products from the marketplace.
IMO the above three are preferable, for all software vendors including Microsoft, to attempts to allow liability lawsuits against vendors for deployments which the software vendors have very little control over.
-
Re:Talking about Freedom First
While it's true that the GNU project took too long to finish their kernel, and it is arguable that they ought to have abandoned it in favor of Linux when it appeared, these tactical considerations aren't really relevant. Would you be pleased if, after completing 80% of a large free software project and getting bogged down on the remaining 20%, I added the missing pieces and released the product under a different name? What if people began to credit me with organizing the entire project?
Ummmm... GNU does NOT comprise 80% of a Linux distro. I refer you back to this article, Section 3. Adding up the 35 projects listed, GNU provides 26%.
But it's obvious to me that I'm not going to convince you, and so far nothing that you've said is any more convincing than anything I've heard or read before. Would you agree that we disagree? Maybe we ought to just leave it at that.
I'm more than happy to go away continuing to call Linux distros "Linux" because that's what I think they are. I will feel no guilt for doing so. Moreover, I encourage you to continue calling them "GNU/Linux" if that's what you think they ought to be called.
Cheers!
-
Re:I believe they are wrong
fair enough, if you think the Linux component was more important than the GNU component, then I agree that is a good reason for calling it Linux..
btw, some lines of source code stats are here. looks like [glibc + gcc < kernel], indeed (still, i would argue that in practice other GNU components are used, and also that the invention of the GPL is perhaps the single most important innovation, and finally that the Linux kernel would not have been written if there were not already other pieces of a free OS to use it with).
-
why I will call it GNU/Linux
1. Moral obligation to give credit where credit is due (unless there is some other overriding factor)
2. The GNU Project apparently contributed more code to the system than the Linux kernel has (see http://www.dwheeler.com/sloc/redhat71-v1/redhat71sloc.html).
3. I am not put off by the FSF making an issue of this, because the FSF has every reason to harp on this issue; name recognition and publicity is critical for any organization. Can you imagine how much potential membership (and political voice) has been lost by the FSF already due to the lost publicity of the system being called Linux?
4. The inelegance of the sound of GNU/Linux is much less important than the obligation to try to give credit where credit is due. Since that credit is really, really important to the organization to which it is due, I think in this case the obligation trumps, even if it makes the GNU/Linux system less sexy/marketable.
5. But what about the other important components of most GNU/Linux systems? Like X, or Perl? Well, if we are going to abbreviate (and I think we must), the most important contributor is GNU (see #2). Personally, I think calling it "the GNU/Linux OS" makes more sense than calling it "the GNU OS", so let's take the top two most important components and stop there.
-
See also "Secure Programming for Linux and Unix"
For another book on writing secure programs, see my "Secure Programming for Linux and Unix HOWTO" at http://www.dwheeler.com/secure-programs. It's free, and it covers both web applications and non-web applications.
-
Re:What about everything else?
The FAQ merely says "GNU is the most important secondary component, so we should include it" and Perens advocates using it as well. My point was that the threshold shouldn't be there, so why bother using it?
Actually the FAQ says that "The principal developer is the GNU Project" implying that Linux is the secondary component. So, according to GNU/Stallman, you need to give the principal developer recognition. You're free to cut off any secondary developers at any point you choose. Call it GNU, or GNU/Linux, or GNU/Linux/perl, etc.
I disagree with this assessment. I think that GNU would still be a nice set of free utilities for Solaris if Linux hadn't come along. If you ask me, the principal project on Linux distros is Linux.
Now, that's just an opinion. Maybe we should measure somehow. How about lines of code? Check out this. Top project is the kernel.
Well maybe it's overall contribution? The top 3 pieces of code are not GNU projects. The first GNU project's contribution is only 15% of the contribution of the top three. 6 of the next 7 projects are GNU projects. Combined they still only account for 69% of the top three projects.
Ok. Well maybe it should be measured in terms of which code is more frequently resident in memory. Glibc runs a lot, that's for sure. But not as much as the kernel!
By what measure, other than "we were here first", can GNU make the claim that they're the principal developer?
I am a supporter of GNU and I agree with almost all of the things that they stand for... including the differentiation of open source from free software. But this silly demand is the stark exception. And it drives me crazy. I wish that they would have simplified the FAQ and put the real reason: "Because we want to ride on the PR coattails of Linux".
-
At least it made Infoworld, including the MS FUD
Lead Windows developer bugged by security. Which includes the statements...
It is not only Microsoft that is to blame for the creation of faulty software, said Chandra Mugunda, a software consultant with Dell Computer in Round Rock, Texas, who attended Valentine's presentation here. "It's an industry-wide problem, it's not just a Microsoft problem," he said. "But they're the leaders, and they should take the lead to solve these problems"
Valentine, too, took the opportunity to point out the widespread bugs that have been discovered in competing operating products such as Linux and Unix.
"Every operating system out there is about equal in the number of vulnerabilities reported," he said. "We all suck."
However, Valentine's statement that "Every operating system out there is about equal in the number of vulnerabilities reported" fails to take into consideration that in most cases Unix, open source and free licensed software has been designed from the outset with at least the issue of security in mind, whereas some Microsoft systems, such as their embedded scripting systems, have not.
The result is that it is far easier to exploit an easy, scriptable vulnerability in a Microsoft system that has no patch for months than to exploit a difficult, binary hole in a Linux/BSD system that has a patch within days.
-
Authentication and spam
There is a potential problem with spam and authentication, but it's not what you think.
There are lots of ways to authenticate, but they tend to not be very automatic and require too much work by users. An alternative approach is described in: http://www.dwheeler.com/essays/easy-email-sec.html
Here's the quote: "Sadly, you probably don't want to automatically authenticate every message. That's because spammers would set up bogus servers waiting for your program to authenticate the message (using a used-only-once sending email address), and add you to a ``valid email address'' list if you tried to authenticate it (and once on, you'll never come off the list no matter what they say)."
-
If you're recommending Linux, use real data.
If you're recommending the use of [GNU/]Linux to decision-makers, then you should use real data as part of your rationale. Take a look at my (long!) article, Why Open Source Software / Free Software (OSS/FS)? Look at the Numbers!, which has a collection of useful facts and figures (including market share, reliability studies, etc.).
-
I meant arch, not subversion.
I'm sorry, you're right. I meant arch, not subversion in my previous post.
I took a short look at the arch home page. Support for sftp still isn't built-in (so it's still dreadfully insecure), but there is a separate patch available to support sftp. I suspect that eventually arch will support sftp and/or other things to replace the absurd "password in the clear" approach it's currently using.
However, it still isn't clear to me that a developer can't modify other files in supposedly "frozen" copies in arch. arch takes a very unusual approach to repositories - which is interesting! - but it appears that Joe Programmer might be able to modify other files than the one Joe is supposed to be modifying. Or perhaps Joe changes "older" versions that he's submitted, screwing up configuration management. Perhaps arch counters this, but I haven't seen any analysis of it that way (please let me know if there are any!!). The fact that arch still requires passwords in the clear doesn't exactly give me warm fuzzies about its security; writing secure programs isn't trivial and requires a commitment to do so.
Arch is actually quite interesting in many ways... but the security issues do concern me.
-
1400? Try 3100!
Take a look at slide 19 - 1400 devs, but 1700 testers. Do you suppose that means that Win2k had 3100 people working full-time on it? Lowballing the numbers (55k per dev, 45k per tester):
1,400 * 55,000 = 77,000,000
1,700 * 45,000 = 76,500,000
153,500,000 a year * 3 years (from slide 3) = 460,500,000
Include an overhead multiplier
460,500,000 * 2.4 = 1,105,200,000
And we wind up with a rough US$1.1BB.
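The arithmetic above is easy to re-run with different inputs. A minimal sketch in Python, using only the headcounts, salaries, duration, and overhead multiplier stated in this comment; the function name `win2k_cost` is hypothetical and introduced just for illustration:

```python
# Rough Win2k staffing-cost estimate, following the comment's arithmetic.
DEVS, TESTERS = 1_400, 1_700                 # headcounts quoted from slide 19
DEV_SALARY, TESTER_SALARY = 55_000, 45_000   # lowball annual salaries (USD)
YEARS = 3                                    # project duration quoted from slide 3
OVERHEAD = 2.4                               # overhead (wrap) multiplier


def win2k_cost(devs=DEVS, testers=TESTERS,
               dev_salary=DEV_SALARY, tester_salary=TESTER_SALARY,
               years=YEARS, overhead=OVERHEAD):
    """Return (annual payroll, payroll over the project, payroll with overhead)."""
    annual = devs * dev_salary + testers * tester_salary
    payroll = annual * years
    return annual, payroll, payroll * overhead


annual, payroll, total = win2k_cost()
print(f"{annual:,} per year")
print(f"{payroll:,} over {YEARS} years")
print(f"{total:,.0f} including overhead")
```

Raising the salaries or changing the overhead multiplier is a one-argument change, which makes it easy to see how sensitive the final figure is to each guess.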
This suggests that win2k represents 20 million SLOC, just slightly higher than RH 6.2, at 17 and change.
His cost estimates place RH 6.2 at US$614,421,924.71
I suspect MS probably pays more per dev, but I have no proof, so I'll stick with the industry averages. Also, testers may have been shared across projects, MS can pool resources and bring overhead lower, etc...
I'm not drawing any conclusions, just compiling data...
-
Software Developers, See HOWTO!
If you're writing software for Linux/Unix systems, go see my book, the Secure Programming for Linux and Unix HOWTO available at http://www.dwheeler.com/secure-programs. It's freely available and redistributable (GFDL license), and it's got lots of information on how to write secure programs. There's lots of information on the Internet on how to write secure programs, but this book gives a lot of information in one place. Enjoy!
-
Re:His Paper Is Bunk. You're Right.
In his paper, he uses the basic COCOMO model for estimating the cost. This model, quite frankly, sucks. Boehm's book even states, more or less, that the COCOMO model is only accurate to a factor of 10.
I have the COCOMO II book, and I have used the COCOMO model for certain projects. I agree that it is not appropriate here. COCOMO was designed with a narrow focus in mind, and applies best to repeatable projects in a structured work environment. It requires you to estimate parameters for factors such as "Programmer Unfamiliarity", "Precedentedness", "Development Flexibility", "Team Cohesion", "Process Maturity", "Multisite Development", etc. Each of these fudge-factors makes it extremely difficult to correctly apply the model to someone else's work.
Also, each of these factors is likely to be different for each major component.
"I was unable to find a publicly-backed average value for overhead, also called the 'wrap rate.' This value is necessary to estimate the costs of office space, equipment, overhead staff, and so on. I talked to two cost analysts, who suggested that 2.4 would be a reasonable overhead (wrap) rate."(from here)
He is using an average overhead rate for a large corporation. He forgot to take into account the fact that Open-Source developers (generally) don't get office space or health insurance or secretaries. They use their own equipment in their own homes. So a more reasonable overhead rate for this project would be close to 0.1.
So taking all of this into account, he's probably off by a factor of more than 100. (If you want to know how accurate he was, compare his estimate to the actual cost of developing a Linux distro... ;) While it might have made interesting headlines, I see little value in the actual number.
-
Responses from the author!!
Since I'm the author of this paper (More than a Gigabuck: Estimating GNU/Linux's Size), I suppose I should respond to some of the comments made here:
- How did I arrive at the estimate of $1 billion? The short answer is "see the paper". I wrote a tool to compute the number of physical source lines of code (SLOC), used Boehm's well-respected COCOMO model to determine the effort (in person-years) from the SLOC, and then converted that effort into an estimated development cost using programmer salary averages and wrap rates. See the paper for the details.
- It's true that there's no necessary relationship between cost and value. I don't see how that contradicts the paper; the paper never claims that there is one. Clearly, you can spend $1 million to develop a program that is worthless; it happens all too often. Proprietary vendors make money by making more money from sales than it cost to develop the software, so proprietary software vendors are very aware of the difference between value and cost. Look carefully at the phrasing. All the paper says is that "Had this Linux distribution been developed by conventional proprietary means, it would have cost over $1.08 billion (1,000 million) to develop in the U.S. (in year 2000 dollars)." The paper does not claim that Red Hat actually spent $1 billion, or that their distributions' sale value is related to this development cost figure. Indeed, what the paper shows is that by using OSS/FS approaches, it's possible to build large systems that would cost over $1 billion to develop using conventional proprietary means.
- Several have complained about the use of COCOMO for estimating effort from lines of code. COCOMO is certainly not perfect, but it's a well-tested, widely accepted, and widely used model. It's also very clearly documented, so there are no "hidden assumptions". In particular, the model and constants used in COCOMO are based on a wide variety of real projects. It's ridiculous to believe that its results are accurate to the nearest hour; as noted throughout the paper, this is only an estimate. A few people have noted that their software took less time to develop, but there are many factors at work. One is that highly experienced people can develop code more quickly; however, not everyone is equally skilled, so with large systems and many developers this effect should even out. Another is that COCOMO includes design time, documentation time, and testing time. Also, this includes not only an average U.S. programmer's salary, but also the wrap rate for overhead (building costs, insurance, and so on) - which programmers don't see in their paychecks, but are certainly paid for by traditional businesses. Don't like COCOMO? That's fine - use your own model, preferably one that's been widely tested in the industry. This paper shows you exactly how to do this sort of analysis.
- I do not claim that every line of code is a "complete rebuild". I'm simply trying to estimate how much it would take to build the system if it were rebuilt.
- The problems with physical SLOC's sensitivity to formatting are well-documented, and I note that in the paper. It's not as bad as you'd think when analyzing larger systems, due to averaging. But if you would rather use logical SLOC, feel free to write code to do that and contribute it to sloccount. In short, instead of complaining, contribute.
- As documented in the paper, I only used Basic COCOMO. I don't have enough information about each project to really use the more detailed COCOMO models effectively. However, the paper has all you need if you want to do more detailed analysis using other effort and cost estimation models, including the versions of COCOMO that require more input (e.g., Intermediate COCOMO).
- SLOC isn't a very good measure of productivity, but it's generally a very good way to estimate effort. This distinction is important. If programmer A can do something in 100 SLOC, and programmer B needs 10,000 SLOC to do the same thing, it's crazy to think that programmer B is more productive. But it is reasonable to believe that it will take more effort for programmer B to do the same thing (and thus more money). It's possible to game this (e.g., creating separate print commands for each letter to be output as a string), but the resulting code is pretty ugly and programmers generally only intentionally game things if they believe having higher SLOC values will improve their salaries (an unlikely claim for the software in the Red Hat Linux distribution). The paper only measures effort to develop Red Hat Linux 7.1. You'll have to determine if that's a comparable level of functionality to other systems.
- This doesn't count "the operating system". It counts "Red Hat Linux 7.1". Thus, it includes the word processors, spread sheets, and so on. It's not as easy to determine what to leave out; you could compute just the minimal "base", but few people would want to use such a system. Again, I think that's extremely clearly stated in the paper.
- Others have been inspired by my paper to do an analysis of the Debian GNU/Linux distribution, using my tool sloccount. You can see their very interesting paper Counting Potatoes: The size of Debian 2.2 at http://people.debian.org/~jgb/debian-counting. They found that Debian 2.2 includes more than 55 million physical SLOC, and would have cost nearly $1.9 billion USD using over 14,000 person-years to develop using traditional proprietary techniques.
- Yeah, I need a better picture. I just haven't gotten around to it.
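The SLOC-to-effort-to-cost chain described in the responses above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's actual tool: the coefficients 2.4 and 1.05 are the textbook Basic COCOMO "organic mode" values, the salary figure is an assumed average, the 2.4 wrap rate is the one quoted earlier in this thread, and applying the model per component and summing is likewise an assumption. The component sizes in the example are made up.

```python
# Basic COCOMO, organic mode (textbook coefficients; assumed, not taken from this thread).
A, B = 2.4, 1.05      # effort in person-months = A * KSLOC ** B
SALARY = 56_286       # assumed average annual programmer salary (USD)
WRAP_RATE = 2.4       # overhead (wrap) multiplier quoted in this thread


def effort_person_months(sloc):
    """Estimated development effort for one component, in person-months."""
    return A * (sloc / 1000) ** B


def estimated_cost(sloc, salary=SALARY, wrap=WRAP_RATE):
    """Estimated cost in USD: person-years * salary * overhead."""
    person_years = effort_person_months(sloc) / 12
    return person_years * salary * wrap


def distribution_cost(component_sloc):
    """Sum the per-component estimates (assumes components are estimated separately)."""
    return sum(estimated_cost(sloc) for sloc in component_sloc.values())


# Hypothetical component sizes, for illustration only.
example = {"kernel": 2_000_000, "browser": 1_500_000, "editor": 600_000}
print(f"${distribution_cost(example):,.0f}")
```

Because the exponent is greater than 1, estimating one large code base gives a higher figure than estimating its pieces separately and summing, so the choice of granularity matters as much as the constants.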
-
Re:OSS...
Here, you might find yourself getting annoyed with this. Enjoy!
-
slashdotted!
This paper analyzes the amount of source code in GNU/Linux, using Red Hat Linux 7.1 as a representative GNU/Linux distribution, and presents what I believe are interesting results.
In particular, it would cost over $1 billion ($1,000 million - a Gigabuck) to develop this GNU/Linux distribution by conventional proprietary means in the U.S. (in year 2000 U.S. dollars). Compare this to the $600 million estimate for Red Hat Linux version 6.2 (which had been released about one year earlier). Also, Red Hat Linux 7.1 includes over 30 million physical source lines of code (SLOC), compared to well over 17 million SLOC in version 6.2. Using the COCOMO cost model, this system is estimated to have required about 8,000 person-years of development time (as compared to 4,500 person-years to develop version 6.2). Thus, Red Hat Linux 7.1 represents over a 60% increase in size, effort, and traditional development costs over Red Hat Linux 6.2. This is due to an increased number of mature and maturing open source / free software programs available worldwide.
Many other interesting statistics emerge. The largest components (in order) were the Linux kernel (including device drivers), Mozilla (Netscape's open source web system including a web browser, email client, and HTML editor), the X Window system (the infrastructure for the graphical user interface), gcc (a compilation system), gdb (for debugging), basic binary tools, emacs (a text editor and far more), LAPACK (a large Fortran library for numerical linear algebra), the Gimp (a bitmapped graphics editor), and MySQL (a relational database system). The languages used, sorted by the most lines of code, were C (71% - was 81%), C++ (15% - was 8%), shell (including ksh), Lisp, assembly, Perl, Fortran, Python, tcl, Java, yacc/bison, expect, lex/flex, awk, Objective-C, Ada, C shell, Pascal, and sed.
The predominant software license is the GNU GPL. Slightly over half of the software is simply licensed using the GPL, and the software packages using the copylefting licenses (the GPL and LGPL), at least in part or as an alternative, accounted for 63% of the code. In all ways, the copylefting licenses (GPL and LGPL) are the dominant licenses in this GNU/Linux distribution. In contrast, only 0.2% of the software is public domain.
This paper is an update of my previous paper on estimating GNU/Linux's size, which measured Red Hat Linux 6.2 [Wheeler 2001]. Since Red Hat Linux 6.2 was released in March 2000, and Red Hat Linux 7.1 was released in April 2001, this paper shows what's changed over approximately one year. More information is available at http://www.dwheeler.com/sloc.
1. Introduction
The GNU/Linux operating system has gone from an unknown to a powerful market force. Netcraft found that, of the systems running web servers in June 2001, GNU/Linux was now the second most popular operating system (with 29.6%, versus Windows' 49.6%) [Netcraft 2001]. Another survey, of primarily European and educational sites, found that GNU/Linux was used more than any other operating system (of the sites it surveyed) [Zoebelein 1999]. IDC found that 25% of all server operating systems purchased in 1999 were GNU/Linux, making it second only to Windows NT's 38% [Shankland 2000a].
There appear to be many reasons for this, and not simply because GNU/Linux can be obtained at no or low cost. For example, experiments suggest that GNU/Linux is highly reliable. A 1995 study of a set of individual components found that the GNU and GNU/Linux components had a significantly higher reliability than their proprietary Unix competitors (6% to 9% failure rate with GNU and Linux, versus an average 23% failure rate with the proprietary software using their measurement technique) [Miller 1995]. A ten-month experiment in 1999 by ZDnet found that, while Microsoft's Windows NT crashed every six weeks under a ``typical'' intranet load, using the same load and request set the GNU/Linux systems (from two different distributors) never crashed [Vaughan-Nichols 1999].
However, possibly the most important reason for GNU/Linux's popularity among many developers and users is that its source code is generally ``open source software'' and/or ``free software''. A program that is ``open source software'' or ``free software'' is essentially a program whose source code can be obtained, viewed, changed, and redistributed without royalties or other limitations of these actions. A more formal definition of ``open source software'' is available from the Open Source Initiative [OSI 1999], a more formal definition of ``free software'' (as the term is used in this paper) is available from the Free Software Foundation [FSF 2000], and other general information about these topics is available at Wheeler [2000a]. Quantitative rationales for using open source / free software are given in Wheeler [2000b]. The GNU/Linux operating system is actually a suite of components, including the Linux kernel on which it is based, and it is packaged, sold, and supported by a variety of distributors. The Linux kernel is ``open source software''/``free software'', and this is also true for all (or nearly all) other components of a typical GNU/Linux distribution. Open source software/free software frees users from being captives of a particular vendor, since it permits users to fix any problems immediately, tailor their system, and analyze their software in arbitrary ways.
Surprisingly, although anyone can analyze GNU/Linux for arbitrary properties, I have found little published analysis of the amount of source lines of code (SLOC) contained in a GNU/Linux distribution. Microsoft unintentionally published some analysis data in the documents usually called ``Halloween I'' and ``Halloween II'' [Halloween I] [Halloween II]. Another study focused on the Linux kernel and its growth over time is by Godfrey [2000]; this is an interesting study but it focuses solely on the Linux kernel (not the entire operating system). Paul G. Allen posted some results from running Scientific Toolworks, Inc.'s tools on the Linux kernel, but this analysis only considered C code (including headers) - ignoring the many other languages used in constructing the Linux kernel (e.g., assembly language), and only concentrating on the kernel. The Free Code Graphing Project at http://fcgp.sourceforge.net generates a graphical representation of a program (currently, the Linux kernel), but only of the C code. In a previous paper, I examined Red Hat Linux 6.2 and the numbers from the Halloween papers [Wheeler 2001].
This paper updates my previous paper, showing estimates of the size of one of today's GNU/Linux distributions, and it estimates how much it would cost to rebuild this typical GNU/Linux distribution using traditional software development techniques. Various definitions and assumptions are included, so that others can understand exactly what these numbers mean. I have intentionally written this paper so that you do not need to read the previous version of this paper first.
For my purposes, I have selected as my ``representative'' GNU/Linux distribution Red Hat Linux version 7.1. I believe this distribution is reasonably representative for several reasons:
- Red Hat Linux is the most popular Linux distribution sold in 1999 according to IDC [Shankland 2000b]. Red Hat sold 48% of all copies in 1999; the next largest distribution in market share sales was SuSE (a German distributor) at 15%. Not all GNU/Linux copies are ``sold'' in a way that this study would count, but the study at least shows that Red Hat's distribution is a popular one.
- Many distributions (such as Mandrake) are based on, or were originally developed from, a version of Red Hat Linux. This doesn't mean the other distributions are less capable, but it suggests that these other distributions are likely to have a similar set of components.
- All major general-purpose distributions support (at least) the kind of functionality supported by Red Hat Linux, if for no other reason than to compete with Red Hat.
- All distributors start with the same set of open source software projects from which to choose components to integrate. Therefore, other distributions are likely to choose the same components or similar kinds of components with often similar size for the same kind of functionality.
Different distributions and versions would produce different size figures, but I hope that this paper will be enlightening even though it doesn't try to evaluate ``all'' distributions. Note that some distributions (such as SuSE) may decide to add many more applications, but also note this would only create larger (not smaller) sizes and estimated levels of effort. At the time that I began this project, version 7.1 was the latest version of Red Hat Linux available, so I selected that version for analysis.
Note that Red Hat Linux 6.2 was released in March 2000, Red Hat Linux 7 was released in September 2000 (I have not counted its code), and Red Hat Linux 7.1 was released in April 2001. Thus, the differences between Red Hat Linux 7.1 and 6.2 show differences accrued over 13 months (approximately one year).
Clearly there is far more open source / free software available worldwide than is counted in this paper. However, the job of a distributor is to examine these various options and select software that they believe is both sufficiently mature and useful to their target market. Thus, examining a particular distribution results in a selective analysis of such software.
Section 2 briefly describes the approach used to estimate the ``size'' of this distribution (more details are in Appendix A). Section 3 discusses some of the results. Section 4 presents conclusions, followed by an appendix. GNU/Linux is often called simply ``Linux'', but technically Linux is only the name of the operating system kernel; to eliminate ambiguity this paper uses the term ``GNU/Linux'' as the general name for the whole system and ``Linux kernel'' for just this inner kernel.
2. Approach
My basic approach was to:
- install the source code files in uncompressed format; this requires carefully selecting the source code to be analyzed.
- count the number of source lines of code (SLOC); this requires a careful definition of SLOC.
- use an estimation model to estimate the effort and cost of developing the same system in a proprietary manner; this requires an estimation model.
- determine the software licenses of each component and develop statistics based on these categories.
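To make the counting step in the list above concrete, here is a deliberately simplified Python sketch. It assumes a common definition of a physical SLOC (a line containing at least one character that is neither whitespace nor part of a comment) and handles only whole-line comments that start with a single prefix such as `#`. It is not sloccount's algorithm; block comments, string literals, and language-specific rules are ignored, and the function name is hypothetical.

```python
def physical_sloc(text, comment_prefix="#"):
    """Count lines that are neither blank nor whole-line comments.

    Simplification: a line such as '#!/bin/sh' is treated as a comment here,
    and lines with trailing comments still count as code.
    """
    count = 0
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith(comment_prefix):
            count += 1
    return count


sample = "#!/bin/sh\n\n# say hello\necho hello  # trailing comment\n"
print(physical_sloc(sample))
```

Even this toy version shows why the definition has to be pinned down first: whether the shebang line or a line with a trailing comment counts changes the total.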
More detail on this approach is described in Appendix A. A few summary points are worth mentioning here, however.
2.1 Selecting Source Code
I included all software provided in the Red Hat distribution, but note that Red Hat no longer includes software packages that only apply to other CPU architectures (and thus packages not applying to the x86 family were excluded). I did not include ``old'' versions of software, or ``beta'' software where non-beta was available. I did include ``beta'' software where there was no alternative, because some developers don't remove the ``beta'' label even when it's widely used and perceived to be reliable.
I used md5 checksums to identify and ignore duplicate files, so if the same file contents appeared in more than one file, it was only counted once (as a tie-breaker, each such file is assigned to the first build package it applies to in alphabetic order).
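The duplicate-handling rule can be sketched as follows. This is a minimal illustration in Python, assuming the file contents are already in memory as bytes; the function name and the data layout (package name mapped to a dictionary of path and contents) are hypothetical, and Python's default string sort stands in for "alphabetic order".

```python
import hashlib


def assign_unique_files(packages):
    """Keep each distinct file content once, crediting the first package in sorted order.

    packages: mapping of package name -> {file path: file contents as bytes}.
    Returns a mapping of package name -> list of paths counted for that package.
    """
    seen = set()       # md5 digests already counted
    assigned = {}
    for name in sorted(packages):
        kept = []
        for path in sorted(packages[name]):
            digest = hashlib.md5(packages[name][path]).hexdigest()
            if digest not in seen:
                seen.add(digest)
                kept.append(path)
        assigned[name] = kept
    return assigned


example = {
    "zlib": {"copy.c": b"int main(void) { return 0; }\n"},
    "apache": {"main.c": b"int main(void) { return 0; }\n", "util.c": b"void f(void) {}\n"},
}
print(assign_unique_files(example))
```

In this example the shared file is credited to `apache` because it sorts before `zlib`, so `zlib` ends up with no counted files.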
The code in makefiles and Red Hat Package Manager (RPM) specifications was not included. Various heuristics were used to detect automatically generated code, and any such code was also excluded from the count. A number of other heuristics were used to determine if a file was a source program file, and if so, what its language was.
Since different languages have different syntaxes, I could only measure the SLOC for the languages that my tool (sloccount) could detect and handle. The languages sloccount could detect and handle are Ada, Assembly, awk, Bourne shell and variants, C, C++, C shell, Expect, Fortran, Java, lex/flex, LISP/Scheme, Makefile, Objective-C, Pascal, Perl, Python, sed, SQL, TCL, and Yacc/bison. Other languages are not counted; these include XUL (used in Mozilla), Javascript (also in Mozilla), PHP, and Objective Caml (an OO dialect of ML). Also code embedded in data is not counted (e.g., code embedded in HTML files). Some systems use their own built-in languages; in general code in these languages is not counted.