As long as Teoma charges for submitting a URL, Teoma will UTTERLY FAIL. There are many personal websites, and their maintainers (like me!) don't have the money to submit paid URLs to search engines.
Without those URLs,
Teoma will never consider much of the most useful information, so
Teoma results will always be poorer.
Why would anyone switch to using
a search engine whose
business model is designed to give poorer results?
People currently submit their URLs to Google because (1) it's the #1 search engine, and (2) it's free. Now Teoma wants to compete with Google (which has a large database) through a business model which will ensure that Teoma always has an uncompetitive database? Ridiculous.
I think Google has the better business model. Charge for advertising on keywords, and show the
ads separately.
That way, people don't feel like they're being
lied to, and people get the best possible
results (without it interfering with the
search engine's business model).
I'd like to see Open Office added to the Red Hat distribution; I don't see it noted in the announcement.
Yes, I know that Red Hat isn't _primarily_ marketing to desktops, but even system administrators and others need to read and edit Word, PowerPoint, and Excel files. For example, there are FAR too many documents (including technical material) that are only available in those formats. KWord is quite ineffective at importing Word, and Abiword can only handle very simple Word documents. Gnumeric does a good job with Excel spreadsheets, but I know of no other open source program that can handle PowerPoint files. If you don't want it to use up space on your hard drive, don't install Open Office, but for many it would be a BIG help to have Open Office ready-to-install on the CD's.
This article starts from a misunderstanding, and then "discovers" that the misunderstanding isn't true. Yes, source lines of code (SLOC) aren't good measures of productivity; that's because they weren't intended to measure productivity.
SLOC are useful for estimating development effort.
The best programmers manage to simplify problems so that they can solve the same problem with less effort.
SLOC can then be used to estimate that effort,
before it's expended, or used to estimate
the effort that was expended.
Claiming that SLOC measures productivity is silly.
There's a whole literature on managing software projects. Look up terms like "Software Engineering" and "Software Management".
For tracking progress, the usual approach is to divide the project into a series of steps, where each step can be unambiguously determined to be true or not (no "90% done" steps). Estimate the time that's required for each step and use a scheduling program to determine how long it will take; you'll also need separate management reserve time for the inevitable problems (but keep this separate from the steps, so that you'll know when you're using it up). Some people define dollar values for each step, resulting in earned value approaches.
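As a rough illustration of the tracking approach described above, here is a minimal Python sketch. The step names, hour estimates, dollar values, and the reserve figure are all made up for the example; the only points it tries to show are that each step is either done or not done, that earned value is the sum of the values of the finished steps, and that management reserve is tracked apart from the steps.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    estimated_hours: float
    value_dollars: float
    done: bool = False  # strictly binary: there is no "90% done"


def planned_hours(steps):
    """Total estimated hours for all steps (reserve is NOT included)."""
    return sum(s.estimated_hours for s in steps)


def earned_value(steps):
    """Dollar value of the steps that are actually finished."""
    return sum(s.value_dollars for s in steps if s.done)


def percent_complete(steps):
    """Earned value as a percentage of the total planned value."""
    total = sum(s.value_dollars for s in steps)
    return 100.0 * earned_value(steps) / total if total else 0.0


# Hypothetical project plan, for illustration only.
steps = [
    Step("parser passes unit tests", 40, 4000, done=True),
    Step("report generator passes unit tests", 60, 6000, done=False),
    Step("user manual reviewed", 20, 2000, done=True),
]
management_reserve_hours = 30  # kept separate so you can see it being used up

print(planned_hours(steps), earned_value(steps), percent_complete(steps))
```

A real scheduling program would also handle dependencies between steps and calendar dates; this sketch leaves those out.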
YES. It's definitely possible to use tools
to search through code and find problem areas.
Such tools are called "source code scanners."
There are at least two open source software/
free software source code scanners that
work like this. My tool
Flawfinder
does this, as does John Viega's
RATS
tool. Both tools are licensed under the GPL.
They both work essentially the same way;
they use patterns and some heuristics to
identify "dangerous" function calls and patterns,
and also try to rank their riskiness.
They both have built-in databases
(so you don't have to figure out what should
be looked for), and they both parse the code
sufficiently so that comments and data in
strings are ignored.
They also examine the parameter values to
determine the riskiness of the construct.
Both were influenced, by the way, by a previous
tool called ITS4.
Eventually we hope to merge our efforts, but it
hasn't been immediately obvious how to do so.
In fact, it can be argued that we shouldn't: having two tools is like having two different people look at something; each catches or emphasizes something the other doesn't.
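To make the general idea concrete, here is a toy Python sketch of a pattern-based scanner for C code. It is NOT how Flawfinder or RATS is actually implemented. The tiny rule table and its risk levels are hypothetical values chosen for illustration, and the sketch does not try to examine parameter values. It only shows the basic technique: blank out comments and string literals, look for calls to functions in a built-in table, and rank the hits by risk.

```python
import re

# Hypothetical mini "database": function name -> risk level (0 = low, 5 = high).
RISKY_CALLS = {"gets": 5, "strcpy": 4, "strcat": 4, "sprintf": 4, "system": 4}

# Comments and string/character literals, which must not produce hits.
_SKIP = re.compile(
    r'/\*.*?\*/'              # /* block comment */
    r'|//[^\n]*'              # // line comment
    r'|"(?:\\.|[^"\\\n])*"'   # "string literal"
    r"|'(?:\\.|[^'\\\n])*'",  # 'c' character literal
    re.DOTALL,
)


def _blank(match):
    # Replace everything except newlines with spaces so line numbers survive.
    return re.sub(r'[^\n]', ' ', match.group(0))


def scan(source):
    """Return a list of (line, function, risk) hits, riskiest first."""
    cleaned = _SKIP.sub(_blank, source)
    hits = []
    for m in re.finditer(r'\b([A-Za-z_]\w*)\s*\(', cleaned):
        name = m.group(1)
        if name in RISKY_CALLS:
            line = cleaned.count('\n', 0, m.start()) + 1
            hits.append((line, name, RISKY_CALLS[name]))
    return sorted(hits, key=lambda h: (-h[2], h[0]))
```

Calling `scan()` on the text of a C file returns the hits in order of the (made-up) risk level. As with the real tools, every hit is only a candidate; someone still has to read the code to decide whether it is a real problem.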
I think running either
tool on the entire distribution would result
in too much output to be
worthwhile. These tools simply identify
potentially dangerous code - you still have to
look at the code to determine if it's really a problem.
My hope, instead, is to convince the various
developers of each package to use such tools
to find potential problems before the
code is released to the public.
Don't let me discourage you from trying -
please do review what you can!!
But I'd like to see everyone reviewing
code they work with, not just a few
code reviewers.
I'm a big believer in defense-in-depth
strategies.
You should use source code scanning
tools like these to find problems
in your code before you run it.
You should then run tools like Purify and
Electric Fence to find other problems.
Then, use tools and mechanisms that counter
security attacks at run-time, e.g.,
StackGuard, TempGuard, and so on.
It would be great if there were a global
setting so that you could make ALL
programs use the "slow but safe free()"
without having to recompile the C library.
You might find my
Secure Programming for Linux and Unix HOWTO
useful.
It's a set of guidelines for writing secure
programs, including writing web applications,
clients, viewers (including word processors),
setuid/setgid programs, and so on.
It's focused on Linux and Unix, but most
of the general principles apply to all systems.
Various posts have wondered if there are
TCO figures, or market share numbers,
or claimed that Microsoft "owns" all the
markets it competes in, or commented on the $1.9
billion figure in Perens' article.
For example,
Microsoft absolutely owns the desktop
client market, that's true.
But it certainly doesn't own other markets - Apache is still the most common web server,
for example, and sendmail is the most
popular mail transfer agent (MTA).
See my paper for the details.
Total cost of ownership (TCO) is so dependent
on the assumptions that you really have to do
your own.
However, it's clear that many people do
find that GNU/Linux systems have a lower
TCO than Microsoft's systems in their environment.
Please note that Perens himself claims that
the $1.9 billion estimate was only if
the software had been developed the same
way as Microsoft's. Perens does not
claim that $1.9 billion was spent.
Check the linked-to paper; I think it spells
things out clearly.
One caveat: I wrote the analysis tool used
in the paper.
However, the tool simply implements a well-known
and widely respected estimation model
that has been openly documented;
it's certainly not biased to give
open source software bigger results.
I think Perens' article was well-written.
Content providers: generate Plucker format. (on "Web Access on Handhelds"; Score: 2, Informative)
Plucker
is a very good solution to the problem.
If you're a content provider and want to
support Palms, just
generate the Plucker format yourself.
That way,
users don't have to figure out how to generate
the format; they just download and synchronize.
Downloading the tools and then generating
the Plucker format is easy if you can use
a command line interface.
Plucker's format is essentially compressed
HTML, so for most websites it's easy to support.
Plucker is GPL'ed, so its components
(the generator and reader) can't
be "taken away"... and they are free for any use.
This combination of free reader, free creator,
and no risk (because it can't be taken away)
makes Plucker much more appropriate for many
content providers.
The Plucker viewer itself is quite capable;
for example, it supports
larger fonts for headings, bold text,
italics, hypertext links, images, horizontal rules,
and tables (formatted as one cell per line).
If you click on a hypertext link to a page
not included in the file,
Plucker will show you the
URL so you can look it up later.
Installing just the viewer is actually
quite easy for end-users; you can download
just the viewer from the Plucker website,
and Plucker users can beam the program to
other users of Palm-compatible PDAs.
Generating Plucker files is pretty easy from the
command line, but I do agree that currently
grandma may have trouble generating
documents on her own.
It's also true that
getting "new" versions of Plucker
documents isn't automatic; you have to do
something to get an update.
The Plucker folks are actively working on
solving these problems, e.g., creating
GUI interfaces. Since Plucker is already
a really nice viewer, and other work is
already ongoing, I think that the Plucker
developers will quickly succeed in making
it easier for naive users to generate their
own documents.
GPG is available, and the Germans are improving it (on "How to Save PGP"; Score: 5, Informative)
So, PGP may not be available in the future.
This is no big deal, really, since
GPG is
already available and can be used as a
replacement.
It's true that currently GPG's user interface
is terrible for beginning users if they have to use
it directly. So, clearly, you want to use
programs that embed GPG (like Evolution).
Also, note that the
German government is funding further development of GPG. They specifically say that their
funding will be used to make GPG more usable
by less experienced users, including
porting the software to other operating systems, developing graphical user interfaces (GUI) and writing a handbook.
Thus, this sounds like a short-term problem at
worst.
I see lots of opinions, but I'd like to see more
than that. Has anyone done a real
survey of colleges and universities to determine
what the "dominant" operating system is for
Computer Science departments?
A real survey would use standard statistical
methods, for example, identifying all the
universities and then creating a random sample
to evaluate
(because self-selected samples are notoriously
biased).
I haven't seen anything like that, but I sure
would like to.
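The sampling step itself is the easy part. Here is a minimal Python sketch, assuming you already have a complete list of institutions (the names below are placeholders, and the sample size of 50 is arbitrary):

```python
import random


def draw_sample(population, sample_size, seed=None):
    """Pick sample_size distinct items uniformly at random.

    Every item has the same chance of selection, which is what
    avoids the bias of a self-selected sample.
    """
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return rng.sample(list(population), sample_size)


# Placeholder population; a real survey would start from a complete list.
universities = [f"University {i}" for i in range(1, 501)]
to_survey = draw_sample(universities, 50, seed=2002)
```

The hard part, which no code fixes, is getting the complete list in the first place and then getting answers from every department that was drawn.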
Um, take another look at your map.
Virginia Tech is 4-5 HOURS by car from
Washington, DC.
Washington, DC would be a perfectly good
place to go for a summer internship, but
it'd be one heckuva commute.
You may be thinking of another university.
That being said, it is useful to have
a background in Calculus, multiple programming
languages, etc. Learning these things helps
you more quickly absorb other things later,
and being a quick study is really important.
I program sometimes, and I do use Calculus for
some of my work.
And yes, I think that unpaid experience with
open source projects will help someone gain
a job in developing software.
I would certainly consider it as evidence of
someone who was willing to go an extra step,
and I could even look at their contributions
to consider how well that person created
code, interacted with others, and so on.
But there are many factors; in particular,
it'd be better if the open source project
was related to the work that the person was
applying for. And yes, there are
open source software jobs!
Thanks for the plug!
My book, Secure Programming for Linux and
Unix HOWTO, is free, and it's
open source/free software (GNU FDL).
I've also just posted my presentation on
how to write secure programs; it's the presentation I gave at FOSDEM 2002 last week.
Note that these presentations have different
(overlapping) goals; Louis Bertrand's presentation is primarily
about OpenBSD (e.g., how it's developed),
while my presentation is primarily about
how developers can develop secure programs.
My presentation, like the book, is at
http://www.dwheeler.com/secure-programs.
Both the BSD license and the LGPL license
limit the freedom of developers, depending on
how you define freedom:
The LGPL license limits the freedom of developers,
because it doesn't let them take the
code and make it proprietary.
The BSD license limits the freedom of developers,
because it doesn't let them use or build on
any changes made by other developers (because others
can always make proprietary changes).
Which is better depends on what you think is
important. But the belief that the BSD license
is "more free", as espoused by some here,
is not a universal notion.
Microsoft products must be offered as extra-cost options in the purchase of new computers, so that the user who does not wish to purchase them is not forced to do so.
If I choose to not use Microsoft's products, then
Microsoft should not get a cut of my money.
The specifications of Microsoft's present and future document file formats must be made public, so that documents created in Microsoft applications may be read by programs from other makers (in addition to the APIs, already part
of the settlement).
Any Microsoft networking protocols must be published in full and approved by an independent network protocol body. This would prevent Microsoft from seizing de facto control of the Internet.
In addition, I would add that the pricing for
Microsoft's products must be strictly based
on volume (to prevent Microsoft from
"punishing" vendors who sell competing products),
and that Microsoft must make its agreements with
resellers public (to prevent secret agreements from
damaging the public).
I'm not anti-Microsoft; I just want to make
sure that there is opportunity for competition.
Capitalism, to work effectively, requires
competition.
Go see Thomas Jefferson High School,
Fairfax, VA's
Computer Systems Research
class, where the goals include such things as
"develop computer skills appropriate for summer employment such as proficiency in UNIX (Linux), mastery of languages such as C, C++, Java or Perl, and familiarity with Web technologies and Internet resources."
It's obviously a lot more than simply learning
some skills; there's a great deal of emphasis
on learning general concepts (see the link).
Thomas Jefferson is a magnet school in
Northern Virginia; its students are often
quite extraordinary.
One approach would be for all email readers
to have a nice big "SPAM" button; any time
you get a SPAM message, you just press that
button and all sorts of automatic things
happen. Here are some examples:
The emailer should forge a "failed to
deliver" message, e.g., one that looks exactly as if
that user and/or machine no longer exists.
Some spammers keep "good" email addresses
(e.g., ones that didn't fail) for future use,
and drop the others; this makes it harder
for them to keep good lists.
Forward the spam on to sites that keep track
of messages to block. Sadly, some people
will try to label non-spam as spam so that
their email won't get through, so SOMEONE
has to look at the spam.
This is a case where more laws are necessary,
but it's not clear reasonable ones will get
passed soon. One hope - lawmakers are
increasingly having to deal with spam themselves.
Egress filtering, also called
"Network Ingress Filtering", is already
formally defined and described
in IETF RFC 2827.
As most of you know, the IETF defines
the key Internet standards, and the
IETF completed this RFC back in May 2000.
In it, the authors recommend that all
service providers implement egress filtering
"as soon as possible".
You can see this RFC at
http://www.ietf.org/rfc/rfc2827.txt.
It would be a good idea to
legally require ISPs to implement
egress filtering.
It won't stop DDoS attacks, but it would
make it far easier to trace and stop
malicious network activity.
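The filtering rule that RFC 2827 describes boils down to a simple check at the provider's edge: only forward a customer's packet if its source address falls inside a prefix that was actually assigned to that customer. Here is a minimal Python sketch of that check. The function names are my own, the prefixes are reserved documentation ranges used as stand-ins, and a real router does this in its forwarding path rather than in a script.

```python
import ipaddress


def build_filter(customer_prefixes):
    """Return a function that decides whether a source address may be forwarded."""
    networks = [ipaddress.ip_network(p) for p in customer_prefixes]

    def permit(source_address):
        addr = ipaddress.ip_address(source_address)
        # Forward only if the claimed source is inside an assigned prefix;
        # anything else is a forged (spoofed) source and gets dropped.
        return any(addr in net for net in networks)

    return permit


# Hypothetical customer with two assigned prefixes (documentation ranges).
permit = build_filter(["192.0.2.0/24", "198.51.100.0/25"])

print(permit("192.0.2.17"))   # inside the first assigned prefix
print(permit("203.0.113.5"))  # not assigned to this customer
```

Note that this does nothing about attack traffic that uses real source addresses; what it buys you is that the source address on a packet can be believed, which is what makes tracing practical.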
There are also some efforts to try to
"throttle" DDoS attacks from the
sending side (e.g., by watching
to see if there are many unanswered packets
and then slowing down transmission rates).
If these efforts scale and their current
problems can be fixed (e.g., how do you
handle broadcasting?), perhaps they could
be made a legal requirement, or perhaps
there could be a general
legal requirement that ISPs implement
methods to counter DDoS attacks, using
egress filtering and throttling as examples.
There are ways to make this work legally, by
creating a more general law and setting up
a body to create the more specific regulations
(which can be flexible as technology advances and
new attacks emerge).
The fundamental problem with Distributed
Denial of Service (DDoS) attacks is that
they are very hard for victims themselves
to counter; the best place to counter them
is near the attacker, but victims generally
have no control over networks "near" the
attacker. Since DDoS attacks don't
particularly hurt the "sending" ISPs, this
is a problem that will not be solved
by simply waiting for people to do it themselves.
Thus, I think there's a need
for "good Internet citizen" legal requirements
to make DDoS attacks easier to counter.
Get the real report from NSS. (on "Future Of IDS"; Score: 2, Informative)
You can get the real IDS report
from the NSS group at
http://www.nss.co.uk
at no charge.
Tell Microsoft-only organizations to
threaten Microsoft, saying
"we'll switch to
open source software (e.g., GNU/Linux)
instead of Microsoft's software."
Organizations that do so might be able to
save a lot of money, even
if they have no intention
of actually making the switch.
Many of these Microsoft-only shops have been
hit with the recent licensing changes that
(for most) increase their costs, and believe
that there's nothing they can do about it.
It looks like Microsoft may be so concerned about
losing business that they may grant all sorts
of price concessions to keep business.
Organizations should develop competitive bidding
strategies (just like they do for many other
purchases), looking at the costs and benefits
of the services they're paying for.
Obviously, organizations are only going to
save a lot of money if they're a credible
threat, e.g., represent a significant account
and have "done their homework" to show that
they really could switch to open source software.
Total cost of ownership (TCO) calculations and
quantitative evidence help here.
Many organizations will find employees who
can really strengthen this analysis through
personal experience (e.g., those who use
such software at home).
If Microsoft wants "exclusive use" clauses,
make sure they're dearly won and for a limited
time (so that the organization can save
lots of money again in a few years).
Even if the organization picks Microsoft
anyway (just as they were going to do),
open sourcers can find amusement in causing
Microsoft's revenue stream to dwindle.
Of course, an organization always runs the
danger of finding out that open source software
is actually the best choice.
In that case, they can find the delight of
a surprise bargain they weren't expecting.
You'll need to discuss at least two situations:
(1) using open source software in your business, and (2) developing/modifying
open source software. Obviously, the issues
are different.
Some relevant URLs (quantitative data, security) (on "Opposing Open Source?"; Score: 1)
One poster complained that I only reported reliability measures for Apache, and performance measures for TUX. I clearly stated in the paper that I wanted to have the performance measures for Apache (in particular), but I have no such "equivalent numbers." A TUX+Apache combination would be nice too. However, I am not interested in making numbers up. I've tried to ferret out as many numbers as I can, and unfortunately, the research currently available is not the research he/she wanted.
Several readers noted that Zoebelein's 1999
survey only covers a subset of the Internet.
I agree that that needs to be emphasized, thanks
for pointing that out.
I've modified the paper to make that much clearer.
The paper never claims to "prove" that OSS/FS
is always superior.
One person said "How has [Gartner] been discredited? Well nothing really factual. Nobody really did any similar surveys to prove a counter-point." Please read. Gartner is clearly wrong about the supercomputing numbers
("zero"), IDC did a similar survey that DID provide a counter-point (and came up with significantly
different answers), and follow the money.
Regarding "Urban Legend #2. Mindcraft was biased... Oh yeah, despite the fact that similar results were obtained by other independent benchmarks that showed serious flaws in the Linux kernel.": Keep reading; the paper clearly discusses why the original benchmark was biased (e.g., websites don't typically have that kind of load), that serious performance
flaws in GNU/Linux were found (and later fixed),
and so on.
>... it seems as if the author is speaking in terms of the kernel PLUS all of the userland components (including X,) so the gigabuck figure (and corresponding man-years) is really misleading.
I don't think it's misleading. The paper specifically says that I'm measuring "GNU/Linux" and "an entire Linux distribution". It should be clear that I'm measuring an entire distribution, not just the Linux kernel.
There's a lot of discussion about "what is contained in Linux". Trying to define various groupings could be interesting. I've posted all my data for anyone who wants to identify and study various groupings. For most of today's users, I think such discussions are irrelevant -- to them, the set of CD's labelled "Distribution-X Linux" defines what "Linux" means to them - and that's what I measured. Perhaps in the future the Linux Standard Base will set what a "minimal GNU/Linux system" is.
Don't expect a study of SLOC to tell you whether to call it "GNU/Linux" or "Linux"; I do discuss the issue in section 3.2.
> the study assumes one company/group doing all the development, which further separates the study from reality.
Nowhere does the study assume that. We all know that's not true unless you define the group as "developers of open source and free software." Red Hat did develop some of the code, but only a small fraction of it.
All I'm trying to do is to estimate how much effort went into making this set of software - not who put in the effort.
> Another small nitpick is that Perl is actually licensed under the GPL... Perl is also dual-licensed under the Artistic license... but that doesn't mean that it isn't GPLed.
I don't understand how that is a "nitpick". Look at the data -- you'll see that Perl's license is assigned the value "Artistic or GPL". The paper is already correct.
> Since he didn't count either .xul or .js source files, the figure for Mozilla is much too small.
That's true, I don't count XUL or Javascript, so Mozilla might indeed be underestimated. That would move the total development cost to a cost even greater than a Gigabuck.
Just in passing - it's important to separate "development cost" from value. Something may cost a gigabuck to develop, but have a much greater value to the world at large (from saved costs worldwide, reduced barrier to innovation, lifetime profit, or whatever).
A number of people have made various comments; as the author, I thought I'd respond to some of them. I'll use this single reply, instead of trying to reply in separate posts for each. Original posts are quoted in paragraphs that start with ">":
> Using RedHat as a distro for this project isn't that good of an idea.... it's just an unrepresentative mass of programs and code! I can safely say that most Redhat users will never use about one-quarter of the programs in their distribution...
That's true for any of today's operating systems. No user uses all the code in Windows, either. Even real-time OS's have more code developed for them than is used by any given user. As a measure of effort, though, examining all the code makes sense.
> Since when is the number of lines of code proportional to the quality of the software? If Red Hat 7.1 has 30 million lines of code over 6.2's 17 million, does that mean the product is 76% better? Is the code getting more sloppy as more programmers get involved? I feel like counsel is leading the witness for the author to say 7.1 has "60% more effort" under the COCOMO model.
I never said it was "better", I said it included "60% more effort." Better is a value judgement. Effort is measured in person-years.
> The kernel shouldn't be two million lines of code. How much of that is drivers? And how much of the drivers are duplicated from one driver to another?
Section 3.2 specifically discusses this; 57% of the lines of code are drivers. Duplicate files are only counted once, but "partly duplicated" files are much harder to detect (and to discount when they happen); they certainly happen in the Linux kernel. However, the COCOMO model is based on real project data, and many other projects include cut-and-pasted code (for good or ill).
> Ok, so this guy claims that Linux would cost a little over $1 billion (US) to develop. I wonder what the big deal is. I'm sure Microsoft has spent that much over the years on Office+Win9x+WinNT+Backoffice+etc... The only thing incredible about this number is that most of that billion was completely unpaid, or at least underpayed.
But I believe that is a big deal.
Gates' "Open Letter to Hobbyists" assumed that if people just shared code, no large project would be developed. GNU/Linux and other open source/free software systems show the assumption wrong, and this paper has the numbers to prove it. You can argue which is "better", of course, but the notion that it can't be done is no longer debatable.
> Are there estimate[s of] how much money in form of salaries were ever paid to programmers for the code and how much was in effect done not only voluntarily, but also completely on an unpaid basis?
Unfortunately not; it's not even clear how to find out. You would have to go back to individual patches submitted to every project, and few people identify in their patches "I was paid to do this."
> 2437470 source lines of code for the Linux kernel. Doesn't that worry some people out there? We have a monolithic kernel almost two and a half million lines long. I think that by 2.6 the kernel is going to collapse under its own weight unless the designers decide to reorganize it in a fundamental way.
It's the nature of a monolithic kernel, and in any case, most of that is in modules (which are individually much smaller and only loaded when needed). I see no evidence of a "collapse", though clearly there are competitors (like HURD) that might eventually replace it in the market.
> Quoting statistics/data going back to '95 is way out of date by todays standards, even '99 is now very old.
It may be old, but it helps give perspective. A simple SLOC number doesn't mean much to people, unless it's compared to something else.
> The cost formula includes a term (ksloc**1.05): i.e. thousands of source lines to the power of 1.05. This reflects the fact that the bigger a program becomes, the harder it is to add new lines, because the system you are adding too is more complex. He plugs the size of the entire code base of RH7.2 into this formula. This seems unreasonable to me - these are many almost independent packages.
No, I don't do that (for the reason you cite). Section 2.3 of the paper discusses this:
"Each build directory had its effort estimation computed separately; the efforts of each were then totalled." Appendix A mentions that sloccount was given the "--multiproject" option, which implements this.
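To see why this matters, here is a small Python sketch comparing the two ways of applying the formula. It uses the Basic COCOMO organic-mode form, effort = 2.4 * KSLOC^1.05 person-months; the exponent is the one quoted above, and I'm assuming the 2.4 coefficient for the sketch. The package sizes are made up, and this is not the sloccount code itself.

```python
def cocomo_effort_pm(sloc, a=2.4, b=1.05):
    """Basic COCOMO effort in person-months for `sloc` source lines of code."""
    return a * (sloc / 1000.0) ** b


def total_effort_separately(project_slocs):
    """Estimate each project on its own, then total the efforts."""
    return sum(cocomo_effort_pm(s) for s in project_slocs)


def total_effort_as_one(project_slocs):
    """Treat everything as one giant project (the approach objected to above)."""
    return cocomo_effort_pm(sum(project_slocs))


# Made-up package sizes in SLOC, for illustration only.
projects = [50_000, 120_000, 800_000]

print(total_effort_separately(projects))
print(total_effort_as_one(projects))
```

Because the exponent is greater than 1, the single-giant-project figure is always the larger of the two, so estimating each build directory separately and totalling gives the more conservative number.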
Anyway, I hope people found this study interesting. It sounds like several people did.
Open source/free software systems are clearly demonstrating that, at least for software, it is possible to develop large systems using an approach similar to the scientific approach of sharing discoveries. To some, this is counter-intuitive, but it's still demonstrably true.
I actually measured the number of source lines of code (SLOC) of a GNU/Linux distribution (Red Hat Linux 6.2). I then used those measures to estimate the person-years and dollar costs necessary to build the same system (if it was developed in a proprietary manner). You can see the results in my paper Estimating Linux's Size at http://www.dwheeler.com/sloc. Here's a brief summary:
This Linux distribution includes well over 17 million physical source lines of code (SLOC).
Over 4,500 person-years of development time would have been required to build this distribution by conventional proprietary means.
It would have cost over $600 million (in year 2000 dollars) to develop this distribution in the U.S. using conventional proprietary means.
No doubt newer distributions would be even larger, with even larger costs to develop traditionally. Some distributions include many more packages, and I would expect some of them to have cost over $1 billion (U.S., 2000 dollars) to develop using proprietary means.
A little over half of the lines of code in this distribution are licensed using the GPL license. This includes gcc, emacs, many KDE programs, many GNOME programs, and other software that tends to be included on *BSD as well as GNU/Linux systems. Thus, it makes sense for Microsoft to particularly attack the GPL license: Removing software licensed through the GPL would cripple many systems that compete with Microsoft. And clearly the GPL does not fit into Microsoft's business model. The notion, however, that business models different from Microsoft's model are somehow dangerous and need suppression is -- well -- laughable.
Don't beat up the good guys. DARPA funded all of the early Internet work and a good chunk of BSD work as well. So, indirectly, DARPA has already provided funding to OpenBSD.
And it's nonsense that the U.S. government is actively opposed to open source - for example, NSA just released a Security
Enhanced version of Linux.
DARPA is trying to advance what's already available -
and advances in security would be great.
I suspect they will be able to make advances, since they're planning to spend $10 million on the winning proposals.
As has been noted, OpenBSD is not a perfect solution - its packages are often quite old and it has many functionality limits (e.g., no support for SMP).
It also doesn't meet the principle of "least privilege" - root is still
all-powerful, programs can do anything their
owners can, etc.
The deadline is soon for those interested in submitting a proposal.
The full proposal (all copies) must be submitted in time to reach
DARPA by 4:00 PM (U.S. Eastern Time) Monday, March 5, 2001,
in order to be considered; it CANNOT be sent by email or fax
(they REQUIRE PHYSICAL COPIES).
People interested in submitting a proposal should also
read the
Proposer
Information Pamphlet (PIP), which isn't
easy to find unless you know where it is.
By the way, I've used SLOC to estimate the effort needed to develop one of the GNU/Linux distributions (Red Hat); you can see the results in More than a Gigabuck: Estimating GNU/Linux's Size.
There are at least two open source software / free software source code scanners that work like this. My tool Flawfinder does this, as does John Viega's RATS tool. Both tools are licensed under the GPL.
They both work essentially the same way; they use patterns and some heuristics to identify "dangerous" function calls and patterns, and also try to rank their riskiness. They both have built-in databases (so you don't have to figure out what should be looked for), and they both parse the code sufficiently so that comments and data in strings are ignored. They also examine the parameter values to determine the riskiness of the construct. Both were influenced, by the way, by a previous tool called ITS4. Eventually we hope to merge our efforts, but it hasn't been immediately obvious how to do so. In fact, it can be argued that we shouldn't: having two tools is like having two different people look at something; each catches or emphasizes something the other doesn't.
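Here is a toy Python sketch of that general approach, just to show the idea. It is not Flawfinder's or RATS's actual code: the function list and risk levels are made-up examples, and it does not look at parameter values the way the real tools do.

```python
import re

# Tiny illustrative "database": C function name -> risk level (higher is riskier).
# These entries and levels are assumptions for this example only.
RISKY = {"gets": 5, "strcpy": 4, "sprintf": 4, "system": 4, "strncpy": 1}

# Matches C comments, string literals, and character literals.
_SKIP = re.compile(
    r'/\*.*?\*/|//[^\n]*|"(?:\\.|[^"\\])*"|\'(?:\\.|[^\'\\])*\'',
    re.DOTALL,
)

# Matches a call to any function named in RISKY.
_CALL = re.compile(r"\b(" + "|".join(map(re.escape, RISKY)) + r")\s*\(")

def _blank_out(match):
    # Replace the text with spaces but keep newlines, so line numbers stay right.
    return re.sub(r"[^\n]", " ", match.group(0))

def scan(source):
    """Return (risk, line_number, function_name) hits, riskiest first."""
    cleaned = _SKIP.sub(_blank_out, source)
    hits = []
    for m in _CALL.finditer(cleaned):
        line = cleaned.count("\n", 0, m.start()) + 1
        hits.append((RISKY[m.group(1)], line, m.group(1)))
    return sorted(hits, key=lambda h: (-h[0], h[1]))

sample = '''
/* strcpy(a, b) in a comment is ignored */
char *msg = "gets(buf) in a string is ignored";
strcpy(dst, src);
gets(buf);
'''

for risk, line, name in scan(sample):
    print(f"line {line}: {name} (risk {risk})")
```

On the sample, only the two real calls on lines 4 and 5 are reported; the mentions inside the comment and the string are skipped. Every hit is still only a "potentially dangerous" spot that a person has to review.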
I think running either tool on the entire distribution would result in too much output to be worthwhile. These tools simply identify potentially dangerous code - you still have to look at the code to determine if it's really a problem. My hope, instead, is to convince the various developers of each package to use such tools to find potential problems before the code is released to the public. Don't let me discourage you from trying - please do review what you can!! But I'd like to see everyone reviewing code they work with, not just a few code reviewers.
For more information about other tools, see my book Secure Programming for Linux and Unix HOWTO, Tools section.
I'm a big believer in defense-in-depth strategies. You should use source code scanning tools like these to find problems in your code before you run it. You should then run tools like Purify and Electric Fence to find other problems. Then, use tools and mechanisms that counter security attacks at run-time, e.g., StackGuard, TempGuard, and so on. It would be great if there were a global setting so that you could make ALL programs use the "slow but safe free()" without having to recompile the C library.
You might find my Secure Programming for Linux and Unix HOWTO useful. It's a set of guidelines for writing secure programs, including writing web applications, clients, viewers (including word processors), setuid/setgid programs, and so on. It's focused on Linux and Unix, but most of the general principles apply to all systems.
I suggest that you look at my paper Why Open Source Software / Free Software (OSS/FS)? Look at the Numbers! It has that kind of information, grouped into categories such as market share, total cost of ownership (TCO), reliability, and so on.
For example, it's true that Microsoft absolutely owns the desktop client market. But it certainly doesn't own other markets - Apache is still the most common web server, for example, and sendmail is the most popular mail transfer agent (MTA). See my paper for the details.
Total cost of ownership (TCO) is so dependent on the assumptions that you really have to do your own. However, it's clear that many people do find that GNU/Linux systems have a lower TCO than Microsoft's systems in their environment.
Please note that Perens himself claims that the $1.9 billion estimate applies only if the software had been developed the same way as Microsoft's. Perens does not claim that $1.9 billion was spent. Check the linked-to paper; I think it spells things out clearly. One caveat: I wrote the analysis tool used in the paper. However, the tool simply implements a well-known and widely respected estimation model that has been openly documented; it's certainly not biased to give open source software bigger results.
I think Perens' article was well-written.
This is already happening. For example, the Linux Documentation Project (LDP) recently added support for Plucker; the LDP now automatically generates Plucker format for all HOWTO, mini-HOWTO, and FAQ documents. The LDP also automatically regenerates the files when the documents are updated. Pluckerbooks has over a thousand pregenerated books and they have links to other sources of Plucker documents.
In fact, I've recently added support for Plucker to my own website. My paper Why Open Source Software / Free Software? Look at the Numbers! also has a Plucker version available. I also generate a Plucker version of my book on writing secure programs. So I'm speaking from experience here: Plucker works well for at least some content providers!
Downloading the tools and then generating the Plucker format is easy if you can use a command line interface. Plucker's format is essentially compressed HTML, so for most websites it's easy to support. Plucker is GPL'ed, so its components (the generator and reader) can't be "taken away"... and they are free for any use. This combination of free reader, free creator, and no risk (because it can't be taken away) makes Plucker much more appropriate for many content providers. The Plucker viewer itself is quite capable; for example, it supports larger fonts for headings, bold text, italics, hypertext links, images, horizontal rules, and tables (formatted as one cell per line). If you click on a hypertext link to a page not included in the file, Plucker will show you the URL so you can look it up later.
Installing just the viewer is actually quite easy for end-users; you can download just the viewer from the Plucker website, and Plucker users can beam the program to other users of Palm-compatible PDAs. Generating Plucker files is pretty easy from the command line, but I do agree that currently grandma may have trouble generating documents on her own. It's also true that getting "new" versions of Plucker documents isn't automatic; you have to do something to get an update. The Plucker folks are actively working on solving these problems, e.g., by creating GUIs. Since Plucker is already a really nice viewer, and other work is already ongoing, I think that the Plucker developers will quickly succeed in making it easier for naive users to generate their own documents.
It's true that currently GPG's user interface is terrible for beginning users if they have to use it directly. So, clearly, you want to use programs that embed GPG (like Evolution). Also, note that the German government is funding further development of GPG. They specifically say that their funding will be used to make GPG more usable by less experienced users, including porting the software to other operating systems, developing graphical user interfaces (GUI) and writing a handbook.
Thus, this sounds like a short-term problem at worst.
I see lots of opinions, but I'd like to see more than that. Has anyone done a real survey of colleges and universities to determine what the "dominant" operating system is for Computer Science departments? A real survey would use standard statistical methods, for example, identifying all the universities and then creating a random sample to evaluate (because self-selected samples are notoriously biased). I haven't seen anything like that, but I sure would like to.
That being said, it is useful to have a background in Calculus, multiple programming languages, etc. Learning these things helps you more quickly absorb other things later, and being a quick study is really important. I program sometimes, and I do use Calculus for some of my work.
And yes, I think that unpaid experience with open source projects will help someone gain a job in developing software. I would certainly consider it as evidence of someone who was willing to go an extra step, and I could even look at their contributions to consider how well that person created code, interacted with others, and so on. But there are many factors; in particular, it'd be better if the open source project was related to the work that the person was applying for. And yes, there are open source software jobs!
I've also just posted my presentation on how to write secure programs; it's the presentation I gave at FOSDEM 2002 last week. Note that these presentations have different (overlapping) goals; Louis Bertrand's presentation is primarily about OpenBSD (e.g., how it's developed), while my presentation is primarily about how developers can develop secure programs. My presentation, like the book, is at http://www.dwheeler.com/secure-programs.
Which is better depends on what you think is important. But the belief that the BSD license is "more free", as espoused by some here, is not a universal notion.
In addition, I would add that the pricing for Microsoft's products must be strictly based on volume (to prevent Microsoft from "punishing" vendors who sell competing products), and that Microsoft must make its agreements with resellers public (to prevent secret agreements from damaging the public).
I'm not anti-Microsoft; I just want to make sure that there is opportunity for competition. Capitalism, to work effectively, requires competition.
Thomas Jefferson is a magnet school in Northern Virginia; its students are often quite extraordinary.
This is a case where more laws are necessary, but it's not clear reasonable ones will get passed soon. One hope - lawmakers are increasingly having to deal with spam themselves.
It would be a good idea to legally require ISPs to implement egress filtering. It won't stop DDoS attacks, but it would make it far easier to trace and stop malicious network activity.
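For anyone unfamiliar with the term, egress filtering just means a network refuses to send out packets whose source address isn't one of its own, so forged (spoofed) source addresses get dropped near the attacker. Here's a minimal Python sketch of the check itself; the address blocks are reserved documentation ranges standing in for a hypothetical ISP's customers, and real filtering is done in router configuration, not in a script like this.

```python
import ipaddress

# Hypothetical address blocks assigned to an ISP's own customers.
# (These are reserved documentation ranges, used here as placeholders.)
CUSTOMER_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]

def allow_outbound(source_ip):
    """Return True only if the packet's source address is one of our own."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in net for net in CUSTOMER_NETWORKS)

print(allow_outbound("192.0.2.17"))   # True: legitimate customer address
print(allow_outbound("203.0.113.5"))  # False: spoofed source, dropped at the edge
```

With this in place, any attack traffic that does get out carries a real source address, which is what makes it traceable.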
There are also some efforts to try to "throttle" DDoS attacks from the sending side (e.g., by watching to see if there are many unanswered packets and then slowing down transmission rates). If these efforts scale and their current problems can be fixed (e.g., how do you handle broadcasting?), perhaps they could be made a legal requirement, or perhaps there could be a general legal requirement that ISPs implement methods to counter DDoS attacks, using egress filtering and throttling as examples. There are ways to make this work legally, by creating a more general law and setting up a body to create the more specific regulations (which can be flexible as technology advances and new attacks emerge).
The fundamental problem with Distributed Denial of Service (DDoS) attacks is that they are very hard for victims themselves to counter; the best place to counter them is near the attacker, but victims generally have no control over networks "near" the attacker. Since DDoS attacks don't particularly hurt the "sending" ISPs, this is a problem that will not be solved by simply waiting for people to do it themselves. Thus, I think there's a need for "good Internet citizen" legal requirements to make DDoS attacks easier to counter.
You can get the real IDS report from the NSS group at http://www.nss.co.uk at no charge.
Tell Microsoft-only organizations to threaten Microsoft, saying "we'll switch to open source software (e.g., GNU/Linux) instead of Microsoft's software." Organizations that do so might be able to save a lot of money, even if they have no intention of actually making the switch.
Many of these Microsoft-only shops have been hit with the recent licensing changes that (for most) increase their costs, and believe that there's nothing they can do about it. It looks like Microsoft may be so concerned about losing business that they may grant all sorts of price concessions to keep business. Organizations should develop competitive bidding strategies (just like they do for many other purchases), looking at the costs and benefits of the services they're paying for.
Obviously, organizations are only going to save a lot of money if they're a credible threat, e.g., represent a significant account and have "done their homework" to show that they really could switch to open source software. Total cost of ownership (TCO) calculations and quantitative evidence help here. Many organizations will find employees who can really strengthen this analysis through personal experience (e.g., those who use such software at home). If Microsoft wants "exclusive use" clauses, make sure they're dearly won and for a limited time (so that the organization can save lots of money again in a few years). Even if the organization picks Microsoft anyway (just as they were going to do), open sourcers can find amusement in causing Microsoft's revenue stream to dwindle.
Of course, an organization always runs the danger of finding out that open source software is actually the best choice. In that case, they can find the delight of a surprise bargain they weren't expecting.
You'll need to discuss at least two situations: (1) using open source software in your business, and (2) developing/modifying open source software. Obviously, the issues are different.
I don't think it's misleading. The paper specifically says that I'm measuring "GNU/Linux" and "an entire Linux distribution". It should be clear that I'm measuring an entire distribution, not just the Linux kernel.
There's a lot of discussion about "what is contained in Linux". Trying to define various groupings could be interesting. I've posted all my data for anyone who wants to identify and study various groupings. For most of today's users, I think such discussions are irrelevant -- the set of CDs labelled "Distribution-X Linux" defines what "Linux" means to them - and that's what I measured. Perhaps in the future the Linux Standard Base will set what a "minimal GNU/Linux system" is. Don't expect a study of SLOC to tell you whether to call it "GNU/Linux" or "Linux"; I do discuss the issue in section 3.2.
> the study assumes one company/group doing all the development, which further separates the study from reality.
Nowhere does the study assume that. We all know that's not true unless you define the group as "developers of open source and free software." Red Hat did develop some of the code, but only a small fraction of it. All I'm trying to do is to estimate how much effort went into making this set of software - not who put in the effort.
> Another small nitpick is that Perl is actually licensed under the GPL... Perl is also dual-licensed under the Artistic license... but that doesn't mean that it isn't GPLed.
I don't understand how that is a "nitpick". Look at the data -- you'll see that Perl's license is assigned the value "Artistic or GPL". The paper is already correct.
> Since he didn't count either .xul or .js source files, the figure for Mozilla is much too small.
That's true; I don't count XUL or JavaScript, so Mozilla might indeed be underestimated. Counting them would push the total development cost even further above a Gigabuck.
Just in passing - it's important to separate "development cost" from value. Something may cost a gigabuck to develop, but have a much greater value to the world at large (from saved costs worldwide, reduced barrier to innovation, lifetime profit, or whatever).
> Using RedHat as a distro for this project isn't that good of an idea.... it's just an unrepresentative mass of programs and code! I can safely say that most Redhat users will never use about one-quarter of the programs in their distribution...
That's true for any of today's operating systems. No user uses all the code in Windows, either. Even real-time OS's have more code developed for them than is used by any given user. As a measure of effort, though, examining all the code makes sense.
> Since when is the number of lines of code proportional to the quality of the software? If Red Hat 7.1 has 30 million lines of code over 6.2's 17 million, does that mean the product is 76% better? Is the code getting more sloppy as more programmers get involved? I feel like counsel is leading the witness for the author to say 7.1 has "60% more effort" under the COCOMO model.
I never said it was "better"; I said it included "60% more effort." Better is a value judgement. Effort is measured in person-years.
> The kernel shouldn't be two million lines of code. How much of that is drivers? And how much of the drivers are duplicated from one driver to another?
Section 3.2 specifically discusses this; 57% of the lines of code are drivers. Duplicate files are only counted once, but "partly duplicated" files are much harder to detect (and to discount when they happen); they certainly happen in the Linux kernel. However, the COCOMO model is based on real project data, and many other projects include cut-and-pasted code (for good or ill).
> Ok, so this guy claims that Linux would cost a little over $1 billion (US) to develop. I wonder what the big deal is. I'm sure Microsoft has spent that much over the years on Office+Win9x+WinNT+Backoffice+etc ... The only thing incredible about this number is that most of that billion was completely unpaid, or at least underpayed.
But I believe that is a big deal. Gates' "Open Letter to Hobbyists" assumed that if people just shared code, no large project would be developed. GNU/Linux and other open source/free software systems show that assumption to be wrong, and this paper has the numbers to prove it. You can argue which is "better", of course, but the notion that it can't be done is no longer debatable.
> Are there estimate[s of] how much money in form of salaries were ever paid to programmers for the code and how much was in effect done not only voluntarily, but also completely on an unpaid basis?
Unfortunately not; it's not even clear how to find out. You would have to go back to individual patches submitted to every project, and few people identify in their patches "I was paid to do this."
> 2437470 source lines of code for the Linux kernel. Doesn't that worry some people out there? We have a monolithic kernel almost two and a half million lines long. I think that by 2.6 the kernel is going to collapse under its own weight unless the designers decide to reorganize it in a fundamental way.
It's the nature of a monolithic kernel, and in any case, most of that is in modules (which are individually much smaller and only loaded when needed). I see no evidence of a "collapse", though clearly there are competitors (like HURD) that might eventually replace it in the market.
> Quoting statistics/data going back to '95 is way out of date by todays standards, even '99 is now very old.
It may be old, but it helps give perspective. A simple SLOC number doesn't mean much to people, unless it's compared to something else.
> The cost formula includes a term (ksloc**1.05): i.e. thousands of source lines to the power of 1.05. This reflects the fact that the bigger a program becomes, the harder it is to add new lines, because the system you are adding too is more complex. He plugs the size of the entire code base of RH7.2 into this formula. This seems unreasonable to me - these are many almost independent packages.
No, I don't do that (for the reason you cite). Section 2.3 of the paper discusses this: "Each build directory had its effort estimation computed separately; the efforts of each were then totalled." Appendix A mentions that sloccount was given the "--multiproject" option, which implements this.
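To show why this matters, here is a small Python sketch of the difference between estimating each package separately and lumping everything together. The exponent 1.05 is the one quoted above; the coefficient 2.4 is my assumption here (it's the usual Basic COCOMO organic-mode value), and the SLOC counts are made up. This is only an illustration, not sloccount's actual code.

```python
def effort_person_months(sloc, a=2.4, b=1.05):
    """COCOMO-style effort estimate: a * KSLOC**b.

    a=2.4 is an assumed coefficient (Basic COCOMO, organic mode);
    b=1.05 is the exponent quoted in the discussion above.
    """
    return a * (sloc / 1000.0) ** b

def total_effort(package_slocs):
    """Estimate each package separately, then add up the efforts."""
    return sum(effort_person_months(s) for s in package_slocs)

# Made-up SLOC counts for three independent packages.
packages = [200_000, 50_000, 10_000]

separate = total_effort(packages)             # per-package, then totalled
lumped = effort_person_months(sum(packages))  # everything as one giant program

print(f"separate: {separate:.0f} person-months")
print(f"lumped:   {lumped:.0f} person-months")
print(separate < lumped)  # True: the exponent above 1 penalizes one big program
```

Because the exponent is greater than 1, treating independent packages as one huge program would overstate the effort; computing each build directory separately avoids that.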
Anyway, I hope people found this study interesting. It sounds like several people did.
I actually measured the number of source lines of code (SLOC) of a GNU/Linux distribution (Red Hat Linux 6.2). I then used those measures to estimate the person-years and dollar costs necessary to build the same system (if it were developed in a proprietary manner). You can see the results in my paper Estimating Linux's Size at http://www.dwheeler.com/sloc. Here's a brief summary:
No doubt newer distributions would be even larger, with even larger costs to develop traditionally. Some distributions include many more packages, and I would expect some of them to have cost over $1 billion (U.S., 2000 dollars) to develop using proprietary means.
A little over half of the lines of code in this distribution are licensed using the GPL license. This includes gcc, emacs, many KDE programs, many GNOME programs, and other software that tends to be included on *BSD as well as GNU/Linux systems. Thus, it makes sense for Microsoft to particularly attack the GPL license: Removing software licensed through the GPL would cripple many systems that compete with Microsoft. And clearly the GPL does not fit into Microsoft's business model. The notion, however, that business models different from Microsoft's model are somehow dangerous and need suppression is -- well -- laughable.
DARPA is trying to advance what's already available - and advances in security would be great. I suspect they will be able to make advances, since they're planning to spend $10 million on the winning proposals. As has been noted, OpenBSD is not a perfect solution - its packages are often quite old and it has many functionality limits (e.g., no support for SMP). It also doesn't meet the principle of "least privilege" - root is still all-powerful, programs can do anything their owners can, etc.
The deadline is soon for those interested in submitting a proposal. The full proposal (all copies) must be submitted in time to reach DARPA by 4:00 PM (U.S. Eastern Time) Monday, March 5, 2001, in order to be considered; it CANNOT be sent by email or fax (they REQUIRE PHYSICAL COPIES).
People interested in submitting a proposal should also read the Proposer Information Pamphlet (PIP), which isn't easy to find unless you know where it is.