Domain: dwheeler.com
Stories and comments across the archive that link to dwheeler.com.
Comments · 467
-
slashdotted!
This paper analyzes the amount of source code in GNU/Linux, using Red Hat Linux 7.1 as a representative GNU/Linux distribution, and presents what I believe are interesting results.
In particular, it would cost over $1 billion ($1,000 million - a Gigabuck) to develop this GNU/Linux distribution by conventional proprietary means in the U.S. (in year 2000 U.S. dollars). Compare this to the $600 million estimate for Red Hat Linux version 6.2 (which had been released about one year earlier). Also, Red Hat Linux 7.1 includes over 30 million physical source lines of code (SLOC), compared to well over 17 million SLOC in version 6.2. Using the COCOMO cost model, this system is estimated to have required about 8,000 person-years of development time (as compared to 4,500 person-years to develop version 6.2). Thus, Red Hat Linux 7.1 represents over a 60% increase in size, effort, and traditional development costs over Red Hat Linux 6.2. This is due to an increased number of mature and maturing open source / free software programs available worldwide.
Many other interesting statistics emerge. The largest components (in order) were the Linux kernel (including device drivers), Mozilla (Netscape's open source web system including a web browser, email client, and HTML editor), the X Window system (the infrastructure for the graphical user interface), gcc (a compilation system), gdb (for debugging), basic binary tools, emacs (a text editor and far more), LAPACK (a large Fortran library for numerical linear algebra), the Gimp (a bitmapped graphics editor), and MySQL (a relational database system). The languages used, sorted by the most lines of code, were C (71% - was 81%), C++ (15% - was 8%), shell (including ksh), Lisp, assembly, Perl, Fortran, Python, tcl, Java, yacc/bison, expect, lex/flex, awk, Objective-C, Ada, C shell, Pascal, and sed.
The predominant software license is the GNU GPL. Slightly over half of the software is simply licensed using the GPL, and the software packages using the copylefting licenses (the GPL and LGPL), at least in part or as an alternative, accounted for 63% of the code. In all ways, the copylefting licenses (GPL and LGPL) are the dominant licenses in this GNU/Linux distribution. In contrast, only 0.2% of the software is public domain.
This paper is an update of my previous paper on estimating GNU/Linux's size, which measured Red Hat Linux 6.2 [Wheeler 2001]. Since Red Hat Linux 6.2 was released in March 2000, and Red Hat Linux 7.1 was released in April 2001, this paper shows what's changed over approximately one year. More information is available at http://www.dwheeler.com/sloc.
1. Introduction
The GNU/Linux operating system has gone from an unknown to a powerful market force. Netcraft found that, of the systems running web servers in June 2001, GNU/Linux was now the second most popular operating system (with 29.6%, versus Windows' 49.6%) [Netcraft 2001]. Another survey, of primarily European and educational sites, found that GNU/Linux was used more than any other operating system (of the sites it surveyed) [Zoebelein 1999]. IDC found that 25% of all server operating systems purchased in 1999 were GNU/Linux, making it second only to Windows NT's 38% [Shankland 2000a].
There appear to be many reasons for this, and not simply because GNU/Linux can be obtained at no or low cost. For example, experiments suggest that GNU/Linux is highly reliable. A 1995 study of a set of individual components found that the GNU and GNU/Linux components had a significantly higher reliability than their proprietary Unix competitors (6% to 9% failure rate with GNU and Linux, versus an average 23% failure rate with the proprietary software using their measurement technique) [Miller 1995]. A ten-month experiment in 1999 by ZDnet found that, while Microsoft's Windows NT crashed every six weeks under a ``typical'' intranet load, using the same load and request set the GNU/Linux systems (from two different distributors) never crashed [Vaughan-Nichols 1999].
However, possibly the most important reason for GNU/Linux's popularity among many developers and users is that its source code is generally ``open source software'' and/or ``free software''. A program that is ``open source software'' or ``free software'' is essentially a program whose source code can be obtained, viewed, changed, and redistributed without royalties or other limitations on these actions. A more formal definition of ``open source software'' is available from the Open Source Initiative [OSI 1999], a more formal definition of ``free software'' (as the term is used in this paper) is available from the Free Software Foundation [FSF 2000], and other general information about these topics is available at Wheeler [2000a]. Quantitative rationales for using open source / free software are given in Wheeler [2000b]. The GNU/Linux operating system is actually a suite of components, including the Linux kernel on which it is based, and it is packaged, sold, and supported by a variety of distributors. The Linux kernel is ``open source software''/``free software'', and this is also true for all (or nearly all) other components of a typical GNU/Linux distribution. Open source software/free software frees users from being captives of a particular vendor, since it permits users to fix any problems immediately, tailor their system, and analyze their software in arbitrary ways.
Surprisingly, although anyone can analyze GNU/Linux for arbitrary properties, I have found little published analysis of the number of source lines of code (SLOC) contained in a GNU/Linux distribution. Microsoft unintentionally published some analysis data in the documents usually called ``Halloween I'' and ``Halloween II'' [Halloween I] [Halloween II]. Another study, by Godfrey [2000], examines the Linux kernel and its growth over time; it is an interesting study, but it covers only the Linux kernel (not the entire operating system). Paul G. Allen posted some results from running Scientific Toolworks, Inc.'s tools on the Linux kernel, but this analysis only considered C code (including headers) - ignoring the many other languages used in constructing the Linux kernel (e.g., assembly language), and concentrating only on the kernel. The Free Code Graphing Project at http://fcgp.sourceforge.net generates a graphical representation of a program (currently, the Linux kernel), but only of the C code. In a previous paper, I examined Red Hat Linux 6.2 and the numbers from the Halloween papers [Wheeler 2001].
This paper updates my previous paper, showing estimates of the size of one of today's GNU/Linux distributions, and it estimates how much it would cost to rebuild this typical GNU/Linux distribution using traditional software development techniques. Various definitions and assumptions are included, so that others can understand exactly what these numbers mean. I have intentionally written this paper so that you do not need to read the previous version of this paper first.
For my purposes, I have selected as my ``representative'' GNU/Linux distribution Red Hat Linux version 7.1. I believe this distribution is reasonably representative for several reasons:
- Red Hat Linux is the most popular Linux distribution sold in 1999 according to IDC [Shankland 2000b]. Red Hat sold 48% of all copies in 1999; the next largest distribution in market share sales was SuSE (a German distributor) at 15%. Not all GNU/Linux copies are ``sold'' in a way that this study would count, but the study at least shows that Red Hat's distribution is a popular one.
- Many distributions (such as Mandrake) are based on, or were originally developed from, a version of Red Hat Linux. This doesn't mean the other distributions are less capable, but it suggests that these other distributions are likely to have a similar set of components.
- All major general-purpose distributions support (at least) the kind of functionality supported by Red Hat Linux, if for no other reason than to compete with Red Hat.
- All distributors start with the same set of open source software projects from which to choose components to integrate. Therefore, other distributions are likely to choose the same components or similar kinds of components with often similar size for the same kind of functionality.
Different distributions and versions would produce different size figures, but I hope that this paper will be enlightening even though it doesn't try to evaluate ``all'' distributions. Note that some distributions (such as SuSE) may decide to add many more applications, but also note this would only create larger (not smaller) sizes and estimated levels of effort. At the time that I began this project, version 7.1 was the latest version of Red Hat Linux available, so I selected that version for analysis.
Note that Red Hat Linux 6.2 was released in March 2000, Red Hat Linux 7 was released in September 2000 (I have not counted its code), and Red Hat Linux 7.1 was released in April 2001. Thus, the differences between Red Hat Linux 7.1 and 6.2 show differences accrued over 13 months (approximately one year).
Clearly there is far more open source / free software available worldwide than is counted in this paper. However, the job of a distributor is to examine these various options and select software that they believe is both sufficiently mature and useful to their target market. Thus, examining a particular distribution results in a selective analysis of such software.
Section 2 briefly describes the approach used to estimate the ``size'' of this distribution (more details are in Appendix A). Section 3 discusses some of the results. Section 4 presents conclusions, followed by an appendix. GNU/Linux is often called simply ``Linux'', but technically Linux is only the name of the operating system kernel; to eliminate ambiguity this paper uses the term ``GNU/Linux'' as the general name for the whole system and ``Linux kernel'' for just this inner kernel.
2. Approach
My basic approach was to:
- install the source code files in uncompressed format; this requires carefully selecting the source code to be analyzed.
- count the number of source lines of code (SLOC); this requires a careful definition of SLOC.
- use an estimation model to estimate the effort and cost of developing the same system in a proprietary manner; this requires an estimation model.
- determine the software licenses of each component and develop statistics based on these categories.
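The estimation step can be illustrated with a minimal sketch. It assumes the Basic COCOMO model in organic mode (Boehm's published coefficients 2.4 and 1.05 for effort, 2.5 and 0.38 for schedule); the text above says only ``COCOMO'', so the choice of mode is an assumption here, and the salary and overhead parameters are hypothetical inputs rather than figures from the paper.

```python
# Basic COCOMO, organic mode. The mode is an ASSUMPTION for illustration;
# the surrounding text names only "the COCOMO cost model".
EFFORT_COEFF = 2.4     # person-months per KSLOC**EFFORT_EXP
EFFORT_EXP = 1.05
SCHEDULE_COEFF = 2.5   # months per person-month**SCHEDULE_EXP
SCHEDULE_EXP = 0.38


def effort_person_months(sloc):
    """Estimated effort in person-months for a component of `sloc` lines."""
    ksloc = sloc / 1000.0
    return EFFORT_COEFF * ksloc ** EFFORT_EXP


def schedule_months(effort_pm):
    """Estimated calendar time in months for a given effort."""
    return SCHEDULE_COEFF * effort_pm ** SCHEDULE_EXP


def cost_dollars(effort_pm, annual_salary, overhead):
    """Cost = person-years * salary * overhead multiplier.

    `annual_salary` and `overhead` are hypothetical inputs supplied by the
    caller; they are not values taken from the paper.
    """
    return effort_pm / 12.0 * annual_salary * overhead


if __name__ == "__main__":
    pm = effort_person_months(100_000)  # an arbitrary example size
    print(f"effort:   {pm:.1f} person-months")
    print(f"schedule: {schedule_months(pm):.1f} months")
```

Because the effort exponent is greater than 1, estimating one large body of code gives a higher figure than summing estimates for its parts, so whether components are estimated separately or together affects the total.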
More detail on this approach is described in Appendix A. A few summary points are worth mentioning here, however.
2.1 Selecting Source Code
I included all software provided in the Red Hat distribution, but note that Red Hat no longer includes software packages that only apply to other CPU architectures (and thus packages not applying to the x86 family were excluded). I did not include ``old'' versions of software, or ``beta'' software where non-beta was available. I did include ``beta'' software where there was no alternative, because some developers don't remove the ``beta'' label even when it's widely used and perceived to be reliable.
I used md5 checksums to identify and ignore duplicate files, so if the same file contents appeared in more than one file, it was only counted once (as a tie-breaker, each such file is assigned to the first build package it applies to, in alphabetic order).
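The duplicate-detection rule described above can be sketched roughly as follows. This is an illustration only, not the tool's actual code: the function name and the in-memory `packages` mapping (package name to a mapping of file path to contents) are hypothetical, chosen so the example is self-contained.

```python
import hashlib


def assign_unique_files(packages):
    """Map each distinct file content to the first package that contains it.

    `packages` is a hypothetical structure for illustration:
    {package_name: {file_path: file_contents_as_bytes}}.
    Packages are visited in alphabetic order, so a duplicated file is
    credited to the alphabetically first package, as the text describes.
    """
    owner_by_digest = {}
    for package in sorted(packages):
        for path in sorted(packages[package]):
            digest = hashlib.md5(packages[package][path]).hexdigest()
            # setdefault keeps the first owner and ignores later duplicates.
            owner_by_digest.setdefault(digest, (package, path))
    return owner_by_digest


if __name__ == "__main__":
    example = {
        "zlib": {"util.c": b"int x;\n"},
        "apache": {"copy.c": b"int x;\n", "main.c": b"int y;\n"},
    }
    for digest, (package, path) in assign_unique_files(example).items():
        print(digest, package, path)
```

In the example, `util.c` and `copy.c` have identical contents, so only one of them is kept, and it is credited to `apache` because that package sorts first.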
The code in makefiles and Red Hat Package Manager (RPM) specifications was not included. Various heuristics were used to detect automatically generated code, and any such code was also excluded from the count. A number of other heuristics were used to determine if a file was a source program file, and if so, what its language was.
Since different languages have different syntaxes, I could only measure the SLOC for the languages that my tool (sloccount) could detect and handle. The languages sloccount could detect and handle are Ada, Assembly, awk, Bourne shell and variants, C, C++, C shell, Expect, Fortran, Java, lex/flex, LISP/Scheme, Makefile, Objective-C, Pascal, Perl, Python, sed, SQL, TCL, and Yacc/bison. Other languages are not counted; these include XUL (used in Mozilla), Javascript (also in Mozilla), PHP, and Objective Caml (an OO dialect of ML). Also code embedded in data is not counted (e.g., code embedded in HTML files). Some systems use their own built-in languages; in general code in these languages is not counted.
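To make the idea of per-language counting concrete, here is a deliberately naive sketch. It is not how sloccount works: the extension table and both function names are hypothetical, language detection here uses only the file extension, and the counter handles only languages whose comments start with `#` (it ignores trailing comments and `#` characters inside strings).

```python
import os

# Hypothetical, partial extension table for illustration only.
EXTENSION_TO_LANGUAGE = {
    ".c": "C",
    ".h": "C",
    ".cpp": "C++",
    ".py": "Python",
    ".pl": "Perl",
    ".sh": "shell",
    ".f": "Fortran",
}


def guess_language(path):
    """Return a language name for `path`, or None if it is not recognized."""
    _, ext = os.path.splitext(path)
    return EXTENSION_TO_LANGUAGE.get(ext.lower())


def count_hash_comment_sloc(text):
    """Count physical SLOC for a '#'-comment language.

    A line counts if it is non-blank and is not a whole-line comment.
    A line with code followed by a trailing comment still counts once.
    """
    count = 0
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            count += 1
    return count


if __name__ == "__main__":
    script = "#!/bin/sh\n\n# say hello\necho hi  # trailing comment\n"
    print(guess_language("hello.sh"), count_hash_comment_sloc(script))
```

Files whose language is not recognized return `None` from `guess_language` and would simply be left out of the totals, which mirrors the point above that unsupported languages are not counted.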
-
slashdotted!
This paper analyzes the amount of source code in GNU/Linux, using Red Hat Linux 7.1 as a representative GNU/Linux distribution, and presents what I believe are interesting results.
In particular, it would cost over $1 billion ($1,000 million - a Gigabuck) to develop this GNU/Linux distribution by conventional proprietary means in the U.S. (in year 2000 U.S. dollars). Compare this to the $600 million estimate for Red Hat Linux version 6.2 (which had been released about one year earlier). Also, Red Hat Linux 7.1 includes over 30 million physical source lines of code (SLOC), compared to well over 17 million SLOC in version 6.2. Using the COCOMO cost model, this system is estimated to have required about 8,000 person-years of development time (as compared to 4,500 person-years to develop version 6.2). Thus, Red Hat Linux 7.1 represents over a 60% increase in size, effort, and traditional development costs over Red Hat Linux 6.2. This is due to an increased number of mature and maturing open source / free software programs available worldwide.
Many other interesting statistics emerge. The largest components (in order) were the Linux kernel (including device drivers), Mozilla (Netscape's open source web system including a web browser, email client, and HTML editor), the X Window system (the infrastructure for the graphical user interface), gcc (a compilation system), gdb (for debugging), basic binary tools, emacs (a text editor and far more), LAPACK (a large Fortran library for numerical linear algebra), the Gimp (a bitmapped graphics editor), and MySQL (a relational database system). The languages used, sorted by the most lines of code, were C (71% - was 81%), C++ (15% - was 8%), shell (including ksh), Lisp, assembly, Perl, Fortran, Python, tcl, Java, yacc/bison, expect, lex/flex, awk, Objective-C, Ada, C shell, Pascal, and sed.
The predominant software license is the GNU GPL. Slightly over half of the software is simply licensed using the GPL, and the software packages using the copylefting licenses (the GPL and LGPL), at least in part or as an alternative, accounted for 63% of the code. In all ways, the copylefting licenses (GPL and LGPL) are the dominant licenses in this GNU/Linux distribution. In contrast, only 0.2% of the software is public domain.
This paper is an update of my previous paper on estimating GNU/Linux's size, which measured Red Hat Linux 6.2 [Wheeler 2001]. Since Red Hat Linux 6.2 was released in March 2000, and Red Hat Linux 7.1 was released in April 2001, this paper shows what's changed over approximately one year. More information is available at http://www.dwheeler.com/sloc. 1. Introduction The GNU/Linux operating system has gone from an unknown to a powerful market force. Netcraft found that, of the systems running web servers on June 2001, GNU/Linux was now the second most popular operating system (with 29.6%, versus Windows' 49.6%) [Netcraft 2001]. Another survey, of primarily European and educational sites, found that GNU/Linux was used more than any other operating system (of the sites it surveyed) [Zoebelein 1999]. IDC found that 25% of all server operating systems purchased in 1999 were GNU/Linux, making it second only to Windows NT's 38% [Shankland 2000a].
There appear to be many reasons for this, and not simply because GNU/Linux can be obtained at no or low cost. For example, experiments suggest that GNU/Linux is highly reliable. A 1995 study of a set of individual components found that the GNU and GNU/Linux components had a significantly higher reliability than their proprietary Unix competitors (6% to 9% failure rate with GNU and Linux, versus an average 23% failure rate with the proprietary software using their measurement technique) [Miller 1995]. A ten-month experiment in 1999 by ZDnet found that, while Microsoft's Windows NT crashed every six weeks under a ``typical'' intranet load, using the same load and request set the GNU/Linux systems (from two different distributors) never crashed [Vaughan-Nichols 1999].
However, possibly the most important reason for GNU/Linux's popularity among many developers and users is that its source code is generally ``open source software'' and/or ``free software''. A program that is ``open source software'' or ``free software'' is essentially a program whose source code can be obtained, viewed, changed, and redistributed without royalties or other limitations of these actions. A more formal definition of ``open source software'' is available from the Open Source Initiative [OSI 1999], a more formal definition of ``free software'' (as the term is used in this paper) is available from the Free Software Foundation [FSF 2000], and other general information about these topics is available at Wheeler [2000a]. Quantitative rationales for using open source / free software is given in Wheeler [2000b]. The GNU/Linux operating system is actually a suite of components, including the Linux kernel on which it is based, and it is packaged, sold, and supported by a variety of distributors. The Linux kernel is ``open source software''/``free software'', and this is also true for all (or nearly all) other components of a typical GNU/Linux distribution. Open source software/free software frees users from being captives of a particular vendor, since it permits users to fix any problems immediately, tailor their system, and analyze their software in arbitrary ways.
Surprisingly, although anyone can analyze GNU/Linux for arbitrary properties, I have found little published analysis of the amount of source lines of code (SLOC) contained in a GNU/Linux distribution. Microsoft unintentionally published some analysis data in the documents usually called ``Halloween I'' and ``Halloween II'' [Halloween I] [Halloween II]. Another study focused on the Linux kernel and its growth over time is by Godfrey [2000]; this is an interesting study but it focuses solely on the Linux kernel (not the entire operating system). Paul G. Allen posted some results from running Scientific Toolworks, Inc.'s tools on the Linux kernel, but this analysis only considered C code (including headers) - ignoring the many other languages used in constructing the Linux kernel (e.g., assembly language), and only concentrating on the kernel. The Free Code Graphing Project at http://fcgp.sourceforge.net generates a graphical representation of a program (currently, the Linux kernel), but only of the C code. In a previous paper, I examined Red Hat Linux 6.2 and the numbers from the Halloween papers [Wheeler 2001].
This paper updates my previous paper, showing estimates of the size of one of today's GNU/Linux distributions, and it estimates how much it would cost to rebuild this typical GNU/Linux distribution using traditional software development techniques. Various definitions and assumptions are included, so that others can understand exactly what these numbers mean. I have intentionally written this paper so that you do not need to read the previous version of this paper first.
For my purposes, I have selected as my ``representative'' GNU/Linux distribution Red Hat Linux version 7.1. I believe this distribution is reasonably representative for several reasons:
- Red Hat Linux is the most popular Linux distribution sold in 1999 according to IDC [Shankland 2000b]. Red Hat sold 48% of all copies in 1999; the next largest distribution in market share sales was SuSE (a German distributor) at 15%. Not all GNU/Linux copies are ``sold'' in a way that this study would count, but the study at least shows that Red Hat's distribution is a popular one.
- Many distributions (such as Mandrake) are based on, or were originally developed from, a version of Red Hat Linux. This doesn't mean the other distributions are less capable, but it suggests that these other distributions are likely to have a similar set of components.
- All major general-purpose distributions support (at least) the kind of functionality supported by Red Hat Linux, if for no other reason than to compete with Red Hat.
- All distributors start with the same set of open source software projects from which to choose components to integrate. Therefore, other distributions are likely to choose the same components or similar kinds of components with often similar size for the same kind of functionality.
Different distributions and versions would produce different size figures, but I hope that this paper will be enlightening even though it doesn't try to evaluate ``all'' distributions. Note that some distributions (such as SuSE) may decide to add many more applications, but also note this would only create larger (not smaller) sizes and estimated levels of effort. At the time that I began this project, version 7.1 was the latest version of Red Hat Linux available, so I selected that version for analysis.
Note that Red Hat Linux 6.2 was released on March 2000, Red Hat Linux 7 was released on September 2000 (I have not counted its code), and Red Hat Linux 7.1 was released on April 2001. Thus, the differences between Red Hat Linux 7.1 and 6.2 show differences accrued over 13 months (approximately one year).
Clearly there is far more open source / free software available worldwide than is counted in this paper. However, the job of a distributor is to examine these various options and select software that they believe is both sufficiently mature and useful to their target market. Thus, examining a particular distribution results in a selective analysis of such software.
Section 2 briefly describes the approach used to estimate the ``size'' of this distribution (more details are in Appendix A). Section 3 discusses some of the results. Section 4 presents conclusions, followed by an appendix. GNU/Linux is often called simply ``Linux'', but technically Linux is only the name of the operating system kernel; to eliminate ambiguity this paper uses the term ``GNU/Linux'' as the general name for the whole system and ``Linux kernel'' for just this inner kernel. 2. Approach My basic approach was to:
- install the source code files in uncompressed format; this requires carefully selecting the source code to be analyzed.
- count the number of source lines of code (SLOC); this requires a careful definition of SLOC.
- use an estimation model to estimate the effort and cost of developing the same system in a proprietary manner; this requires an estimation model.
- determine the software licenses of each component and develop statistics based on these categories.
More detail on this approach is described in Appendix A. A few summary points are worth mentioning here, however. 2.1 Selecting Source Code
I included all software provided in the Red Hat distribution, but note that Red Hat no longer includes software packages that only apply to other CPU architectures (and thus packages not applying to the x86 family were excluded). I did not include ``old'' versions of software, or ``beta'' software where non-beta was available. I did include ``beta'' software where there was no alternative, because some developers don't remove the ``beta'' label even when it's widely used and perceived to be reliable.
I used md5 checksums to identify and ignore duplicate files, so if the same file contents appeared in more than one file, it was only counted once (as a tie-breaker, such files are assigned to the first build package it applies to in alphabetic order).
The code in makefiles and Red Hat Package Manager (RPM) specifications was not included. Various heuristics were used to detect automatically generated code, and any such code was also excluded from the count. A number of other heuristics were used to determine if a language was a source program file, and if so, what its language was.
Since different languages have different syntaxes, I could only measure the SLOC for the languages that my tool (sloccount) could detect and handle. The languages sloccount could detect and handle are Ada, Assembly, awk, Bourne shell and variants, C, C++, C shell, Expect, Fortran, Java, lex/flex, LISP/Scheme, Makefile, Objective-C, Pascal, Perl, Python, sed, SQL, TCL, and Yacc/bison. Other languages are not counted; these include XUL (used in Mozilla), Javascript (also in Mozilla), PHP, and Objective Caml (an OO dialect of ML). Also code embedded in data is not counted (e.g., code embedded in HTML files). Some systems use their own built-in languages; in general code in these languages is not counted.
-
slashdotted!
This paper analyzes the amount of source code in GNU/Linux, using Red Hat Linux 7.1 as a representative GNU/Linux distribution, and presents what I believe are interesting results.
In particular, it would cost over $1 billion ($1,000 million - a Gigabuck) to develop this GNU/Linux distribution by conventional proprietary means in the U.S. (in year 2000 U.S. dollars). Compare this to the $600 million estimate for Red Hat Linux version 6.2 (which had been released about one year earlier). Also, Red Hat Linux 7.1 includes over 30 million physical source lines of code (SLOC), compared to well over 17 million SLOC in version 6.2. Using the COCOMO cost model, this system is estimated to have required about 8,000 person-years of development time (as compared to 4,500 person-years to develop version 6.2). Thus, Red Hat Linux 7.1 represents over a 60% increase in size, effort, and traditional development costs over Red Hat Linux 6.2. This is due to an increased number of mature and maturing open source / free software programs available worldwide.
Many other interesting statistics emerge. The largest components (in order) were the Linux kernel (including device drivers), Mozilla (Netscape's open source web system including a web browser, email client, and HTML editor), the X Window system (the infrastructure for the graphical user interface), gcc (a compilation system), gdb (for debugging), basic binary tools, emacs (a text editor and far more), LAPACK (a large Fortran library for numerical linear algebra), the Gimp (a bitmapped graphics editor), and MySQL (a relational database system). The languages used, sorted by the most lines of code, were C (71% - was 81%), C++ (15% - was 8%), shell (including ksh), Lisp, assembly, Perl, Fortran, Python, tcl, Java, yacc/bison, expect, lex/flex, awk, Objective-C, Ada, C shell, Pascal, and sed.
The predominant software license is the GNU GPL. Slightly over half of the software is simply licensed using the GPL, and the software packages using the copylefting licenses (the GPL and LGPL), at least in part or as an alternative, accounted for 63% of the code. In all ways, the copylefting licenses (GPL and LGPL) are the dominant licenses in this GNU/Linux distribution. In contrast, only 0.2% of the software is public domain.
This paper is an update of my previous paper on estimating GNU/Linux's size, which measured Red Hat Linux 6.2 [Wheeler 2001]. Since Red Hat Linux 6.2 was released in March 2000, and Red Hat Linux 7.1 was released in April 2001, this paper shows what's changed over approximately one year. More information is available at http://www.dwheeler.com/sloc. 1. Introduction The GNU/Linux operating system has gone from an unknown to a powerful market force. Netcraft found that, of the systems running web servers on June 2001, GNU/Linux was now the second most popular operating system (with 29.6%, versus Windows' 49.6%) [Netcraft 2001]. Another survey, of primarily European and educational sites, found that GNU/Linux was used more than any other operating system (of the sites it surveyed) [Zoebelein 1999]. IDC found that 25% of all server operating systems purchased in 1999 were GNU/Linux, making it second only to Windows NT's 38% [Shankland 2000a].
There appear to be many reasons for this, and not simply because GNU/Linux can be obtained at no or low cost. For example, experiments suggest that GNU/Linux is highly reliable. A 1995 study of a set of individual components found that the GNU and GNU/Linux components had a significantly higher reliability than their proprietary Unix competitors (6% to 9% failure rate with GNU and Linux, versus an average 23% failure rate with the proprietary software using their measurement technique) [Miller 1995]. A ten-month experiment in 1999 by ZDnet found that, while Microsoft's Windows NT crashed every six weeks under a ``typical'' intranet load, using the same load and request set the GNU/Linux systems (from two different distributors) never crashed [Vaughan-Nichols 1999].
However, possibly the most important reason for GNU/Linux's popularity among many developers and users is that its source code is generally ``open source software'' and/or ``free software''. A program that is ``open source software'' or ``free software'' is essentially a program whose source code can be obtained, viewed, changed, and redistributed without royalties or other limitations of these actions. A more formal definition of ``open source software'' is available from the Open Source Initiative [OSI 1999], a more formal definition of ``free software'' (as the term is used in this paper) is available from the Free Software Foundation [FSF 2000], and other general information about these topics is available at Wheeler [2000a]. Quantitative rationales for using open source / free software is given in Wheeler [2000b]. The GNU/Linux operating system is actually a suite of components, including the Linux kernel on which it is based, and it is packaged, sold, and supported by a variety of distributors. The Linux kernel is ``open source software''/``free software'', and this is also true for all (or nearly all) other components of a typical GNU/Linux distribution. Open source software/free software frees users from being captives of a particular vendor, since it permits users to fix any problems immediately, tailor their system, and analyze their software in arbitrary ways.
Surprisingly, although anyone can analyze GNU/Linux for arbitrary properties, I have found little published analysis of the amount of source lines of code (SLOC) contained in a GNU/Linux distribution. Microsoft unintentionally published some analysis data in the documents usually called ``Halloween I'' and ``Halloween II'' [Halloween I] [Halloween II]. Another study focused on the Linux kernel and its growth over time is by Godfrey [2000]; this is an interesting study but it focuses solely on the Linux kernel (not the entire operating system). Paul G. Allen posted some results from running Scientific Toolworks, Inc.'s tools on the Linux kernel, but this analysis only considered C code (including headers) - ignoring the many other languages used in constructing the Linux kernel (e.g., assembly language), and only concentrating on the kernel. The Free Code Graphing Project at http://fcgp.sourceforge.net generates a graphical representation of a program (currently, the Linux kernel), but only of the C code. In a previous paper, I examined Red Hat Linux 6.2 and the numbers from the Halloween papers [Wheeler 2001].
This paper updates my previous paper, showing estimates of the size of one of today's GNU/Linux distributions, and it estimates how much it would cost to rebuild this typical GNU/Linux distribution using traditional software development techniques. Various definitions and assumptions are included, so that others can understand exactly what these numbers mean. I have intentionally written this paper so that you do not need to read the previous version of this paper first.
For my purposes, I have selected as my ``representative'' GNU/Linux distribution Red Hat Linux version 7.1. I believe this distribution is reasonably representative for several reasons:
- Red Hat Linux is the most popular Linux distribution sold in 1999 according to IDC [Shankland 2000b]. Red Hat sold 48% of all copies in 1999; the next largest distribution in market share sales was SuSE (a German distributor) at 15%. Not all GNU/Linux copies are ``sold'' in a way that this study would count, but the study at least shows that Red Hat's distribution is a popular one.
- Many distributions (such as Mandrake) are based on, or were originally developed from, a version of Red Hat Linux. This doesn't mean the other distributions are less capable, but it suggests that these other distributions are likely to have a similar set of components.
- All major general-purpose distributions support (at least) the kind of functionality supported by Red Hat Linux, if for no other reason than to compete with Red Hat.
- All distributors start with the same set of open source software projects from which to choose components to integrate. Therefore, other distributions are likely to choose the same components or similar kinds of components with often similar size for the same kind of functionality.
Different distributions and versions would produce different size figures, but I hope that this paper will be enlightening even though it doesn't try to evaluate ``all'' distributions. Note that some distributions (such as SuSE) may decide to add many more applications, but also note this would only create larger (not smaller) sizes and estimated levels of effort. At the time that I began this project, version 7.1 was the latest version of Red Hat Linux available, so I selected that version for analysis.
Note that Red Hat Linux 6.2 was released in March 2000, Red Hat Linux 7 was released in September 2000 (I have not counted its code), and Red Hat Linux 7.1 was released in April 2001. Thus, the differences between Red Hat Linux 7.1 and 6.2 show differences accrued over 13 months (approximately one year).
Clearly there is far more open source / free software available worldwide than is counted in this paper. However, the job of a distributor is to examine these various options and select software that they believe is both sufficiently mature and useful to their target market. Thus, examining a particular distribution results in a selective analysis of such software.
Section 2 briefly describes the approach used to estimate the ``size'' of this distribution (more details are in Appendix A). Section 3 discusses some of the results. Section 4 presents conclusions, followed by an appendix. GNU/Linux is often called simply ``Linux'', but technically Linux is only the name of the operating system kernel; to eliminate ambiguity this paper uses the term ``GNU/Linux'' as the general name for the whole system and ``Linux kernel'' for just this inner kernel.
2. Approach
My basic approach was to:
- install the source code files in uncompressed format; this requires carefully selecting the source code to be analyzed.
- count the number of source lines of code (SLOC); this requires a careful definition of SLOC.
- use an estimation model to estimate the effort and cost of developing the same system in a proprietary manner; this requires an estimation model.
- determine the software licenses of each component and develop statistics based on these categories.
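The estimation step can be illustrated with a minimal sketch. It assumes the Basic COCOMO model in organic mode (effort in person-months = 2.4 × KSLOC^1.05), applied to each component separately and then summed; the component names and sizes, the salary, and the overhead multiplier below are hypothetical placeholders, not figures from the paper.

```python
# Basic COCOMO, organic mode. A sketch under stated assumptions,
# not the paper's actual tooling.
ORGANIC_A = 2.4   # effort coefficient (person-months)
ORGANIC_B = 1.05  # effort exponent


def effort_person_months(sloc):
    """Estimated effort for one component with `sloc` physical source lines."""
    ksloc = sloc / 1000.0
    return ORGANIC_A * ksloc ** ORGANIC_B


def estimate(component_sloc, salary_per_year, overhead):
    """Sum per-component effort; return (person_years, cost).

    salary_per_year and overhead are caller-supplied assumptions.
    """
    months = sum(effort_person_months(s) for s in component_sloc.values())
    years = months / 12.0
    return years, years * salary_per_year * overhead


if __name__ == "__main__":
    # Hypothetical component sizes, for illustration only.
    sizes = {"kernel": 2_000_000, "browser": 1_500_000, "editor": 500_000}
    years, cost = estimate(sizes, salary_per_year=50_000, overhead=2.0)
    print(f"{years:.0f} person-years, ${cost:,.0f}")
```

Because the exponent is greater than 1, estimating each component on its own and summing gives a lower total than feeding the whole distribution in as one giant project, so the choice of granularity matters.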
More detail on this approach is described in Appendix A. A few summary points are worth mentioning here, however.
2.1 Selecting Source Code
I included all software provided in the Red Hat distribution, but note that Red Hat no longer includes software packages that only apply to other CPU architectures (and thus packages not applying to the x86 family were excluded). I did not include ``old'' versions of software, or ``beta'' software where non-beta was available. I did include ``beta'' software where there was no alternative, because some developers don't remove the ``beta'' label even when it's widely used and perceived to be reliable.
I used md5 checksums to identify and ignore duplicate files, so if the same file contents appeared in more than one file, they were only counted once (as a tie-breaker, each such file is assigned to the first build package it applies to in alphabetic order).
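The duplicate-detection idea can be sketched as follows. This is an illustration only: the function names and the `(package, path, content)` tuple layout are assumptions made for the example, not the paper's actual scripts.

```python
import hashlib


def md5_of(data):
    """Hex MD5 digest of a file's bytes."""
    return hashlib.md5(data).hexdigest()


def dedupe(files):
    """Keep one owner per distinct file content.

    files: iterable of (package, path, content_bytes) tuples (hypothetical layout).
    Returns {digest: (package, path)}; when several packages ship identical
    content, the package that sorts first alphabetically wins the tie.
    """
    owners = {}
    for package, path, content in sorted(files, key=lambda f: (f[0], f[1])):
        # setdefault keeps the first (alphabetically earliest) owner seen.
        owners.setdefault(md5_of(content), (package, path))
    return owners
```

Only the files that survive in `owners` would then be passed on to the line counter, so shared files are not counted twice.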
The code in makefiles and Red Hat Package Manager (RPM) specifications was not included. Various heuristics were used to detect automatically generated code, and any such code was also excluded from the count. A number of other heuristics were used to determine if a file was a source program file, and if so, what its language was.
Since different languages have different syntaxes, I could only measure the SLOC for the languages that my tool (sloccount) could detect and handle. The languages sloccount could detect and handle are Ada, Assembly, awk, Bourne shell and variants, C, C++, C shell, Expect, Fortran, Java, lex/flex, LISP/Scheme, Makefile, Objective-C, Pascal, Perl, Python, sed, SQL, TCL, and Yacc/bison. Other languages are not counted; these include XUL (used in Mozilla), Javascript (also in Mozilla), PHP, and Objective Caml (an OO dialect of ML). Also code embedded in data is not counted (e.g., code embedded in HTML files). Some systems use their own built-in languages; in general code in these languages is not counted.
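To make the counting step concrete, here is a deliberately simplified sketch. It is not how sloccount works internally: the extension table is a hypothetical stand-in for its heuristics, and only whole-line comments are recognized (block comments such as `/* ... */` are ignored).

```python
import os

# Hypothetical extension table; a real counter uses many more heuristics.
LANG_BY_EXT = {".c": "C", ".h": "C", ".py": "Python", ".sh": "shell", ".pl": "Perl"}
# Whole-line comment marker per language (simplification).
LINE_COMMENT = {"C": "//", "Python": "#", "shell": "#", "Perl": "#"}


def guess_language(path):
    """Return a language name, or None if the file would go uncounted."""
    return LANG_BY_EXT.get(os.path.splitext(path)[1])


def count_sloc(text, language):
    """Count lines that are neither blank nor whole-line comments."""
    marker = LINE_COMMENT[language]
    count = 0
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith(marker):
            count += 1
    return count
```

A file whose language is not in the table (for example a `.php` file here) returns `None` from `guess_language` and is skipped, mirroring the way unsupported languages drop out of the totals.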
-
Have fun! (Mathematical recreations)
As others have noted, how you approach learning math partly depends on what you plan to do with it. But if part of your purpose is to have fun, then I suggest having fun as part of the process!
There are lots of "mathematical recreations" and "math puzzles" that are fun to try solving, in the same way that it can be fun solving other puzzles. And sometimes you may see a variation on that puzzle that's fun (and truly new). Not all of them are truly critical from the point of view of furthering the advancement of mathematics, but they help develop the mind, and if your purpose is to have fun, start now!
For example, I learned about the ``four fours'' problem as a kid (using exactly 4 fours, create legal mathematical expressions to compute 0, 1, 2, 3, etc.). Recently I created a definitive list of answers for the four fours problem. I also played with various really weird bases. Will these change the universe? No. But in the process I learned more than I knew before, and I enjoyed the process.
If nothing else, if you enjoy the process, you're more likely to continue doing it.
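For anyone who wants to play with the four fours puzzle mentioned above, here is a small brute-force sketch. It assumes a restricted rule set (only +, -, *, / and parentheses, with no concatenation, square roots, or factorials), so it finds fewer answers than the full puzzle allows.

```python
from fractions import Fraction
from functools import lru_cache


@lru_cache(maxsize=None)
def reachable(n):
    """Map each value obtainable from exactly n fours to one expression for it."""
    if n == 1:
        return {Fraction(4): "4"}
    found = {}
    # Split the n fours into a left group of k and a right group of n - k.
    for k in range(1, n):
        for a, expr_a in reachable(k).items():
            for b, expr_b in reachable(n - k).items():
                candidates = [(a + b, "+"), (a - b, "-"), (a * b, "*")]
                if b != 0:  # skip division by zero
                    candidates.append((a / b, "/"))
                for value, op in candidates:
                    found.setdefault(value, f"({expr_a} {op} {expr_b})")
    return found


def four_fours(target):
    """Return an expression using exactly four 4s, or None if none exists."""
    return reachable(4).get(Fraction(target))


if __name__ == "__main__":
    for t in range(11):
        print(t, four_fours(t))
```

Under these restricted rules every target from 0 to 9 has an answer, but 10 does not; the usual solution for 10, (44 - 4) / 4, needs digit concatenation.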
-
Re:Trivial Slashdot News While Cities Revolt
You make it sound like there are citizen uprisings in those cities. It turns out that the article you link to is merely about the LEADERS (not even the citizens) of a few cities passing a few silly resolutions that don't have the effect of undermining the government's authority one bit.
AND NOW FOR SOMETHING COMPLETELY ON-TOPIC.
I'd rather read the trivial news about SSH. The company mentioned that "doesn't use freeware" is just being stupid. The FSF has a link to a paper that debunks their fear about OpenSSH and other open-source "freeware" being "unsupported".
Also, like nearly everyone else, I recommend PuTTY if you need a Windows SSH client. Too bad that it's "freeware" too. I guess your company will have to settle for an inferior proprietary alternative.
-
Yes and No
Yes, the admin is the most important piece of the puzzle in the security game, but by no means is the admin the *only* piece. For example, Microsoft had nearly three times as many security recess days in 1999 as Red Hat. The best admin in the world can't do anything about that. Security is always done in layers, and it makes no sense to say that you can or should ignore one layer because you should implement another layer.
A good analogy is automobiles (somehow it always is). To be safe, be a good driver. No question about that. However, being a good driver isn't enough if you buy tires that suddenly blow out. You also need a safe machine.
Furthermore, it's not realistic to assume that all admins can be pros. I wish they were, they should be, but the small-business person who sets up a few machines on a shoestring budget can't be expected to be an expert LAN admin *and* also good at whatever his/her business is. Like it or not, people set up computers and LANs under those conditions. They'll gravitate towards the systems that support that environment the best.
Finally, I believe that in the long run, Linux will encourage professional administration *more* than Windows. With Linux it is easier to buy a shoestring-budget support contract. A small business can set up a few Linux boxes and hire a pro to administer and update them remotely. The support people will need to make very few on-site calls, and have fewer bugs to fix overall. It's just easier (ergo cheaper) to admin Linux.
-
How about free books available online?
Let's turn this topic around a bit and collect links to free books that can be found on the net. My favourites are:
- Dive Into Python - an excellent Python book aimed at experienced programmers
- Thinking in Java - concentrates on OOP principles. Check out Thinking in Python/C++/C# on the same site
- Secure Programming for Linux and Unix HOWTO - calls itself a HOWTO but it's practically a book
- Linux From Scratch - build your own linux distribution
-
A very brief rebuttal... (with text)
A very brief rebuttal can be found in my "Look at the Numbers!" paper; see http://www.dwheeler.com/oss_fs_why.html#adti; I also include links to other rebuttals.
In one place, ADTI claimed I said something I didn't say, and in others ADTI deliberately quotes only part of what I said.
-
BCG Study - yes, a lot are paid
A good place to start is this recent survey "BCG Study Highlights Factors Contributing to Success of Open Source Software". There is a copy of the slides for the talk in PDF format.
Actually a lot of people writing the software are employed to provide software-based solutions. Open source development and free (GPL/LGPL) licensing provide a very productive way of encouraging participation in collaborative development. It can provide better solutions than the use of proprietary closed source packages.
See Why Open Source Software / Free Software (OSS/FS)? Look at the Numbers!
90% of programmers don't work on creating shrink wrap software but on customising solutions for clients.
From a personal perspective it is far more intellectually rewarding to the joint developer/user. You really can know exactly how the damn thing works and you can in most cases fix or adapt it to your own, your client's, or your employer's needs. Do you wish to live and work in an environment where every damn box has the label "No Serviceable Components Inside"?
As for free GPL/LGPL licensing; the reality of the current employment market is that jobs come and go - BUT, you can take the knowledge you have gained through developing and adapting free licensed software and approach other users of that software for either employment or as clients. You DON'T have to "start from scratch" with each job.
If you are a programmer, in the long run, the open source free licensed software model makes it easier for you to remain employed. Unless, that is, your sole career plan consists of being employed by Microsoft.
Another question: how many of those programmers expect to use the open source they contribute at their current and future places of employment?
-
Local Vs Remote & The smaller window of exposu
I have read a lot of Gene's work. But I am not sure of the particular presentation you are talking about. Here is Gene Spafford's home page; could you tell me which particular presentation you are referring to?
I wonder if he took into account the difference between remotely exploitable and locally exploitable vulnerabilities?
I also wonder if he took into consideration the Window of Exposure between the discovery of the vulnerability and the release of the patch?
See Closing the Window of Exposure by Bruce Schneier, the security section of David Wheeler's "Why Open Source Software / Free Software (OSS/FS)? Look at the Numbers!", and also again visit the disproportionately high number of open vulnerabilities in its Internet Explorer.
-
Also discussed in "Secure Programming" HOWTO
This issue was also discussed in my book Secure Programming for Linux and Unix HOWTO. Look at the section on semantic attacks.
-
Lines of code
I consulted Google on this point, and found this link, containing some interesting data on lines of code in RedHat 7.1.
From the statistics given therein, it should probably be called Xfree/Mozilla/Gnu/Linux using RMS's logic. I was amused to see Mozilla so high in the list. I was also amused to see the Linux kernel is, in fact, apparently the largest chunk of code.
-
Open source and security - some references
Ah yes, the "our APIs and code must be secret or the U.S. will crumble" defense. This is a particularly absurd claim for application programmer interfaces (APIs) - by definition, APIs are disclosed to other developers, so the only reason to "hide" them is to prevent competition. Oddly enough, the products where source code (not just the APIs) is visible have lots of quantitative evidence that they're more secure.
It's already been revealed that some attacker got into Microsoft's network. Also, CDs with Microsoft's source have been released for various reasons over time. I have no trouble believing that some "bad guys" already have the source code. So, how do the rest of us protect ourselves from these bad guys with the source code? And from the bad guys to come who don't have it yet... but will?
As noted in Secure Programming for Linux and Unix HOWTO, section 2.4.2, closing off source code doesn't actually halt attacks anyway. Here's the quote:
It's been argued that a system without source code is more secure because, since there's less information available for an attacker, it should be harder for an attacker to find the vulnerabilities. This argument has a number of weaknesses, however, because although source code is extremely important when trying to add new capabilities to a program, attackers generally don't need source code to find a vulnerability.
First, it's important to distinguish between ``destructive'' acts and ``constructive'' acts. In the real world, it is much easier to destroy a car than to build one. In the software world, it is much easier to find and exploit a vulnerability than to add significant new functionality to that software. Attackers have many advantages against defenders because of this difference. Software developers must try to have no security-relevant mistakes anywhere in their code, while attackers only need to find one. Developers are primarily paid to get their programs to work... attackers don't need to make the program work, they only need to find a single weakness. And as I'll describe in a moment, it takes less information to attack a program than to modify one.
Generally attackers (against both open and closed programs) start by knowing about the general kinds of security problems programs have. There's no point in hiding this information; it's already out, and in any case, defenders need that kind of information to defend themselves. Attackers then use techniques to try to find those problems; I'll group the techniques into ``dynamic'' techniques (where you run the program) and ``static'' techniques (where you examine the program's code - be it source code or machine code).
In ``dynamic'' approaches, an attacker runs the program, sending it data (often problematic data), and sees if the program's response indicates a common vulnerability. Open and closed programs have no difference here, since the attacker isn't looking at code. Attackers may also look at the code, the ``static'' approach. For open source software, they'll probably look at the source code and search it for patterns. For closed source software, they might search the machine code (usually presented in assembly language format to simplify the task) for essentially the same patterns. They might also use tools called ``decompilers'' that turn the machine code back into source code and then search the source code for the vulnerable patterns (the same way they would search for vulnerabilities in open source software). See Flake [2001] for one discussion of how closed code can still be examined for security vulnerabilities (e.g., using disassemblers). This point is important: even if an attacker wanted to use source code to find a vulnerability, a closed source program has no advantage, because the attacker can use a disassembler to re-create the source code of the product.
Non-developers might ask ``if decompilers can create source code from machine code, then why do developers say they need source code instead of just machine code?'' The problem is that although developers don't need source code to find security problems, developers do need source code to make substantial improvements to the program. Although decompilers can turn machine code back into a ``source code'' of sorts, the resulting source code is extremely hard to modify. Typically most understandable names are lost, so instead of variables like ``grand_total'' you get ``x123123'', instead of methods like ``display_warning'' you get ``f123124'', and the code itself may have spatterings of assembly in it. Also, _ALL_ comments and design information are lost. This isn't a serious problem for finding security problems, because generally you're searching for patterns indicating vulnerabilities, not for internal variable or method names. Thus, decompilers can be useful for finding ways to attack programs, but aren't helpful for updating programs.
Thus, developers will say ``source code is vital'' (when they intend to add functionality), but the fact that the source code for closed source programs is hidden doesn't protect the program very much.
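The ``static'' pattern search described in the quote can be illustrated with a tiny sketch. It assumes a short, hand-picked list of C library calls that are widely regarded as risky; the list and the function name `scan` are choices made for this example, and a real scanner would be far more thorough. Note that the search never looks at variable or function names chosen by the developer, which is why decompiled code with names like ``x123123'' is just as searchable.

```python
import re

# Assumed shortlist of C calls often associated with buffer overflows;
# illustrative only, not an authoritative list.
RISKY_CALLS = ("gets", "strcpy", "strcat", "sprintf")
PATTERN = re.compile(r"\b(" + "|".join(RISKY_CALLS) + r")\s*\(")


def scan(source_text):
    """Return (line_number, function_name) for each suspicious call found."""
    hits = []
    for number, line in enumerate(source_text.splitlines(), start=1):
        for match in PATTERN.finditer(line):
            hits.append((number, match.group(1)))
    return hits


if __name__ == "__main__":
    sample = 'char b[8];\nstrcpy(b, x123123);\nsnprintf(b, 8, "%s", x123123);\n'
    print(scan(sample))
```

The same scan works equally well for a defender auditing their own code, which is the point of the quoted passage: the pattern knowledge is public either way.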
-
Open source and security - some referencesAh yes, the "our APIs and code must be secret or the U.S. will crumble" defense. This is a particularly absurd claim for application programmer interfaces (APIs) - by definition, APIs are disclosed to other developers, so the only reason to "hide" them is to prevent competition. Oddly enough, the products where source code (not just the APIs) is visible have lots of quantitative evidence that they're more secure.
It's already been revealed that some attacker got into Microsoft's network. Also, CD's with Microsoft's source have been released for various reasons over time. I have no trouble believing that some "bad guys" already have the source code. So, how do the rest of us protect ourselves from these bad guys with the source code? And from the bad guys to come who don't have it yet... but will?
As noted in Secure Programming for Linux and Unix HOWTO, section 2.4.2, closing off source code doesn't actually halt attacks anyway. Here's the quote:
It's been argued that a system without source code is more secure because, since there's less information available for an attacker, it should be harder for an attacker to find the vulnerabilities. This argument has a number of weaknesses, however, because although source code is extremely important when trying to add new capabilities to a program, attackers generally don't need source code to find a vulnerability.
First, it's important to distinguish between ``destructive'' acts and ``constructive'' acts. In the real world, it is much easier to destroy a car than to build one. In the software world, it is much easier to find and exploit a vulnerability than to add significant new functionality to that software. Attackers have many advantages against defenders because of this difference. Software developers must try to have no security-relevant mistakes anywhere in their code, while attackers only need to find one. Developers are primarily paid to get their programs to work... attackers don't need to make the program work, they only need to find a single weakness. And as I'll describe in a moment, it takes less information to attack a program than to modify one.
Generally attackers (against both open and closed programs) start by knowing about the general kinds of security problems programs have. There's no point in hiding this information; it's already out, and in any case, defenders need that kind of information to defend themselves. Attackers then use techniques to try to find those problems; I'll group the techniques into ``dynamic'' techniques (where you run the program) and ``static'' techniques (where you examine the program's code - be it source code or machine code).
In ``dynamic'' approaches, an attacker runs the program, sending it data (often problematic data), and sees if the program's response indicates a common vulnerability. Open and closed programs have no difference here, since the attacker isn't looking at code. Attackers may also look at the code, the ``static'' approach. For open source software, they'll probably look at the source code and search it for patterns. For closed source software, they might search the machine code (usually presented in assembly language format to simplify the task) for essentially the same patterns. They might also use tools called ``decompilers'' that turn the machine code back into source code and then search the source code for the vulnerable patterns (the same way they would search for vulnerabilities in open source software). See Flake [2001] for one discussion of how closed code can still be examined for security vulnerabilities (e.g., using disassemblers). This point is important: even if an attacker wanted to use source code to find a vulnerability, a closed source program has no advantage, because the attacker can use a disassembler to re-create the source code of the product.
Non-developers might ask ``if decompilers can create source code from machine code, then why do developers say they need source code instead of just machine code?'' The problem is that although developers don't need source code to find security problems, developers do need source code to make substantial improvements to the program. Although decompilers can turn machine code back into a ``source code'' of sorts, the resulting source code is extremely hard to modify. Typically most understandable names are lost, so instead of variables like ``grand_total'' you get ``x123123'', instead of methods like ``display_warning'' you get ``f123124'', and the code itself may have spatterings of assembly in it. Also, _ALL_ comments and design information are lost. This isn't a serious problem for finding security problems, because generally you're searching for patterns indicating vulnerabilities, not for internal variable or method names. Thus, decompilers can be useful for finding ways to attack programs, but aren't helpful for updating programs.
Thus, developers will say ``source code is vital'' (when they intend to add functionality), but the fact that the source code for closed source programs is hidden doesn't protect the program very much.
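To make the ``dynamic'' approach in the quote concrete, here is a minimal sketch in Python. It is not taken from the HOWTO: `parse_length_prefixed` is a hypothetical target invented for illustration, and `fuzz` is an assumed helper name. The point is only that the loop never reads the target's code; it throws random input at it and records responses the target did not anticipate.

```python
import random


def parse_length_prefixed(data: bytes) -> bytes:
    """Hypothetical target: first byte is a length, the rest is the payload."""
    length = data[0]  # bug: raises IndexError on empty input
    payload = data[1:1 + length]
    if len(payload) != length:
        raise ValueError("truncated payload")  # anticipated, clean rejection
    return payload


def fuzz(target, trials=1000, seed=0):
    """Feed random byte strings to `target`; collect unanticipated failures."""
    rng = random.Random(seed)
    crashes = []
    for _ in range(trials):
        size = rng.randrange(0, 8)
        data = bytes(rng.randrange(256) for _ in range(size))
        try:
            target(data)
        except ValueError:
            pass  # the target rejected bad input the way it was designed to
        except Exception as exc:  # anything else hints at a weakness
            crashes.append((data, type(exc).__name__))
    return crashes


if __name__ == "__main__":
    found = fuzz(parse_length_prefixed)
    print(len(found), "unexpected failures; first:", found[:1])
```

The same loop works whether the target is open or closed, which is the quote's point about dynamic techniques.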
-
Forget it
The question is so full of holes it really seems to be a troll.
"A very large programming project" is not one that would take four programmers six months to complete. According to Counting Source Lines of Code, Linux represents some 8,000 person-years of development time.
Why would a "small programming team [of 5]" be given "a very large programming project"? It's your job to point out to your boss why this is doomed to failure.
You have been given "deep pockets" to hire "four or five" Indian programmers? Let's be generous, and say that you're paying each Indian programmer $30K a year. That's a budget of $75K for the six-month job. That's "deep pockets" to you? $75K is your idea of a budget for a "very large programming project"?
Given all this, I don't think you are qualified to manage a software engineering project, especially a very challenging one that involves remote development.
-
Nonsense! Quantitative OSS/FS data, AES process
If hiding all the protocols and APIs is necessary to make software more secure, how come there is so much evidence that open source software/free software (OSS/FS) is, at least in some cases, more secure than proprietary programs? A list of quantitative measures, showing that (at least in many cases) OSS/FS is more secure than proprietary software, is at http://www.dwheeler.com/oss_fs_why.html#security.
That's NOT to say that OSS/FS is automatically more secure. But even proprietary vendors often describe their APIs and protocols, without claiming that this information will cause security problems.
Hiding the APIs and protocols has little hope of making a program secure if the program is widely available to attackers anyway. Attackers will just examine the software directly. What secures programs is diligence by the developers, combined with serious security review by independent people who know how to review software. Trying to hide the APIs and protocols is just begging for trouble, because then you won't get much help from the "good guys".
The cryptographic community learned this years ago; look at the process that was used to develop the Advanced Encryption Standard (AES). Clearly an encryption standard is critical for security, yet the standard was publicly analyzed for quite some time.
-
GNU _is_ the largest single contributor
If you're curious about the "amount of code" provided by the GNU project to a typical (GNU/)Linux distribution, take a look here: http://www.dwheeler.com/sloc. In particular, look at section 3.2 of the version that looked at Red Hat Linux 7.1. Here's the gist:
The data here can be used to justify calling the system either ``Linux'' or ``GNU/Linux.'' It's clear that the largest single component in the operating system is the Linux kernel, so it's at least understandable how so many people have chosen to name the entire system after its largest single component (``Linux''). It's also clear that there are many contributors, not just the GNU project itself, and some of those contributors do not agree with the GNU project's philosophy. On the other hand, many of the largest components of the system are essentially GNU projects: gcc, gdb, emacs, binutils (a set of commands for binary files), and glibc (the C library). Other GNU projects in the system include binutils, bash, gawk, make, textutils, sh-utils, gettext, readline, automake, tar, less, findutils, diffutils, and grep... In short, the total of the GNU project's code is much larger than the Linux kernel's size.
The paper has the details to back it up. At that time, the Linux kernel was 2,437 KSLOC (thousands of physical lines of code). But gcc had 984 KSLOC, gdb 967 KSLOC, binutils 691 KSLOC, glibc 647 KSLOC, emacs 628 KSLOC, and so on. See the paper for details. GNU's contribution in terms of effort is, in aggregate, much larger.
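As a quick check of that comparison, here is a small Python sketch that adds up only the five GNU components quoted above and compares the total with the kernel figure. The numbers are the ones in this comment; the variable names are my own, and the "and so on" components are deliberately left out, so the real GNU total is larger than what this computes.

```python
# KSLOC figures quoted in the comment above (Red Hat Linux 7.1 analysis).
kernel_ksloc = 2437
gnu_ksloc = {
    "gcc": 984,
    "gdb": 967,
    "binutils": 691,
    "glibc": 647,
    "emacs": 628,
}

# Sum only the five listed GNU components; smaller ones are not included.
gnu_total = sum(gnu_ksloc.values())

if __name__ == "__main__":
    print("Linux kernel:", kernel_ksloc, "KSLOC")
    print("Five largest GNU components:", gnu_total, "KSLOC")
    print("GNU subset larger than kernel:", gnu_total > kernel_ksloc)
```

Even this partial sum (3,917 KSLOC) exceeds the kernel's 2,437 KSLOC.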
Of course, it's a separate question as to whether or not the term ``GNU/Linux'' is a good term. It is clearly awkward to write and speak, and that's a very serious problem. Other postings have already hashed that to death here.
-
Re:Physical Security
Yes, you're right. Sometimes you can even use a backdoor password. I remember that the password AMI worked for every AmiBIOS some time ago (an extremely stupid idea: once someone knows such a password, every system can be compromised). There are a lot of interesting articles on the Web about cracking BIOS passwords:
- HOW TO BYPASS BIOS PASSWORDS by Elf Qrin
- How to Bypass BIOS Passwords by LabMice.net
- BIOS Password Recovery by Password Crackers, Inc.
A Google search for BIOS Passwords gives quite a few hits. Putting your floppy into the drive is the fastest and easiest thing you can do if you have physical access, but it's not the only issue. No one should ever be allowed to be near the important servers, except people responsible for the security.
Somewhat off-topic, but speaking about security, I have to recommend one of the best texts about security (mostly about secure programming) I've ever read: Secure Programming for Linux and Unix HOWTO by David A. Wheeler. Great read. And speaking about passwords, it's good to read the great publications of Alec Muffett, the author of the famous crack(1) and CrackLib:
- Security FAQ
- Proper Care and Feeding of Firewalls
- WAN-Hacking with AutoHack (plus slides)
- How To Build Your Own Network Intrusion Kit (readme)
- Programming Holes that will hose your System Security
- Crack FAQ
- CrackLib README
- Crack Humour
It may not be very on-topic when speaking about physical security, but it's very important to understand security as a whole.
-
Security as a process
JamesSharman hit the nail on the head-- if you don't get your sysadmin staff up on security and get management's buy-in then you'll be needing an audit every day just to keep things secure.
The first step (really!) is to get a security policy in place. This really doesn't have to be anything special-- but it does need the buy-in of ALL groups affected (sysadmins, developers, marketing, sales, executives, etc.) That's really the only hard part.
Probably the quickest way to get started is to head to the SANS security policy project and adapt their sample policies to your company. This is one of those rare cases where it's more important to get something in than it is to get it right the first time. Policies can be changed fairly easily-- but you don't want to go to all the trouble to implement a secure environment only to have someone on the inside fighting you every step of the way.
Now the fun part-- actually securing your systems. Here are some pointers on places to start:
1) Review the SANS "top 10" security vulnerabilities and make sure they're covered.
2) Review Lance Spitzner's excellent collection of host security information and make sure to follow his recommendations.
3) Make sure your firewall rules are set up with the security best practice of "minimum access to get the job done". Far too many firewalls allow traffic they shouldn't.
4) Get NMAP, a network mapper, port scanner, and OS identifier and run it from the Internet to your exposed (i.e. DMZ) hosts. Also run it from your exposed hosts to your internal network to validate that only the traffic that should get in can get in. (The traffic allowed back in from your DMZ should be very little, preferably none.) If you find anything that is inconsistent with what you think should be happening, check your firewall rules again.
5) Grab a copy of the Nessus security scanner and run it against your newly secured systems. If it finds anything, read the description of the problem and see if it's something you can fix. You can bet that everything you find here will also show up on your "security audit" since most "audits" are just someone running a tool like this and then feeding the output to the consultants to make it all pretty for management.
6) You should have most of the obvious, widespread holes plugged by now. This would be a good time to get some sysadmins out to some classes. Verisign has a number of excellent general Internet security classes. I'm sure there are lots of other good places, too. I was pleased with Verisign because of their Internet focus. Too many security classes only concentrate on host security and neglect network security.
7) Get the application developers at your site to read and follow Dave Wheeler's writing secure programs guidelines. This is a lower priority than OS/network security since these holes are likely to be specific to your site only. Only a determined hacker is likely to find and exploit them-- however exploiting application bugs/holes can severely disrupt your business. What happens when an electronic data interchange transaction gets bogus data inserted? How far will that bogus information make it in before it's detected? In the worst case these bugs could result in people getting free products/subscriptions, stealing credit card info, or destroying data inside your systems.
8) Now it's time to get that audit. They will be able to tell you what you missed in the previous 7 steps. Why wait so long? Most places will keep looking until they find something to report. If you do this too soon, the subtle security problems will be lost in the noise of all the obvious problems the previous 7 steps would have fixed. If you do this last, only the "hard" problems are left for them to find.
Remember above all that this is an ongoing process. Keep current on your patches, and repeat all the above steps regularly to keep all the bad guys away.
-
Re:Music Patents vs Software patents
I think that RMS is saying that if you keep the number of parts identical, the effort involved in the development of a physical thing is much greater than the effort involved in developing a software project with the same number of parts.
Let's use the car for example. How many parts are in a car? 10,000? 100,000? I don't think it's 1,000,000, but I could be wrong. Let's just say 100,000 for argument's sake. How many people are involved in the development and production of a 100,000 part car? I would think that it's safe to say at least 1000 people. But you might know better than I.
Let's assume, for the sake of argument, that each line of code is a "software part". How many people are involved in a 100,000 line software project? Using a program called sloccount I was able to count up the number of real lines of source code in qmail 1.03. This project is basically a one man project. It contains approximately 17,000 lines of source code.
If these numbers are to be believed as reasonable, then it takes about 5-6 people to produce and assemble a software project with the same number of parts as a physical item that requires upwards of thousands of people to produce.
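The arithmetic behind that "5-6 people" figure can be spelled out in a few lines of Python. This is only a sketch of the comment's own back-of-the-envelope reasoning: it takes the guessed 100,000 parts and the roughly 17,000 lines of qmail as one person's output, and the variable names are mine. It ignores everything a real effort model would account for.

```python
# Figures taken from the comment: one "part" per line of code.
parts_in_car = 100_000          # the poster's guess for a car
lines_per_one_person = 17_000   # approximate size of qmail 1.03, a one-man project

# How many qmail-sized one-person efforts add up to a 100,000-line project?
people_needed = parts_in_car / lines_per_one_person

if __name__ == "__main__":
    print(f"About {people_needed:.1f} people")
```

That lands between 5 and 6, against the poster's guess of at least 1000 people for the car.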
The offshoot is that, part for part, software is easier to develop. Consequently we make software much more complicated, with many many more parts than we make in physical objects. This leads to a situation that to make any useful software at all, you rely on techniques used in previous software. To prevent a developer from using those techniques (as software patents do) is to prevent any further innovation in software.
I think RMS puts forth a pretty compelling argument.
-
SLOC measures effort, not productivity.
This article starts from a misunderstanding, and then "discovers" that the misunderstanding isn't true. Yes, source lines of code (SLOC) aren't good measures of productivity; that's because they weren't intended to measure productivity. SLOC are useful for estimating development effort. The best programmers manage to simplify problems so that they can solve the same problem with less effort. SLOC can then be used to estimate that effort, before it's expended, or used to estimate the effort that was expended. Claiming that SLOC measures productivity is silly.
There's a whole literature on managing software projects. Look up terms like "Software Engineering" and "Software Management". For tracking progress, the usual approach is to divide the project into a series of steps, where each step can be unambiguously determined to be true or not (no "90% done" steps). Estimate the time that's required for each step and use a scheduling program to determine how long it will take; you'll also need separate management reserve time for the inevitable problems (but keep this separate from the steps, so that you'll know when you're using it up). Some people define dollar values for each step, resulting in earned value approaches.
By the way, I've used SLOC to estimate the effort needed to develop one of the GNU/Linux distributions (Red Hat); you can see the results in More than a Gigabuck: Estimating GNU/Linux's Size.
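For readers who want to see what "estimating effort from SLOC" looks like, here is a minimal Python sketch of the Basic COCOMO model in organic mode, using the textbook coefficients (effort = 2.4 x KSLOC^1.05 person-months, schedule = 2.5 x effort^0.38 months). The function names and the 17 KSLOC example input are assumptions for illustration; the paper's figures come from applying the model component by component, so this sketch is not meant to reproduce them.

```python
def cocomo_effort_pm(ksloc: float) -> float:
    """Basic COCOMO, organic mode: estimated effort in person-months."""
    return 2.4 * ksloc ** 1.05


def cocomo_schedule_months(effort_pm: float) -> float:
    """Basic COCOMO, organic mode: estimated calendar time in months."""
    return 2.5 * effort_pm ** 0.38


if __name__ == "__main__":
    ksloc = 17.0  # hypothetical program of 17,000 physical source lines
    effort = cocomo_effort_pm(ksloc)
    schedule = cocomo_schedule_months(effort)
    print(f"{ksloc} KSLOC: about {effort:.0f} person-months "
          f"over about {schedule:.0f} months")
```

Note the exponent above 1: doubling the code size more than doubles the estimated effort, which is why the estimate says nothing about how productive any individual programmer was.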
-
Re:Dumb security question
How feasible would it be for someone to take a computer and have it do nothing but pattern-matching through all the source code in a typical Linux distribution, looking specifically for problem areas like these?
Short answer: That's not so easy. For a longer answer, read these:
- Secure Programming: Buffer Overflow by David Wheeler
- Smashing The Stack For Fun And Profit by Aleph One
- Buffer Overruns, whats the real story? by Lefty
- Finding and exploiting programs with buffer overflows by Prym
- Stack Smashing Security Vulnerabilities by Nathan Smith
- Buffer Overflows by The FreeBSD Documentation Project
- Linux/ix86 buffer overflows by Willy Tarreau
- SunOS 4.1/Sparc buffer overflows by Willy Tarreau
- The Tao of Windows Buffer Overflow
- Buffer Overflows: Why, How and Prevention by Nicole LaRock Decker
-
Re:Dumb security question
YES. It's definitely possible to use tools to search through code and find problem areas. Such tools are called "source code scanners."
There are at least two open source software / free software source code scanners that work like this. My tool Flawfinder does this, as does John Viega's RATS tool. Both tools are licensed under the GPL.
They both work essentially the same way; they use patterns and some heuristics to identify "dangerous" function calls and patterns, and also try to rank their riskiness. They both have built-in databases (so you don't have to figure out what should be looked for), and they both parse the code sufficiently so that comments and data in strings are ignored. They also examine the parameter values to determine the riskiness of the construct. Both were influenced, by the way, by a previous tool called ITS4. Eventually we hope to merge our efforts, but it hasn't been immediately obvious how to do so. In fact, it can be argued that we shouldn't: having two tools is like having two different people look at something, each catches or emphasizes something the other doesn't.
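To give a feel for the general idea, here is a toy Python sketch of a pattern-based scanner for C code. It is emphatically not how Flawfinder or RATS is implemented: the `RISKY_CALLS` table, its risk numbers, and the `scan` function are all invented for illustration, and unlike the real tools it does not examine parameter values. It only shows the two steps described above: blank out comments and string literals, then flag calls found in a built-in database and rank them.

```python
import re

# Hypothetical mini-database: function name -> risk level (higher is worse).
RISKY_CALLS = {"gets": 5, "strcpy": 4, "strcat": 4, "sprintf": 4, "system": 4}

# C comments and string/char literals, which must not be scanned for calls.
_SKIP = re.compile(
    r'/\*.*?\*/|//[^\n]*|"(?:\\.|[^"\\])*"|\'(?:\\.|[^\'\\])*\'',
    re.DOTALL,
)
# An identifier followed by "(" is treated as a function call.
_CALL = re.compile(r'\b([A-Za-z_]\w*)\s*\(')


def _blank(match):
    # Replace everything except newlines so line numbers stay correct.
    return re.sub(r'[^\n]', ' ', match.group(0))


def scan(source):
    """Return (risk, line_number, function) hits, riskiest first."""
    cleaned = _SKIP.sub(_blank, source)
    hits = []
    for lineno, line in enumerate(cleaned.splitlines(), start=1):
        for m in _CALL.finditer(line):
            name = m.group(1)
            if name in RISKY_CALLS:
                hits.append((RISKY_CALLS[name], lineno, name))
    return sorted(hits, key=lambda h: (-h[0], h[1]))


if __name__ == "__main__":
    sample = 'char *s = "gets(buf)";\nstrcpy(dst, src);\ngets(line);\n'
    for risk, lineno, name in scan(sample):
        print(f"line {lineno}: {name} (risk {risk})")
```

Each hit is only a pointer to code worth reading; a human still has to decide whether the call is actually exploitable.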
I think running either tool on the entire distribution would result in too much output to be worthwhile. These tools simply identify potentially dangerous code - you still have to look at the code to determine if it's really a problem. My hope, instead, is to convince the various developers of each package to use such tools to find potential problems before the code is released to the public. Don't let me discourage you from trying - please do review what you can!! But I'd like to see everyone reviewing code they work with, not just a few code reviewers.
For more information about other tools, see my book Secure Programming for Linux and Unix HOWTO, Tools section.
I'm a big believer in defense-in-depth strategies. You should use source code scanning tools like these to find problems in your code before you run it. You should then run tools like Purify and Electric Fence to find other problems. Then, use tools and mechanisms that counter security attacks at run-time, e.g., StackGuard, TempGuard, and so on. It would be great if there were a global setting so that you could make ALL programs use the "slow but safe free()" without having to recompile the C library.
-
Guidelines for writing secure programs (HOWTO)
You might find my Secure Programming for Linux and Unix HOWTO useful. It's a set of guidelines for writing secure programs, including writing web applications, clients, viewers (including word processors), setuid/setgid programs, and so on. It's focused on Linux and Unix, but most of the general principles apply to all systems.
-
More about the Numbers...
Various posts have wondered if there are TCO figures, or market share numbers, or claimed that Microsoft "owns" all the markets it competes in, or commented on the $1.9 billion figure in Perens' article.
I suggest that you look at my paper Why Open Source Software / Free Software (OSS/FS)? Look at the Numbers!. It has that kind of information, grouped into categories such as market share, total cost of ownership (TCO), reliability, and so on.
For example, Microsoft absolutely owns the desktop client market, that's true. But it certainly doesn't own other markets - Apache is still the most common web server, for example, and sendmail is the most popular mail transfer agent (MTA). See my paper for the details.
Total cost of ownership (TCO) is so dependent on the assumptions that you really have to do your own. However, it's clear that many people do find that GNU/Linux systems have a lower TCO than Microsoft's systems in their environment.
Please note that Perens himself claims that the $1.9 billion estimate was only if the software had been developed the same way as Microsoft's. Perens does not claim that $1.9 billion was spent. Check the linked-to paper, I think it spells things out clearly. One caveat: I wrote the analysis tool used in the paper. However, the tool simply implements a well-known and widely respected estimation model that has been openly documented; it's certainly not biased to give open source software bigger results.
I think Perens' article was well-written.
-
Content providers: generate Plucker format.
Plucker is a very good solution to the problem. If you're a content provider and want to support Palms, just generate the Plucker format yourself. That way, users don't have to figure out how to generate the format; they just download and synchronize.
This is already happening. For example, the Linux Documentation Project (LDP) recently added support for Plucker; the LDP now automatically generates Plucker format for all HOWTO, mini-HOWTO, and FAQ documents. The LDP also automatically regenerates the files when the documents are updated. Pluckerbooks has over a thousand pregenerated books and they have links to other sources of Plucker documents.
In fact, I've recently added support for Plucker to my own website. My paper Why Open Source Software / Free Software? Look at the Numbers! also has a Plucker version available. I also generate a Plucker version of my book on writing secure programs. So I'm speaking from experience here: Plucker works well for at least some content providers!
Downloading the tools and then generating the Plucker format is easy if you can use a command line interface. Plucker's format is essentially compressed HTML, so for most websites it's easy to support. Plucker is GPL'ed, so its components (the generator and reader) can't be "taken away"... and they are free for any use. This combination of free reader, free creator, and no risk (because it can't be taken away) makes Plucker much more appropriate for many content providers. The Plucker viewer itself is quite capable, for example, it supports larger fonts for headings, bold text, italics, hypertext links, images, horizontal rules, and tables (formatted as one cell per line). If you click on a hypertext link to a page not included in the file, Plucker will show you the URL so you can look it up later.
Installing just the viewer is actually quite easy for end-users; you can download just the viewer from the Plucker website, and Plucker users can beam the program to other users of Palm-compatible PDAs. Generating Plucker files is pretty easy from the command line, but I do agree that currently grandma may have trouble generating documents on her own. It's also true that getting "new" versions of Plucker documents isn't automatic; you have to do something to get an update. The Plucker folks are actively working on solving these problems, e.g., creating GUI interfaces. Since Plucker is already a really nice viewer, and other work is already ongoing, I think that the Plucker developers will quickly succeed in making it easier for naive users to generate their own documents.
-
Re:Security is never free
DRM is inherently user-unfriendly, because it exists to prevent the user from doing some things.
You're right. And we have to remember that when I want to "pirate" a book on a large scale, I will always be able to copy it manually. It's much easier than with music or films, because everyone who can use a text editor, typewriter or a pencil will always be able to make a copy-friendly version. And there's only need for one such version of every book. (It reminds me of a story about a young pirate named Mozart.) Too much work? I already saw hundreds of such books on BBSes ten years ago. Copy-"protecting" books makes no sense. Are these fanatics planning to make the pencil illegal? Because that's the only way to have working digital "rights" management for books. (And by "working" I mean that only criminals will be able to copy, because they always will.)
By the way, have you noticed the opposite meaning of words in terms like copy-"protection" or digital "rights" management, etc.? Does it remind you of something? Like the Ministry of Truth? Yes, I linked to the Adobe eBook version of George Orwell's 1984, how ironic... "THIS TITLE IS NOT TEXT-TO-SPEECH COMPATIBLE"
To be more optimistic, I'm just reading "Secure Programming for Linux and Unix", a great book released under the GNU Free Documentation License. Fortunately, not everyone is a copy-"protection" freak yet.
-
Secure Programming for Linux and Unix HOWTO
My favorite:
-
Some more recommendations
I'm no security expert; I've only just recently started reading. And incidentally, a couple of days ago I began reading "Security Engineering". So far I share the reviewer's very good impression.
I'd like to recommend some complementary books; each of these approaches security from a different angle:
- Secrets & Lies by Bruce Schneier. Deals with the "soft" issues. What are the threats to networked systems? Who are the attackers? One of the messages: Risks can't be avoided -- manage them.
- Building Secure Software by John Viega and Gary McGraw. This one's closer to technological issues related to security. Risks of various base technologies (languages, middleware). Introductory details on buffer overflow attacks, random numbers, cryptography. Some organizational/dev process stuff.
- Secure Programming for Linux and Unix HOWTO by David A. Wheeler. Technical security down to the C level. Programming techniques.
Michael
-
Re:Any books w/sample code?
I haven't read the book, but for something with a perhaps more practical approach, check out the Secure Programming for Linux and Unix HOWTO.
-
Re:Secure programming HOWTO for Linux and UNIX
Thanks for the plug! My book, Secure Programming for Linux and Unix HOWTO, is free, and it's open source/free software (GNU FDL).
I've also just posted my presentation on how to write secure programs; it's the presentation I gave at FOSDEM 2002 last week. Note that these presentations have different (overlapping) goals; Louis Bertrand's presentation is primarily about OpenBSD (e.g., how it's developed), while my presentation is primarily about how developers can develop secure programs. My presentation, like the book, is at http://www.dwheeler.com/secure-programs.
-
Re:The only remaining wish...
Try the Secure Programming for Linux and Unix HOWTO
It explains the basics of secure programming and common problems with a variety of programming languages including buffer overflow and many more tricky problems.
-
Secure programming HOWTO for Linux and UNIX
While we're on this topic, this Secure Programming HOWTO for Linux and UNIX might be of interest. It's a pretty comprehensive book. And best of all, it's free! :-)
-
Re:Well, m$ has to do something.
Wait, say that again, "third party implementations", meaning what exactly?
I don't think that I've seen a GPL'd java or a red-hat java.
This page lists several non-Sun Java implementations. Several of them are open source, GPL'ed and are in fact part of the standard Red Hat distributions.
As far as Sun not giving up control, well, didn't the Microsoft attempt to hijack Java prove that they had some justification in this?
-
Yes, someone has.
It's called More Than A Gigabuck
-
Re:Want a brand new car for free?
Thank you! I was hoping some kind soul would post the link.
I also see a link to this, which may be the article I remembered. Both are very interesting.
The practical upshot is that Linux (the system) represents more than a billion-dollar development effort.
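For anyone wondering where a figure like that comes from: the paper says it used the COCOMO cost model on a source line count. Here is a rough sketch of that kind of arithmetic, not the paper's actual method. It assumes the basic COCOMO "organic" coefficients, an average salary and overhead multiplier that I believe match SLOCCount's defaults, and that each component is estimated separately and then summed. The component sizes in the example are made up.

```python
"""Basic COCOMO ("organic" mode) applied to source line counts (a sketch)."""

# Assumed coefficients for the basic COCOMO organic model.
EFFORT_COEFF = 2.4      # person-months = EFFORT_COEFF * KSLOC ** EFFORT_EXP
EFFORT_EXP = 1.05
SCHEDULE_COEFF = 2.5    # months = SCHEDULE_COEFF * person_months ** SCHEDULE_EXP
SCHEDULE_EXP = 0.38

# Assumed cost inputs (illustrative, year-2000 US dollars).
ANNUAL_SALARY = 56_286
OVERHEAD = 2.4


def cocomo_estimate(sloc):
    """Estimate effort, schedule, and cost for one component of `sloc` lines."""
    ksloc = sloc / 1000.0
    person_months = EFFORT_COEFF * ksloc ** EFFORT_EXP
    schedule_months = SCHEDULE_COEFF * person_months ** SCHEDULE_EXP
    cost = (person_months / 12.0) * ANNUAL_SALARY * OVERHEAD
    return {
        "person_months": person_months,
        "schedule_months": schedule_months,
        "cost_usd": cost,
    }


def distribution_estimate(component_sloc):
    """Sum per-component effort and cost over a mapping of name -> SLOC."""
    estimates = [cocomo_estimate(n) for n in component_sloc.values()]
    return {
        "person_months": sum(e["person_months"] for e in estimates),
        "cost_usd": sum(e["cost_usd"] for e in estimates),
    }


if __name__ == "__main__":
    # Hypothetical component sizes, made up for this example.
    sizes = {"kernel": 2_000_000, "browser": 1_500_000, "editor": 600_000}
    total = distribution_estimate(sizes)
    print(f"{total['person_months'] / 12:.0f} person-years, "
          f"${total['cost_usd']:,.0f}")
```

Because the exponent is above 1, doubling a component's size more than doubles its estimated effort, so how the code is split into components changes the total.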
-
Re:Want a brand new car for free?
It's here.
-
Re:Hhhmmm...
Look-at-the-numbers approach to "Why Open Source?"
http://www.dwheeler.com/oss_fs_why.html
You probably want the security section.
-
Re:The actual count: 149,367
I should have posted a link to the tool which can be found at: http://www.dwheeler.com/sloccount/.
This tool basically counts physical lines of code (excluding comments and whitespace) and produces cost and schedule estimates from this count using the standard COCOMO model.
-
Re:But this is exactly the problem
If it's only worth 10k to me (and to all the other individuals/organizations), then the application will simply never get built in RMS' world.
Very interesting, and I'm not sure that I can refute it yet, or even that I want to. I don't hold the same views as RMS w.r.t. proprietary software. Still, I don't think it's fair to say that expensive software won't get built in RMS's world. Linux (or in RMS-speak GNU/Linux) got built despite the fact that it required >$1 billion in development costs. And it cost Linus nearly nothing to get the whole thing started.
Now, if you personally want a custom OS, paying someone to tweak Linux with your customizations is a *lot* more affordable than starting from scratch. You can even keep those changes to yourself, and not give them to anyone else. The *only* thing you can't do is release the software in a proprietary format. And, surprise surprise, this is exactly what's being done, over and over again. Some suspect that this is a trend.
Also, we already have an example of a business model where legislated openness has created some monster organizations. The pharmaceutical industry, under the governance of the FDA, is required to publish their drugs before a very long and drawn out peer review. That doesn't keep them from pooling the resources necessary to develop hundreds of failed drugs for every 1 successful drug.
Of course, the pharmaceutical industry relies heavily on patents. I'm pretty sure that RMS doesn't like those either. If a purely RMS world includes prohibiting patents, the pharmaceutical industry would be in trouble in such a world. But if we limit the scope strictly to legislated openness, the pharmaceutical industry demonstrates that huge resources can be pooled even with legislated openness.
-
Tell all-Microsoft shops: save $, competitive bids
Here's a fun way to drive Microsoft crazy...
Tell Microsoft-only organizations to threaten Microsoft, saying "we'll switch to open source software (e.g., GNU/Linux) instead of Microsoft's software." Organizations that do so might be able to save a lot of money, even if they have no intention of actually making the switch.
Many of these Microsoft-only shops have been hit with the recent licensing changes that (for most) increase their costs, and believe that there's nothing they can do about it. It looks like Microsoft may be so concerned about losing business that they may grant all sorts of price concessions to keep business. Organizations should develop competitive bidding strategies (just like they do for many other purchases), looking at the costs and benefits of the services they're paying for.
Obviously, organizations are only going to save a lot of money if they're a credible threat, e.g., represent a significant account and have "done their homework" to show that they really could switch to open source software. Total cost of ownership (TCO) calculations and quantitative evidence help here. Many organizations will find employees who can really strengthen this analysis through personal experience (e.g., those who use such software at home). If Microsoft wants "exclusive use" clauses, make sure they're dearly won and for a limited time (so that the organization can save lots of money again in a few years). Even if the organization picks Microsoft anyway (just as they were going to do), open sourcers can find amusement in causing Microsoft's revenue stream to dwindle.
Of course, an organization always runs the danger of finding out that open source software is actually the best choice. In that case, they can find the delight of a surprise bargain they weren't expecting.