I don't think that idle processes on the system are any worse than shared libraries loaded into memory (or swap) waiting for someone to call them. The problem is more one of getting an easy and efficient interface between the different processes; a C function call is much more efficient than a context switch to a different process, and lets you pass complex data structures and pointers without having to define an encoding.
If everyone else had to transport spices in creaking ships, while you alone were equipped with some special transporting power and could move the spice about at no cost (perhaps using Carryalls), then yes you could engage in arbitrage. Otherwise, you are engaging in business and taking risks.
I don't think you've given any good examples of Miguel-wheel-reinventing. The best candidate is arguably GTK, written when we already had Lesstif (and the whole Xt layer it was built on).
It's true that GNOME duplicated lots of work done by KDE, but remember that back in those days KDE depended on a non-free library (Qt). So GNOME was reinventing the wheel only in the same sense that GNU duplicated work already done on Unix. In other words, from a purely technical point of view it was duplicated work, but there were considerations other than technical ones, just as when rms set out to write a clone of Unix.
Mozilla vs KHTML: yes, I guess these two could have worked together better. I think they were both developed at roughly the same time. Netscape of course was around long before KHTML, but the Mozilla people rewrote the renderer from scratch. But anyway, neither Mozilla nor KHTML is part of GNOME.
I think you have to recognize that 'the right way' is a rather subjective matter. The GNOME developers, like the KDE, GNUstep or ROX developers, did set out to do it the right way the first time. If you don't think they succeeded, it's unfortunate that they didn't meet your expectations, but that's not an argument against trying.
An arbitrage profit is riskless: it involves a positive cash inflow at one or more dates and zero cash flow at all other dates. In other words, arbitrage requires no investment and no cash outlay; the arbitrageur generates only cash inflows, at one or more dates.
However, opportunities for true arbitrage only rarely exist in well-functioning markets. Indeed, one of the fundamental assumptions of pricing theory is the no-arbitrage rule, which says that it should not be possible to make arbitrage profits.
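The definition above can be restated as a check on a strategy's net cash flows, one entry per date. This is a minimal sketch; the function name `is_arbitrage` and the list-of-floats representation are assumptions made for illustration.

```python
from typing import Sequence


def is_arbitrage(cash_flows: Sequence[float]) -> bool:
    """Return True if the net cash flows of a strategy form an arbitrage.

    cash_flows[t] is the net cash received (+) or paid (-) at date t,
    starting with date 0. No date may show a net outflow, and at least
    one date must show a strictly positive inflow.
    """
    no_outlay = all(cf >= 0 for cf in cash_flows)
    some_profit = any(cf > 0 for cf in cash_flows)
    return no_outlay and some_profit


print(is_arbitrage([2.0, 0.0]))       # True: buy at 100 and sell at 102 at the same instant
print(is_arbitrage([-100.0, 103.0]))  # False: the 100 paid up front is an outlay
print(is_arbitrage([0.0, 0.0]))       # False: no inflow at all
```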
What an 'arbitrage' trading desk does is really a kind of speculative trading, with some risk involved, although the risk is usually less than with other kinds of speculation.
Miguel was not arguing that shared libraries suck; rather that Unix sucks because it doesn't use them enough, and every app reinvents the wheel. Take some of the most often cited examples of popular free software or Unix applications: Mozilla, Emacs, TeX, StarOffice. Common code between them: pretty much zilch. libc and libX11, some image handling libraries like libpng, and that's about it.
Using pipes instead of shared libraries would not help with forwards or backwards compatibility. You still have to decide on what protocol to use over the pipe, what the API is, which messages return what results and in what format. At least with shared libraries you have header files which provide a basic sanity check at compile time.
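To make the point about protocols concrete, here is a minimal sketch of what replaces a plain function call once two components talk over a pipe: you have to invent both a framing and an encoding. The 4-byte length prefix, the JSON body and the `parse_mime` operation name are assumptions chosen for illustration, not any existing protocol. Both ends run in one process here, so each message must fit in the pipe buffer.

```python
import json
import os
import struct

# Hypothetical wire format: 4-byte big-endian length, then a UTF-8 JSON body.
# A real service would also have to define operation names, error replies
# and some form of versioning.
HEADER = struct.Struct(">I")


def _write_all(fd: int, data: bytes) -> None:
    view = memoryview(data)
    while view:
        written = os.write(fd, view)  # os.write may write only part of the buffer
        view = view[written:]


def _read_exactly(fd: int, n: int) -> bytes:
    chunks = []
    while n > 0:
        chunk = os.read(fd, n)
        if not chunk:
            raise EOFError("pipe closed in the middle of a message")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def send_message(fd: int, message: dict) -> None:
    body = json.dumps(message).encode("utf-8")
    _write_all(fd, HEADER.pack(len(body)) + body)


def recv_message(fd: int) -> dict:
    (length,) = HEADER.unpack(_read_exactly(fd, HEADER.size))
    return json.loads(_read_exactly(fd, length).decode("utf-8"))


read_fd, write_fd = os.pipe()
# With a shared library this would just be a call like parse_mime(headers);
# over a pipe the request has to be flattened into bytes first.
send_message(write_fd, {"op": "parse_mime", "headers": {"Content-Type": "text/plain"}})
print(recv_message(read_fd))
os.close(read_fd)
os.close(write_fd)
```

Note that nothing checks the shape of the message at compile time; a mistyped field name only shows up when the other end fails to understand it.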
FWIW this is pretty much what the GNOME people did with Bonobo - code APIs using CORBA so that you can communicate with them over Unix pipes or sockets or other stream mechanisms.
CPU is certainly a good metric to look at. But over the long term, CPU speeds improve a lot faster than disk speeds, so using more CPU to save a little disk access is usually a good tradeoff.
Many systems, however, do not run many entirely CPU-bound processes. If the CPU would otherwise be idle, it doesn't matter (apart from power consumption in a laptop) whether the usage by the filesystem is 10% or 90%. Only if there are other things wanting to run does it make a difference.
An interesting benchmark would be to run something with a mixture of CPU-bound and IO-bound work; perhaps 'find . -type f -print0 | xargs -0 -n 100 -P 4 lzop'. lzop is a fast compression program; -P 4 tells xargs to run up to four copies at once, and -n 100 limits each copy to 100 files so that xargs has enough separate batches to run in parallel, keeping the CPU busy (-print0 and -0 keep filenames containing spaces intact). Given a large number of small files to compress in this way, which filesystem is fastest?
Your post sounds a lot like Miguel's 'Unix sucks' talk, where he explains that Unix does not (or did not) have much reusable code or many components at a level finer than whole processes, apart from a few libraries like libc and libX11 which are mostly static. Microsoft Office, sometimes held up as the canonical Slashdot example of an ugly monolithic application, is in fact built from many small components (though for legal reasons it is hard to reuse them). The GNOME project set out to change this, although nowadays the emphasis is more on the GUI than on the component architecture.
Part of the problem is that nobody can agree on anything. So a MIME parsing library has to be written once in Emacs Lisp for Emacs, once as a Perl module, once in Python, as a C library (probably several C libraries), in Java, and so on. Nobody can agree to write a reusable component once, to a common interface (the implementation could be in any language, as long as the interface is usable from others), and just wrap that. On the other hand, some libraries, often those that came later in Unix history, do have a single shared implementation, e.g. libpng.
Importing spices from the Far East is not arbitrage, it's just trade. The price difference was largely due to transport costs, and you take a risk that something will go wrong during the journey. Real arbitrage is riskless and you don't have to _do_ anything beyond the buying and selling.
Buying spices, paying someone to transport them, getting insurance in case they are lost in transit, and selling them at the other end at a price agreed in advance would be arbitrage, because it would be riskless. But it might not be profitable.
Linux users say a four-day visit to India last November by Microsoft Chairman Bill Gates, who announced $400 million in local investments, drew attention to Linux.
So how do you deal with trolls and spammers who will vote up or vote down sites for partisan reasons? Or ignoring that, what about straightforward differences of opinion? (The world may be polarized 50/50 between those who think 'firebird' refers to a database and those who think it is a web browser - at least among the geekier-than-average WhittleBit users.)
Anonymous feedback won't scale well to the big bad Internet; some kind of login and network of trust is needed.
I thought the definition of species included something like: if two kinds of animal can and do interbreed (producing fertile offspring), then they are the same species.
This has problems - there might be cases where same-species is not transitive - but if it's possible to create hybrids between two 'different species', what does the word mean?
So basically, they are more interested in "ideological purity" than promoting realistic progress towards their goal.
Their goal, AFAIK, is to give computer users freedom by making sure that everyone can run free software: not just some free software, but entirely free software. The operating system by itself is not enough, but it seemed like a good place to start. Given that goal, they seem to promote realistic progress towards it and to act practically in pursuit of it.
But it does show that RMS/FSF are worthless as a realistic leader of today's free software movement. The question is, who and which organizations are up to the task?
It depends on how you define 'movement' I suppose. The only alternative leader would probably be Linus (ESR being too much of a wackjob).
A licence being 'GPL compatible' means that you can take some code distributed under that licence and redistribute it under the GPL.
For example, you can take a device driver from a recent FreeBSD release, port it to Linux, and release a new Linux version including that driver. This is because the BSD licence is GPL-compatible.
On the other hand you cannot take Apple's code and mix bits of it with Linux, because the code is licensed to you under the APSL and the APSL does not allow redistribution under a different licence. Neither does the GPL. So you cannot make a single work combining code from both.
It would be fair to note also that the GPL is APSL-incompatible; the fault is not particularly with one licence or the other. But since the GPL is by far the dominant copyleft licence (and many other licences explicitly allow redistribution under the GPL, e.g. Perl, Mozilla, LGPL), it's normal to blame the incompatibility on the other licences, since the GPL got there first.
It isn't always clear from context; for example if you have a ten-gigabyte disk and one 'gigabyte' of main memory, how many memory images can you write? Even if the ambiguity is tolerable for people who've worked with computers for a long time, this doesn't mean it is a good system. And that's without even considering the mutant hybrid units as used in a '1.44 megabyte' floppy.
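The arithmetic behind those two examples, assuming (as is usual) that the disk is sold in decimal gigabytes and the memory in binary ones:

```python
DISK_BYTES = 10 * 10**9    # ten-gigabyte disk, decimal as disk vendors count
MEMORY_BYTES = 1 * 2**30   # one "gigabyte" of RAM, which is really 1 GiB

images = DISK_BYTES // MEMORY_BYTES
print(images)                     # 9, not 10
print(DISK_BYTES / MEMORY_BYTES)  # about 9.31

# The "1.44 megabyte" floppy mixes both systems: 1440 KiB = 1.44 * 1000 * 1024 bytes.
FLOPPY_BYTES = 1440 * 1024
print(FLOPPY_BYTES)               # 1474560
print(FLOPPY_BYTES / 10**6)       # 1.47456 decimal megabytes
print(FLOPPY_BYTES / 2**20)       # 1.40625 binary mebibytes
```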
People may have no idea what 'kibi' is, but at least they _know_ that they do not know. That's probably better than not knowing that you don't know that 'kilo' is different to what you expect. Of course if you are talking approximate quantities you can use kilo, etc all the time: 'This computer has about a gigabyte of RAM'.
You sure? I only rarely need to use backreferences, lookahead, or other 'irregular' features. Nine times out of ten a regexp needs only ?, + and *, plus alternation and simple anchors like \b. As far as I know this can be straightforwardly translated to a true regular expression.
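A minimal sketch of that translation, supporting only literals, concatenation, `|`, `*`, `+`, `?` and parentheses (no `\b`, character classes or escapes). The pattern is compiled into a Thompson-style NFA and matched by tracking the set of live states, so the running time is proportional to the pattern size times the text length, with no backtracking. The names `compile_nfa` and `nfa_fullmatch` are hypothetical and unrelated to Python's `re` module.

```python
from typing import Iterable, List, Optional, Set, Tuple


class State:
    """NFA state: consumes `char` and moves to edges[0], or (char None) has epsilon edges."""

    __slots__ = ("char", "edges")

    def __init__(self, char: Optional[str] = None, edges: Optional[List["State"]] = None):
        self.char = char
        self.edges = edges if edges is not None else []


Fragment = Tuple[State, State]  # (start, end); end is always an epsilon state


class _Parser:
    def __init__(self, pattern: str):
        self.pattern, self.pos = pattern, 0

    def peek(self) -> Optional[str]:
        return self.pattern[self.pos] if self.pos < len(self.pattern) else None

    def take(self) -> str:
        c = self.pattern[self.pos]
        self.pos += 1
        return c

    def alternation(self) -> Fragment:
        left = self.concatenation()
        while self.peek() == "|":
            self.take()
            right = self.concatenation()
            end = State()
            left[1].edges.append(end)
            right[1].edges.append(end)
            left = (State(edges=[left[0], right[0]]), end)
        return left

    def concatenation(self) -> Fragment:
        start = end = State()
        while self.peek() not in (None, "|", ")"):
            frag_start, frag_end = self.repetition()
            end.edges.append(frag_start)
            end = frag_end
        return start, end

    def repetition(self) -> Fragment:
        start, end = self.atom()
        while self.peek() in ("*", "+", "?"):
            op, new_end = self.take(), State()
            if op == "*":    # zero or more: optional entry, loop back after each pass
                end.edges.extend([start, new_end])
                start = State(edges=[start, new_end])
            elif op == "+":  # one or more: must pass once, then may loop back
                end.edges.extend([start, new_end])
            else:            # "?": zero or one
                end.edges.append(new_end)
                start = State(edges=[start, new_end])
            end = new_end
        return start, end

    def atom(self) -> Fragment:
        c = self.take()
        if c == "(":
            frag = self.alternation()
            if self.peek() != ")":
                raise ValueError("missing ')'")
            self.take()
            return frag
        if c in "*+?":
            raise ValueError(f"nothing to repeat at position {self.pos - 1}")
        end = State()
        return State(char=c, edges=[end]), end


def compile_nfa(pattern: str) -> Fragment:
    parser = _Parser(pattern)
    frag = parser.alternation()
    if parser.peek() is not None:
        raise ValueError("unbalanced ')'")
    return frag


def _closure(states: Iterable[State]) -> Set[State]:
    """All states reachable through epsilon edges; the seen-set handles epsilon cycles."""
    seen = set(states)
    stack = list(seen)
    while stack:
        s = stack.pop()
        if s.char is None:
            for t in s.edges:
                if t not in seen:
                    seen.add(t)
                    stack.append(t)
    return seen


def nfa_fullmatch(pattern: str, text: str) -> bool:
    start, accept = compile_nfa(pattern)
    current = _closure([start])
    for ch in text:
        # Each live state is visited at most once per character: no backtracking.
        current = _closure(s.edges[0] for s in current if s.char == ch)
        if not current:
            return False
    return accept in current


print(nfa_fullmatch("(a|b)+c?", "abab"))  # True
print(nfa_fullmatch("(a*)*b", "a" * 40))  # False, found without exponential backtracking
```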
I think you can use GCJ to generate a statically linked and therefore self-contained executable. I agree that having to set CLASSPATH is a pretty limp reason to avoid Java (just distribute your app using something like the JPackage project so that dependencies and setup of necessary libraries can be handled automatically).
Using the JVM or JRE, you are limited to platforms to which it has been ported; as far as I know, gcc runs on a wider range of platforms than that.
There are other advantages besides startup time. The page you link to is Slashdotted or otherwise down, but I assume 'the JVM' means Sun's proprietary JVM. GCJ, on the other hand, is free software and uses free class libraries.
With hindsight, the whole Java bytecode thing seems like a waste of time. Java source code is fairly simple to compile (at least naively), and the bytecode is easy to turn back into source so that it doesn't provide any better obfuscation than shrouded source code.
Sun could have provided a Java compiler and let applets be downloaded as source code (to be executed in a restricted sandbox, as bytecode is now). This would have provided the same distributed-code capabilities with less complication, saved the paper spent on needlessly standardizing the bytecode (it could simply have been implementation-dependent, since bytecode would never be distributed anyway), and perhaps Java 1.0 performance would not have sucked quite so much.
Yes, it is out of the question to overload the same prefix to mean both 1000 and 1024. A 2.4% error doesn't sound so bad, but once you get up to gigabytes or terabytes the gap between the two widens (there is nearly a 10% gap between terabyte and tebibyte).
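The widening gap is easy to compute: each binary prefix is 2 to the power 10k against the decimal 10 to the power 3k. A minimal sketch (the function name `prefix_gap` is just for illustration):

```python
def prefix_gap(k: int) -> float:
    """Fraction by which the binary prefix 2**(10k) exceeds the decimal 10**(3k)."""
    return 2 ** (10 * k) / 10 ** (3 * k) - 1


for name, k in [("kilo/kibi", 1), ("mega/mebi", 2), ("giga/gibi", 3), ("tera/tebi", 4)]:
    print(f"{name}: {prefix_gap(k):.2%}")

# kilo/kibi: 2.40%
# mega/mebi: 4.86%
# giga/gibi: 7.37%
# tera/tebi: 9.95%
```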
Consider that even in the computing field many things are measured in decimal units rather than binary; for example, Fast Ethernet is 100 megabits (100,000,000 bits) per second.
You're right about 'byte'; truly pedantic documents (like international standards) say 'octet'. On the whole, there's no good reason to keep quoting sizes in bytes: most computing devices do not have 8-bit registers or buses, and a single character does not necessarily fit in 8 bits (if it ever did; ASCII is 7-bit). It would make more sense just to use the bit as the measure of information and give disk sizes in terabits, and so on.
The answer is to *make it cost* to send a message. For as long as sending thousands of messages costs next to nothing, spammers will continue to do it.
There are two reasonable ways to make sending messages costly. One is to charge the sender a tiny postage fee (say one cent, or even 0.1 cents) for each message you agree to read. The other is to demand 'payment' in CPU cycles, by making the sender compute something before constructing a valid message.
Jail for spammers is one way to 'make it cost', but it would be tricky to implement, since every country in the world would need to adopt antispam laws and enforce them. I think you underestimate just how difficult this would be to arrange, and how long it would take. But requiring 'payment' before reading a message can be done end-to-end, without intervention by ISPs or governments.
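A minimal sketch of the CPU-cycles option, in the spirit of Hashcash: the sender searches for a nonce that makes a SHA-256 hash start with a given number of zero bits (about 2**bits attempts on average), while the recipient verifies it with a single hash. The stamp format `<recipient>:<nonce>` and the function names are assumptions for illustration; real Hashcash uses a different header format, and a practical scheme would also include a date and reject reused stamps.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Number of leading zero bits in a digest."""
    return len(digest) * 8 - int.from_bytes(digest, "big").bit_length()


def mint_stamp(recipient: str, bits: int) -> str:
    """Sender side: expensive, roughly 2**bits hash attempts on average."""
    for nonce in count():
        stamp = f"{recipient}:{nonce}"
        if leading_zero_bits(hashlib.sha256(stamp.encode()).digest()) >= bits:
            return stamp


def check_stamp(stamp: str, recipient: str, bits: int) -> bool:
    """Recipient side: cheap, a single hash."""
    if stamp.rsplit(":", 1)[0] != recipient:
        return False  # stamp was minted for someone else
    return leading_zero_bits(hashlib.sha256(stamp.encode()).digest()) >= bits


stamp = mint_stamp("alice@example.org", 16)
print(stamp, check_stamp(stamp, "alice@example.org", 16))
```

Raising `bits` by one doubles the sender's expected work while leaving the recipient's cost unchanged, which is what makes bulk sending expensive.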
If you really do prefer to see links at the end of the paragraph or end of the document, it's possible to get the web browser to display them in that way (for example, IE does this when printing a web page). The opposite transformation, picking up footnotes and inlining the links, is not nearly as easy. So write HTML that has the correct semantics for the element, with the content of the element being a short name or description of the linked document, and don't mangle the HTML code just for a small presentational difference.
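To illustrate how mechanical the easy direction is, here is a rough sketch using Python's standard `html.parser` that turns inline links into numbered footnotes. The class name `LinkFootnoter` is hypothetical, and the sketch ignores block layout, scripts and nested links.

```python
from html.parser import HTMLParser
from typing import List, Optional


class LinkFootnoter(HTMLParser):
    """Render HTML as plain text, replacing each <a href> with a [n] marker
    and collecting the URLs as footnotes."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.text: List[str] = []
        self.footnotes: List[str] = []
        self._pending_href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._pending_href = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag == "a" and self._pending_href:
            self.footnotes.append(self._pending_href)
            self.text.append(f" [{len(self.footnotes)}]")
            self._pending_href = None

    def handle_data(self, data):
        self.text.append(data)

    def render(self) -> str:
        notes = "\n".join(f"[{i}] {url}" for i, url in enumerate(self.footnotes, 1))
        return "".join(self.text) + ("\n\n" + notes if notes else "")


parser = LinkFootnoter()
parser.feed('See the <a href="https://example.org/licences">licence list</a> for details.')
parser.close()  # flush any buffered text
print(parser.render())
# See the licence list [1] for details.
#
# [1] https://example.org/licences
```

Going the other way, from footnote markers back to inline links, would mean guessing which words each footnote was meant to cover.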
FWIW, I think that the problem of maliciously constructed input is a good enough reason not to use an unmodified qsort, at least not in cases where you sort large inputs provided by an untrusted user. There are sort functions which have bounded worst-case performance, even though they may be slower (say, twice as slow) than qsort in most cases.
With regular expressions, the attack only works if you either write a complex regexp yourself (a true regular expression can be matched in time linear in the string length, I think, at least by an automaton-based matcher rather than a backtracking one) or allow the user to provide both the regexp and the string to be matched, so it is less likely to take you by surprise.
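The qsort concern above can be made concrete. A quicksort that always takes the first element as its pivot does n(n-1)/2 comparisons on already-sorted input, while heapsort stays within O(n log n) whatever the input. This sketch deliberately uses that naive pivot rule (real qsort implementations pick pivots more carefully, though median-of-three variants have their own adversarial inputs) and uses Python's `heapq` as a heapsort; both function names are hypothetical.

```python
import heapq
from typing import List, Tuple


def naive_quicksort(a: List[int]) -> int:
    """Sort `a` in place using the first element as pivot; return the comparison count."""
    comparisons = 0
    stack = [(0, len(a) - 1)]  # explicit stack: recursion depth would be O(n) on bad input
    while stack:
        lo, hi = stack.pop()
        if lo >= hi:
            continue
        pivot = a[lo]
        i = lo  # a[lo+1 .. i] holds elements smaller than the pivot
        for j in range(lo + 1, hi + 1):
            comparisons += 1
            if a[j] < pivot:
                i += 1
                a[i], a[j] = a[j], a[i]
        a[lo], a[i] = a[i], a[lo]  # move the pivot to its final position
        stack.append((lo, i - 1))
        stack.append((i + 1, hi))
    return comparisons


def heapsort(a: List[int]) -> Tuple[List[int], int]:
    """Sort via heapq; return (sorted list, comparison count)."""
    counter = [0]

    class Key:
        __slots__ = ("v",)

        def __init__(self, v: int) -> None:
            self.v = v

        def __lt__(self, other: "Key") -> bool:
            counter[0] += 1
            return self.v < other.v

    heap = [Key(v) for v in a]
    heapq.heapify(heap)
    result = [heapq.heappop(heap).v for _ in range(len(heap))]
    return result, counter[0]


n = 1000
data = list(range(n))  # already sorted: the worst case for a first-element pivot
quick_input = data.copy()
quick_cmp = naive_quicksort(quick_input)
heap_sorted, heap_cmp = heapsort(data)
print(quick_cmp, heap_cmp)  # quick_cmp is 499500, i.e. n*(n-1)/2; heap_cmp is far smaller
```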
(Though GTK was developed before GNOME, as you know, and was already there when the GNOME desktop started.)
How long before some student gets the webcam to display on his PDA, so he can read someone else's answers during a test?
D'oh!
Apparently ligers and tigons are almost always infertile. Are there examples of interspecies breeding producing fertile offspring?
Also, I think the definition of same species is 'can *and do* interbreed'. Man-made pairings don't count, I think; only interbreeding in the wild does.
On byte-addressing: fair point.