IIRC, compiling into the kernel or as a module isn't all that different w.r.t. licensing. They are (probably) allowed to do so by the GPL, since they didn't mix their proprietary code with GPL code; you did. And since you don't distribute the final result, neither the GPL nor anybody else cares.
Totally freezing the API would be quite impractical, since it would lead to lots of compatibility kludges (which make the code hard to maintain), and kernel developers strongly dislike such things. I just hope there'll be some way to make sure that when the API or semantics of an exported symbol changes, instructions are given on how to upgrade the code calling it. Maybe put the compatibility layer inside a comment.
Once I tried to install Linux on a computer. I had no boot CDs handy, nor could I make any boot diskettes besides those that came with the new computer, so I tried to use LOADLIN to boot Linux, after which I could do a manual installation. The problem is that LOADLIN has to be run from some kind of DOS (including Windows 98's DOS mode). A diskette that came with the box contained FreeDOS, but it didn't seem to support LOADLIN at all. In the end I found a Windows 98 boot diskette somewhere and finished the job.
I use do-while statements very rarely (mostly only do-while(0) hacks in macros that are going to be used in an if statement without braces). However, suggesting statement duplication like your example IMHO really makes code unmaintainable, and sometimes it is hard to factor a long statement out into a function that has a well-defined meaning.
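To show what the do-while(0) hack buys you, here is a minimal sketch. SWAP_INT and sort_pair are made-up names for illustration only. The wrapper turns a multi-statement macro into exactly one statement, so it can sit in an if/else without braces.

```c
/* Hypothetical two-step macro. The do { ... } while (0) wrapper makes the
 * expansion a single statement that still expects a trailing ';'. */
#define SWAP_INT(a, b)      \
    do {                    \
        int tmp_ = (a);     \
        (a) = (b);          \
        (b) = tmp_;         \
    } while (0)

/* Puts two ints in ascending order; returns 1 if they were swapped. */
int sort_pair(int *x, int *y)
{
    int swapped = 1;
    if (*x > *y)
        SWAP_INT(*x, *y);   /* one statement, so ...                   */
    else
        swapped = 0;        /* ... this else still pairs with the if   */
    return swapped;
}
```

With a plain { ... } block instead of the wrapper, the ';' after the macro call would end the if statement and the else would no longer compile.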
The reason I rarely use do-while() statements is that a while(1) {... break... } statement is more straightforward for my brain, and often enough there is stuff after the break statement. Actually, in this case I might as well use a "goto" statement, which can be just as clean, but then again gotos are easy to read but not straightforward to write.
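A minimal sketch of the while(1) {... break... } shape, with work on both sides of the exit test. The function name and the task (summing until the first negative value) are made up for illustration.

```c
/* Sums entries of v[0..n-1] up to (not including) the first negative one.
 * Stores the number of entries consumed in *used. */
int sum_until_negative(const int *v, int n, int *used)
{
    int sum = 0;
    int i = 0;
    while (1) {
        if (i >= n)
            break;          /* ran out of input */
        int x = v[i];       /* work before the exit test ... */
        if (x < 0)
            break;          /* ... exit in the middle of the body ... */
        sum += x;           /* ... and more work after it */
        i++;
    }
    *used = i;
    return sum;
}
```

Written as a do-while, the "fetch, test, process" sequence would need either a duplicated fetch or an extra flag.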
PHP as in your example is "weakly typed". Python and Ruby are strongly typed but dynamically typed. A "statically typed language" is one in which the type of everything is known prior to runtime, which is what you call "early-binding".
As for language design, I find Lisp, Scheme and OCaml miles better than Perl (too "dirty") or Ruby (the "yield" statement just feels ugly compared to real first-class functions --- can any Ruby lovers enlighten me?). Java, C# and Python are somewhere in between in terms of elegance (good overall, but with some rough edges).
However, when actually programming in those great functional languages, one often has a hard time deciding whether to use imperative or functional style. Programs in functional style are much prettier, but many data structures (arrays, hashtables) are inherently imperative, so in many places I just have to go the imperative way, which leads to either a mostly imperative program (might as well use Python or Java), or a mixture of functional and imperative stuff which is IMHO a bit slower and somewhat harder to maintain.
To this day the most complex thing I have ever written in Scheme/Lisp/OCaml is an Unlambda interpreter in OCaml, which is about 20KB of source code. I just can't make good use of the functional features in more general-purpose code.
I agree with your post in general, but I want to clear up a misunderstanding in it.
Moreover, IME the VC++ optimiser is quite smart about intrinsics, and in lengthy calculations will often arrange for the right values to be coming to the head of the stack in the correct order as cheaply as possible, even if it means planning ahead a few instructions. If you look at the assembly language output from VC++ for a numerical computation, it tends to have a series of instructions to stack what it needs, with the occasional calculation opcode thrown in between them, and then a whole series of neatly co-ordinated calculation opcodes at the end.
Such code is usually very inefficient, because a series of calculation opcodes without stack operations in between probably has to be executed serially (most forms of x87 calculation opcodes store their result in st(0), which is used by the next one, although other forms do exist). In contrast, stack operations (such as FXCH) are very cheap (especially since they are often executed by a different execution unit from the one doing the calculations), and proper use of them can often enable instruction-level parallelism, which may double or triple performance, since the latency of many instructions is two or three times the interval at which new ones can be issued.
No decent compiler (which is almost everything we currently use) should generate such serial code when optimizing for Pentium or better, if it is allowed to do otherwise by the standards. Probably there is something wrong in your source code. For example, IIRC the compiler is often not allowed to reorder calculations in certain ways according to the IEEE specs, even if the result is mathematically equivalent, so you may have to specify the order explicitly in the source code.
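A small sketch of why the compiler may not reorder floating-point sums on its own. The two functions below (made-up names) are mathematically identical, yet give different results for some inputs when compiled with strict floating-point semantics, i.e. without something like -ffast-math.

```c
/* The same three-term sum, grouped in two different ways. */
double sum_left(double a, double b, double c)
{
    return (a + b) + c;
}

double sum_right(double a, double b, double c)
{
    return a + (b + c);
}
```

With a = 1e30, b = -1e30 and c = 1.0, sum_left cancels the two large terms first and returns 1.0, while sum_right loses the 1.0 when it is added to -1e30 and returns 0.0. A compiler that must preserve the written result therefore has to keep the written order.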
As other posts have explained, instructions like "FSIN" and "FCOS" are not exactly standards compliant, so if they aren't used, most probably the compiler is just not allowed to use them --- it's up to you to tweak some switches to tell it that such instructions are allowed.
BTW, gcc is stricter about this by default, so you often hear people say "gcc-generated code is very slow" when the fault lies in their code or switches, which prevent much optimization from being done.
Just look at the assembly. "fsin" and "fcos" (and especially "fsincos") are usually faster than software-based versions, but the latter must be used if you need strict standards compliance. If you don't need that, you need "-ffast-math" or "-mno-ieee-fp" for gcc (whereas for MS's and Intel's compilers, you need to specify an option to get strict compliance).
In a DCT algorithm I wrote using SSE intrinsics (mainly _mm_add_ps and _mm_mul_ps), I tried really hard to optimize the code for icc8 (which I used by default during the optimization), but the resulting code runs at only 2.4Gflops on a 2.4GHz Pentium 4 (which is pretty low efficiency for 4-wide vector code). gcc 3.2 generates 10% faster code without much hand-tweaking.
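For anyone who hasn't used these intrinsics, here is a minimal sketch of what such code looks like. This is not the DCT itself, just a made-up element-wise multiply-add (mul_add_ps is a hypothetical name). It is x86 only, and on 32-bit targets gcc needs -msse to accept it.

```c
#include <xmmintrin.h>  /* SSE: __m128, _mm_loadu_ps, _mm_mul_ps, _mm_add_ps */

/* out[i] = a[i] * b[i] + c[i] for n floats.
 * Assumption to keep the sketch short: n is a multiple of 4. */
void mul_add_ps(float *out, const float *a, const float *b,
                const float *c, int n)
{
    for (int i = 0; i < n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);   /* load 4 floats, any alignment */
        __m128 vb = _mm_loadu_ps(b + i);
        __m128 vc = _mm_loadu_ps(c + i);
        /* one mulps and one addps, each working on 4 floats at once */
        __m128 r = _mm_add_ps(_mm_mul_ps(va, vb), vc);
        _mm_storeu_ps(out + i, r);
    }
}
```

The unaligned loads and stores keep the sketch safe for arbitrary arrays; tuned code would normally align its buffers to 16 bytes and use the aligned variants.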
The strange thing is that the resulting assembly code doesn't seem to be much different or particularly inefficient --- both gcc's and icc's code is a long stream of addps, mulps and movaps instructions, and since the evaluation order is made explicit in the C code, dependencies should not be much of a problem. The working set fits comfortably inside the L2 cache, but the L1 cache is expected to thrash a little. I can't see why this code is that inefficient.
Similar things happened when I was hand-optimizing an IIR filter for icc8. The speed was quite decent (about 7Gflops in the inner loop), but after I changed "a=b+c+d" to "a=d+b+c" (since d is calculated first, I thought this should at least not hurt), speed mysteriously halved. The assembly code doesn't look much different at a glance, either.
The last two cases look similar. I guess the P4 may have much degraded performance when the reorder buffer fills up or something. Anyway, this at least shows that even icc (as of now) does not give reliable performance. If you want the absolute highest performance, make sure you always keep an eye on the benchmark results.
Of course, icc has automatic vectorization while gcc doesn't, and this is the most important reason why icc often beats gcc 2:1 in some floating-point benchmarks. However, in my case the most time-consuming loops are invariably too complicated for icc to parallelize automatically (one for a custom DCT algorithm, one for a 4th-order IIR filter), so I still have to vectorize them by hand.
But the parent poster's 50:1 ratio does seem strange.
One day I rewrote a gtk/glade-based application in C# (just to try out mono). Source code size went from 20kB in C to 18kB in C#. The size of the unwritten C++ version would probably lie somewhere in between (I don't use many things that can be simplified by using STL, while in C# a lot of "free"/"delete"s can be omitted). The problem in your logic is that, although "gtk_tree_path_append(path, 1)" is significantly longer than "path->append(1)", such library calls are usually only a not-so-big part of the UI code, which is in turn a not-so-big part of the whole program. Considering that most of the time is spent on designing and debugging, rather than typing, it is hard to see how such simplification would help productivity much. A good library (such as Glib, STL or the standard libraries of Java and C#) or certain other language features (mostly just garbage collection) help much more.
I'm not an experienced software engineer, but I think it would be good practice to first give the HTML pages a "development" mode where no input validation is done at that stage. Even better, all input should be entered from text boxes, so that you can give your program arbitrary input without handwriting URLs much. Front-end checking (such as client-side javascript) can be added at the same time or after the core engine is debugged, but it is still better to preserve the "development" mode in the code (for testers and developers only, of course).
Reiserfs3 does metadata journaling only, which only makes sure that the filesystem itself doesn't become b0rked (so that you don't have to run "fsck" during reboot) when the system goes down suddenly. It doesn't protect the data in your files. It is quite possible that the system wrote the information "the 3rd block of /var/spool/mail/xxx is the 1134th block on the filesystem" onto the disk, but crashed before the 1134th block was actually written (the data blocks are not journaled; this is metadata journaling), so the 3rd block of that file becomes garbage.
Ext3 in its default mode also does metadata journaling only, but it always writes the data blocks first (at some performance hit), so such lossage won't occur.
In theory, you may lose data badly during a power failure on a non-journaling filesystem such as ext2, since the filesystem itself may be badly broken. However, this does not occur often in practice.
In short, reiser3 is probably not a data-eating monster under normal operating conditions, nor will the filesystem become corrupted in case of a power failure, but newly rewritten data can get lost (including the older versions) during a crash or power failure, so it is probably safer to use ext3 for now if you don't have a UPS.
Also, if your disk fails, all bets are off --- expect to lose some data, no matter how advanced your filesystem is (unless it is designed to operate on faulty hardware).
BTW, I dumped reiserfs on my disk (on my home machine) during a disk failure because it doesn't have a feature to mark blocks as "bad". Quite a few blocks on my disk mysteriously went bad, and for some reason this was not corrected by the hard drive.
Actually the C used in GTK is quite OO-ish. It looks similar to C++ code with Qt, except with longer function names (which is just some typing, doesn't matter much). Such C code looks almost as good as well-structured C++ code, while improperly structured C++ code (where relationships between classes are not very natural) can look absolutely horrible. There are some things that I really dislike about C++. There are just too many ways to do a certain thing. A simple structure can be manipulated by functions (C style), or you can encapsulate it in an object (you get some C++ benefits, but writing getters and setters is really not interesting). A method can be public, protected or private, where the boundary is not always easy to draw, especially between protected and private if I haven't imagined any use for inheriting from that class (which is true in most cases). The other problem is that it is hard to extend someone else's class if they happen to have left out a small bit of functionality, because of access control, while in C you do have some quick-and-dirty choices. Although you have the choice of whether or not to use certain functionality in C++ such as access control, templates and RTTI, the decision is hard to make. Also, it is messy to mix STL stuff and custom String and List classes.
Of course, all these problems can be addressed somewhat if you take enough time to architect the software carefully, but that takes too much time for hobbyist-sized (several thousand lines) projects. Improperly architected C programs are not that far from good C programs (as long as good coding practices are used), but C++ programs will be very messy if some parts are put together ad hoc.
For object-oriented programming, I prefer simpler languages such as Smalltalk, Java and C#, if they suit the job.
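To give an idea of what "OO-ish" C looks like, here is a minimal sketch of the general pattern. It is not GTK's actual object system, and Shape, Rect and the function names are made up. The base struct is the first member of the derived struct, and "methods" are plain functions prefixed with the type name.

```c
/* Hypothetical base "class" with one overridable method. */
typedef struct Shape Shape;
struct Shape {
    double (*area)(const Shape *self);
};

/* Hypothetical derived "class": the parent must be the first member,
 * so that a pointer to Rect can also be treated as a pointer to Shape. */
typedef struct {
    Shape  parent;
    double w, h;
} Rect;

static double rect_area(const Shape *self)
{
    const Rect *r = (const Rect *)self;  /* valid only for Rect objects */
    return r->w * r->h;
}

/* "Constructor": fills in the method pointer and the fields. */
void rect_init(Rect *r, double w, double h)
{
    r->parent.area = rect_area;
    r->w = w;
    r->h = h;
}

/* Works on any "subclass" of Shape through the function pointer. */
double shape_area(const Shape *s)
{
    return s->area(s);
}
```

Usage is then rect_init(&r, 2.0, 3.0) followed by shape_area(&r.parent), which reads much like r.area() in C++, only with longer names.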
Many GUI programs (in Linux or otherwise) are buggy. They may crash if you use them in an unexpected way (and since you are just randomly clicking around, it is hard to produce a bug report). Many of them also have annoyances like poor focus handling (many applications are not very usable with the keyboard only), inability to paste from one particular place to another (even though copy-and-paste works in general), unnecessarily destroying the primary selection (used for middle-click pasting, which is very useful with traditional X apps) without ME selecting anything, etc. There are just too many things to test, and it is cumbersome to test all of them manually before each release, while lacking a testsuite greatly lowers software quality (imagine how buggy gcc would be without a testsuite). Hopefully there will be some free tool that automates the process of "test case1: click file, click open, choose /home/xx/ss.xx, choose node33 in treeview, TAB", so that the GUI parts of GUI applications can finally be as well tested as traditional command-line applications.
For numeric C code, vectorization (automatic or manual) will often double or quadruple performance (if you use SSE), as some other post has said. Other factors, such as inter-procedural optimization, gcc's lavish use of the stack and imperfect SSE register allocation, matter very little.
In one of my programs, icc7 actually produced slower code than gcc (at -march=pentium4, maximum optimization) because the most time-consuming loop was not automatically vectorized for some reason. The code generated for this loop (by both gcc and icc) actually uses x87 floating-point instructions (SSE instructions are used in most other parts). gcc with -ffast-math generates reasonable code, while the icc-generated code has very long dependency chains, and is thus slower. The code in question is the sum of 9 products, so I think icc should be allowed to change the order of summation (it defaults to non-strict floating-point anyway), but it didn't do so automatically. Then I removed the dependency chain by hand by adding up partial results, and speed instantly doubled with icc (I didn't try gcc). Then I vectorized the code manually using SSE intrinsics --- another 4x speedup (of course this would help gcc too, but I didn't try that, either).
The moral of the story is that it is still unwise to trust the compiler too much to optimize your code. If most of the time is spent on very little code, some manual vectorization and formula rearrangement really pays off, whatever compiler is used.
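The "adding up partial results" rearrangement can be sketched roughly like this. This is a made-up reconstruction, not the original loop, and dot9_serial and dot9_partial are hypothetical names. In the first version every addition has to wait for the previous one. In the second, three independent chains can overlap in the pipeline.

```c
/* Sum of 9 products as one long dependency chain. */
float dot9_serial(const float *a, const float *b)
{
    float s = 0.0f;
    for (int i = 0; i < 9; i++)
        s += a[i] * b[i];       /* each add depends on the previous add */
    return s;
}

/* Same sum with three independent partial sums, combined at the end. */
float dot9_partial(const float *a, const float *b)
{
    float s0 = a[0] * b[0];
    float s1 = a[1] * b[1];
    float s2 = a[2] * b[2];
    for (int i = 3; i < 9; i += 3) {
        s0 += a[i]     * b[i];      /* chain 0 */
        s1 += a[i + 1] * b[i + 1];  /* chain 1, independent of chain 0 */
        s2 += a[i + 2] * b[i + 2];  /* chain 2, independent of both */
    }
    return (s0 + s1) + s2;
}
```

The two versions add the terms in a different order, so for general inputs the results may differ in the last bits. That is exactly why a compiler in strict floating-point mode won't do this rearrangement for you.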
RPMs are available now, with detailed instructions, on the project site. It is just four RPMs without too many dependencies if you don't want to run the Apache-based server (which is harder to install, but most people won't need that at first). It worked well for me.
In August I found a bug in GCC, reported it in detail, and it got fixed by some GCC geeks in a few days. Finding the bug was really not very easy, although it didn't require me to understand the GCC sources.
It came up when I was trying User-Mode Linux on 2.6.0-test3. The user-mode-linux process kept crashing whenever an IPv4 packet was received. I gdb'd around a little (I didn't know much about Linux's network stack, but it is not too hard to find the usual code paths), and found (after two hours) that the crash happened upon returning from ip_vs_in(). This function is usually not compiled in, especially on user-mode-linux systems, so maybe that's why other people hadn't found it. Every subroutine called from ip_vs_in() ran normally, so I traced instructions and found that the return address was wrong. It is probably stack corruption, I thought, so I put a watchpoint on the original return address on the stack upon entering ip_vs_in(), but it was never hit. Then it turned out that the stack pointer was not the same when exiting from the function! Looks like a GCC bug, and disassembly confirmed it (it was a large routine, but in this particular case only a small part is executed).
I don't know much about GCC, so I couldn't fix it, but making a testcase was relatively easy because the bug is reproducible. I just sent a bug report to Red Hat's bugzilla (because it was the gcc in Red Hat 9), containing the exact gcc version, the options used, the preprocessed file, and the corresponding incorrect assembly output (with some explanation of where the error is). As for simplification, I think gcc developers can do it better than me (after all, the bug is obvious even in the unsimplified test case), so I didn't do it. The bug got fixed in a matter of days.
This has been explained in some other post; I just summarized it in the subject. Note that this is just a convention, and certain 64-bit systems do adopt different conventions, such as 64-bit ints.
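Assuming the convention in question is the C data model (which integer types are 32 or 64 bits wide), a quick way to see what a given system uses is to query the widths directly. The helper names below are made up. On most 64-bit Unix-like systems int is 32 bits while long and pointers are 64 bits (LP64), and 64-bit Windows keeps long at 32 bits (LLP64).

```c
#include <limits.h>  /* CHAR_BIT */

/* Width in bits of each type on the system this is compiled for. */
int bits_int(void)
{
    return (int)(sizeof(int) * CHAR_BIT);
}

int bits_long(void)
{
    return (int)(sizeof(long) * CHAR_BIT);
}

int bits_pointer(void)
{
    return (int)(sizeof(void *) * CHAR_BIT);
}
```

The C standard only guarantees minimum sizes (at least 16 bits for int, at least 32 for long), so portable code should not assume any particular model.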
I don't like music much, and can pretty comfortably do without any music at all. The only thing that matters to me is whether I'll be able to buy a computer that is completely controlled by ME, not someone else. Apart from that, there is no reason for me to enter this discussion.
I tried this several months ago. Very, very few nodes could be contacted (though many were tried) even after several hours of trying, so I think they are probably blocked. Anyway, I think it is not really that hard to filter out Freenet traffic, because IIRC the header is unencrypted.
If crt1.c (or other parts of the GCC-specific runtime library) WERE under the GPL, it is possible that it would put every userland program compiled with GCC under the GPL (unless the code is too short to be copyrighted :). HOWEVER, most of these run-time support files are under looser licenses than the GPL (for example, GPL with a linking exception), so the former probably doesn't apply. If you have lawyers and want to be careful, just download the GCC source and read the license of every file that has some remote probability of making your software a derivative work of it.
If a new API is introduced, it is usually superior to the old one, often in ALL cases, so every maintained driver in the tree will probably get moved to the new API. It is hard to make sure the old API keeps working when few drivers (many of them closed-source) use it and other parts of the kernel undergo heavy changes, especially when locking is involved.
So, even if the API seems to be stable, many drivers WILL break if the kernel gets some serious change. A lot of the bug reports about many libraries (such as Gtk) are about some application that used to work with 1.2.5 breaking with 1.2.7 for some hard-to-debug reason, sometimes because the application developer used the library in an unintended way. If such things happen in the kernel, it will be even harder to debug, and many systems will probably get as unstable as Windows because many drivers that no longer work will break silently, rather than refusing to compile. Looked at this way, the time, effort and code bloat spent on maintaining API compatibility seem to have gone to waste.
It filters out this stuff quite reliably (and with very low false-positive rates) after training it with 20 or so legit mails. For safety, just mark them as spam rather than actually deleting them. It won't reduce bandwidth waste or prevent the ISP's mailbox from filling up, though.
I can't post my screenshot because the background image is copyrighted by some big company, with no strings attached. Too bad :(
Hate to reply to my own post, but I found that Ruby does have Proc objects, &xxx parameters, etc. The syntax is still strange though.
PHP as in your example is "weakly typed". Python and Ruby are strongly typed but dynamically typed. "statically typed languages" refers to a language in which the type of everything is known prior to runtime, which is what you call "early-binding".
However, when actually programming in those great functional languages, one often have a hard time deciding whether to use imperative or functional style. Programs in functional style is much prettier, but many data structures (arrays, hashtables) are inherently imperative, so in many places I just have to go the imperative way, which leads to either a mostly imperative program (might as well use Python or Java), or a mixture of functional and imperative stuff which is IMHO a bit slower and somewhat harder to maintain.
To this day the most complex thing I have ever written in Scheme/Lisp/OCaml is an Unlambda interpreter written in OCaml, which is about 20KB of source code. I just can't make more good use of the functional features in more general-purpose code.
I agree with your post in general, but I want to clear up some misunderstanding in your post.
Such code is usually very inefficient, because a series of calculation opcodes without stack operations in between probably has to be executed serially (most forms of x87 calculation opcodes store results in st(0) which is used by the next one, although other forms do exist). In contrast, stack operations (such as FXCH) are very cheap (especially since they are often executed by a different execution unit from the one doing the calculations), and proper use of them can often enable instruction-level parallelism, which may double or triple performance, since the latency of many instructions are two or three times higher than the throughput.
No decent compiler (which is almost everything we currently use) should generate such serial code when optimizing for Pentium or better, if they are allowed to do otherwise according to the standards. Probably there is something wrong within your source code. For example, IIRC the compiler is often not allowed to change the order of calculations in certain ways, even if they are mathematically correct, according to IEEE specs, so you may have to specify the order explicitly in the source code.
As other posts have explained, instructions like "FSIN" and "FCOS" are not exactly standards compliant, so if they aren't used, most probably the compiler are just not allowed to use them --- it's up to you to tweak some switches to tell it that such instructions are allowed.
BTW, gcc is stricter on this by default, so you can often hear people say "gcc-generated code is very slow" when the fault lies within their code / switches which prevents much optimization to be done.
Just look at the assembly. "fsin" and "fcos" (and especially "fsincos") are usually faster than software-based versions, but the latter must be used if you need strict standards compliance. If you don't need that, "-ffast-math" or "-mno-ieee-fp" is needed for gcc (but for MS and intel's compilers, you need to specify an option for strict compliance).
The strange thing is that the resulting assembly code doesn't seem to be much different or particularly inefficient --- both gcc's and icc's code are a long stream of addps, mulps and movaps instructions, and since the evaluation order is made explicit in the C code, dependency should not be much of a problem. The working set fits comfortably inside the L2 cache, but L1 cache is expected to thrash a little. I can't see why this code can be that inefficient.
Similar things happened when I was hand-optimizing an IIR filter for icc8. The speed is quite decent (about 7Gflops in the inner loop), but after I changed "a=b+c+d" to "a=d+b+c" (since d is calculated first, I think this should at least not hurt), speed mysteriously halved. The assembly code doesn't look much different at a glance, either.
The last two cases look similar. I guess the P4 may have much degraded performance when the reorder buffer fills up or something. Anyway, this at least shows that even icc (as of now) does not give a reliable performance. If you want the absolute highest performance, make sure you always keep an eye on the benchmark results.
Of course, icc has automatic vectorization while gcc doesn't, and this is the most important reason why icc often beats gcc 2:1 in some floating-point benchmarks. However, in my case the most time-consuming loops are invariably too complicated for icc to parallize automatically (one for a custom DCT algorithm, one for a 4th order IIR filter), so I still have to vectorize that by hand.
But the parent poster's 50:1 ratio does seem strange.
One day I rewrote a gtk/glade-based application in C# (just to try out mono). Source code size went from 20kB in C to 18kB in C#. The size of the unwritten C++ version will probably lie somewhere in between (I don't use many things that can be simplified by using STL, while in C# a lot of "free"/"delete"s can be omitted). The problem in your logic is that, although "gtk_tree_path_append(path, 1)" is significantly longer than "path->append(1)", such library calls is usually only a not-so-big part of the UI code, which is in turn a not-so-big part of the whole program. Considering that most of the time is spent on designing and debugging, rather than typing, it is hard to see such simplification will help productivity much. A good library (such as Glib/STL/standard library of java and c#) or certain other language features (mostly just garbage collection) helps much more.
I'm not an experienced software engineer, but I think it would be good practice to first give the HTML pages a "development" mode where no input validation is done at that stage. Even better, all input should be entered from text boxes, so that you can give your program arbitrary input without handwriting URLs much. Front-end checking (such as client-side javascript) can be added at the same time or after the core engine is debugged, but it is still better to preserve the "development" mode in the code (for testers and developers only, of course).
Ext3 in its default mode also does metadata journaling only, but it always writes the data blocks first (at some performance hit), so such lossage won't occur.
In theory, you may lose data badly during a power failure on a non-journaling filesystem such as ext2, since the filesystem itself may be badly broken. However, this does not occur often in practice.
In short, reiser3 is probably not the data-eating monster in normal operating conditions, nor will the filesystem become corrupted in case of a power failure, but newly rewritten data can get lost (including the older versions) during a crash or power failure, so it is probably safer to use ext3 for now if you don't have a UPS. Also, if your disk fails, all bets are off --- expect to lose some data, no matter how advanced your filesystem is (unless it is designed to operate on faulty hardware).
BTW, I dumped reiserfs on my disk (on my home machine) during a disk failure because it doesn't have the feature to mark blocks as "bad". Quite a few blocks on my disk mysterically went bad, and for some reason it was not corrected by the hard drive.
Of course, all these problems can be addressed somewhat if you take enough time to architect the software carefully, it takes too much time for hobbyist-sized (several thousand lines) projects. Improperly architected C programs are not that far from good C programs (as long as good coding practices are used), but C++ programs will be very messy if some parts are put together ad hoc.
For object-oriented programming, I prefer simpler languages such as Smalltalk, Java and C#, if they suit the job.
Many GUI programs (in linux or otherwise) are buggy. They may crash if you use them in an unexpected way (and since you are just randomly clicking around, it is hard to generate a bugreport). Many of them also have annoyances like poor focusing (many applications are not very usable with keyboard only), inability to paste from a certain place to another certain place (copy-and-paste works in general), unnecessarily destroying the primary selection (use for middle-click pasting which is very useful against traditional X apps) without ME selecting anything, etc. There are just too many things to test, and it is cumbersome to test all of them manually before each release, while lacking a testsuite greatly lowers software quality (imagine how buggy gcc will be without a testsuite). Hopefully there will be some free tool that automate the process of "test case1: click file, click open, choose /home/xx/ss.xx, choose node33 in treeview, TAB", so that the GUI parts of GUI applications can finally be as well tested as traditional command-line applications.
In one of my programs, icc7 actually produced slower code than gcc (at -march=pentium4, maximum optimization) because the most time-consuming loop was not automatically vectorized for some reason. The generated code for this loop (by both gcc and icc) are actually using x87 floating-point instructions (sse instructions are used in most other parts). gcc with -ffast-math generates reasonable code, while the icc-generated code have very long dependency chains, and is thus slower. The code in question is the sum of 9 products, so I think icc should be allowed to change the order of summation (anyway it defaults to non-strict floating-point), but it didn't do so automaticaly. Then I removed the dependency chain by hand by adding up partial results, and speed instantly doubled with icc (I didn't try gcc). Then I vectorized the code manually by using SSE intrinsics --- another 4x speedup (of course this would help gcc too, but I didn't try, either).
The moral of the story is that it is still unwise to trust the compiler too much to optimize your code. If most of the time is spent on very little code, some manual vectorization and formula rearrangement really pays off, whatever compiler is used.
RPMs are available now, with detailed instructions, on the project site. It is just four RPMs without too much dependencies if you don't want to run the Apache-based server (which is harder to install, but most people won't need that at first). It worked well for me.
It came up when I was trying User-Mode linux on 2.6.0-test3. The user-mode-linux process kept crashing whenever an IPv4 packet was received. I gdb'd around a little (I didn't know much about linux's network stack, but it is not too hard to find the usual code paths), and found (after two hours) that the crash happened upon returning from ip_vs_in(). This function is usually not compiled in, especially on user-mode-linux systems, so maybe that's why other people hadn't found it. Every subroutine called from ip_vs_in() ran normally, so I traced instructions and found that the return address was wrong. It is probably stack corruption, I thought, so I put a watchpoint on the original return address on the stack upon entering ip_vs_in(), but it was never hit. Then it turned out that the stack pointer was not the same on exit as on entry! It looked like a GCC bug, and disassembly confirmed it (it was a large routine, but in this particular case only a small part was executed).
I don't know much about GCC, so I couldn't fix it, but making a testcase was relatively easy because the bug is reproducible. I just sent a bug report to redhat's bugzilla (because it is the gcc in redhat 9), containing the exact gcc versions, the options used, the preprocessed file, and the corresponding incorrect assembly output (with some explanation of where the error is). As for simplification, I think gcc developers can do it better than I can (after all, the bug is obvious even in the unsimplified test case), so I didn't do it. The bug got fixed in a matter of days.
This has been explained in some other post; I just summarized it in the subject. Note that this is just a convention, and certain 64-bit systems do adopt different conventions, such as 64-bit ints.
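Assuming the convention referred to above is the C data model (how wide int, long and pointers are on a given system), a quick way to see which one your compiler follows is to look at the type sizes. The sketch below is only illustrative: the function names are made up, and the labels ILP32, LP64, LLP64 and ILP64 are the usual names for these models, not something taken from the post.

```c
#include <assert.h>
#include <stdio.h>

/* Name the data model this compiler uses, judging only by type sizes.
   Assumes 8-bit bytes; anything unrecognized is reported as "other". */
static const char *data_model(void)
{
    const size_t i = sizeof(int), l = sizeof(long), p = sizeof(void *);
    if (i == 4 && l == 4 && p == 4) return "ILP32"; /* 32-bit int, long, pointer */
    if (i == 4 && l == 8 && p == 8) return "LP64";  /* 32-bit int, 64-bit long and pointer */
    if (i == 4 && l == 4 && p == 8) return "LLP64"; /* only pointers (and long long) are 64-bit */
    if (i == 8 && l == 8 && p == 8) return "ILP64"; /* the "64-bit int" convention */
    return "other";
}

/* Print the raw sizes next to the guessed model name. */
static void print_type_sizes(FILE *out)
{
    assert(out != NULL);
    fprintf(out, "int=%zu long=%zu void*=%zu (%s)\n",
            sizeof(int), sizeof(long), sizeof(void *), data_model());
}
```

Calling print_type_sizes(stdout) from main() shows which convention a particular compiler and target use; code that assumes one of them (say, that a pointer fits in an int) will break on the others.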
I don't like music much, and can pretty comfortably do without any music at all. The only thing that matters to me is whether I'll be able to buy a computer that is completely controlled by ME, not someone else. Other than that, there is no reason for me to enter this discussion.
I tried this several months ago. Very, very few nodes could be contacted (though many were tried) even after several hours of attempts, so I think they are probably blocked. Anyway, I think it is not really that hard to filter out Freenet traffic, because IIRC the header is unencrypted.
If crt1.c (or other parts of the GCC-specific runtime library) WERE under the GPL, it might put every userland program compiled with GCC under the GPL (unless the code is too short to be copyrighted :). HOWEVER, most of these run-time support files are under looser licenses than the GPL (for example, GPL with a linking exception), so this probably doesn't apply. If you have lawyers and want to be careful, just download the GCC source and read the license of every file that has some remote chance of making your software a derivative work of it.
So, even if the API seems to be stable, many drivers WILL break if the kernel gets some serious change. A lot of the bug reports about many libraries (such as Gtk) are about some application that used to work with 1.2.5 breaking with 1.2.7 for some hard-to-debug reason, sometimes because the application developer used the library in an unintended way. If such things happen in the kernel, it will be even harder to debug, and many systems will probably get as unstable as Windows, because many drivers that no longer work will break silently rather than refusing to compile. Seen this way, the time, effort and code bloat spent on maintaining API compatibility seem to have gone to waste.
It filters out this stuff quite reliably (and with a very low false-positive rate) after training it with 20 or so legit mails. For safety, just mark them as spam rather than actually deleting them. It won't reduce bandwidth waste or prevent the ISP's mailbox from filling up, though.