"Identify the platforms that your application will have to run on and code to those platforms."
Nope, that's not what I meant.
I meant that you should put portability aside from the time your fingers hit the keys to the time that you have a working first version. Of course, if you're just gleefully hacking away without any thought of correctness and maintainability, then ignore this advice, as it won't help you. If you are attempting to focus on correctness and maintainability, however, you should not be thinking about portability or optimization. Those can be taken care of later, and the engineering it takes to make your application portable and optimized will benefit a well-structured, maintainable program more than it would have benefited your first version anyway.
To all of the posters who said that making your code portable shook out bugs: you're correct. I've certainly discovered my share of bugs in my own and other people's code by porting (or just by making the code portable, which doesn't mean you port it). Consider, however: how much more benefit do you get by isolating the portability work, doing it on a working system, and following up on everything that falls out of it, because at that stage it IS your focus?
This is not a matter of suggesting that portability isn't important. If anything, I'm suggesting that it's MORE important than most people give it credit for. You just don't want to be worrying about portability alongside the paramount priorities of working code and maintainability.
Except the 90 days was only based on cracking encryption to some extent. The rest of the argument centered on understanding the data once you had it. If I hand you a hard disk full of files, you have to figure out what software knows how to read them, extract the information, sift through potentially gigabytes of data to find what you need, correlate that against information held by other agencies around the world, etc.
It's a hard process, and while I don't support the 90 days argument (I always favor liberty over making law enforcement easier, sorry), the argument is much more sound than just "we need time to decrypt".
"It probably won't bite you in the butt when your IT department shows you the new production server you're expected to start running on next week that's not opcode-compatible with your development system."
Been there, done that. It's a major pain, but consider the trade-offs. You can prepare for that eventuality ahead of time, or you can narrow your focus to correctness and maintainability. Every bit of developer attention you focus on those two things buys you a little less pain when curveballs are thrown your way. Do it right, and it's not such a big deal when someone asks you to port to funky platform Z, because your code is solid and easily re-engineered.
Of course, you don't go long without worrying about portability. It should be one of the earliest and best reasons to re-engineer your code, but it should be a re-engineering, not an effort you undertake as you initially design and build.
Sure. The only universal rule is that every rule has exceptions. Of course, if you are targeting very specific, highly constrained environments, you program with them in mind. The point is that you don't do the first pass with 10 other platforms in mind, but you DO set aside time for that work in a second pass, once your code is correct and maintainable.
That's right, you heard me. Don't write portable code.
Use portable libraries and languages; refactor your working code to be portable; make high-level choices that support portability (e.g. don't lock yourself into proprietary solutions), but don't write portable code.
Why? Because premature portability, like premature optimization, is a red herring that steals your attention from the only two things that will ever matter: correctness and maintainability. Write correct code. Write verifiably correct code. Write maintainable code. Do these things and you are done. Then port it to another platform or ten and optimize the hell out of it. Don't do these things up front, as they buy you nothing on the first pass, and doing them later gives you the chance to reconsider the structure of your system, which you should do at least twice before your first release anyway.
That said, do not snub portability unduly. If you have the choice of trivially supporting or not supporting portability-enhancing features (e.g. in your choice of a configure/build system), there's no reason not to be portable. Just don't let it set priorities for your project from day one.
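To make the "refactor your working code to be portable" step concrete, here is a minimal Python sketch. Everything in it is hypothetical: the app name `myapp`, both function names, and the per-platform directory conventions are assumptions chosen for illustration. The first pass is simply correct on the one platform you develop on; the second pass isolates the platform decision in one function, so a later port touches one place.

```python
import os
import sys
from pathlib import Path

APP = "myapp"  # hypothetical application name

def config_dir_first_pass() -> Path:
    """First pass: correct and readable on the one Unix-like box we develop on."""
    return Path.home() / ".config" / APP

def config_dir(platform: str = sys.platform, env=None) -> Path:
    """Second pass: the platform decision lives here and nowhere else."""
    env = os.environ if env is None else env
    if platform.startswith("win"):
        base = Path(env.get("APPDATA", Path.home()))
    elif platform == "darwin":
        base = Path.home() / "Library" / "Application Support"
    else:
        # XDG convention, defaulting to the same path the first pass hard-coded.
        base = Path(env.get("XDG_CONFIG_HOME", Path.home() / ".config"))
    return base / APP

print(config_dir())
```

Passing `platform` and `env` in as parameters is a design choice: it lets you exercise every branch of the second pass from a single development machine.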
I notice that MediaWiki is NOT on this list. This corresponds to my experience. I had some older weblog software exploited, and had to mop up after it, but my MediaWiki installation was fine.
Of course, MediaWiki is the pet target of some zombie-based spamming attacks right now, but that's not MW's fault, and I can clean up after that ok for now. If it gets worse, I'll have to start using some kind of visual authentication scheme.
Well, to be fair, 87 isn't that much. I work for a small company that has a proprietary software program written by 100 people, so no, that's not all that big.
However, the reasons that the MS troll is wrong are:
1. You can audit the code base.
2. You can modify it as an interim fix if you need to.
3. You can contribute bug-fixes to the vendor, which makes a fix vastly more likely.
4. You can hire a developer who works on the upstream project to increase control over priorities.
5. Debugging system problems is vastly easier with source.
And there you have it. Those are the benefits to a corporate entity which cannot afford to have the bulk of their software unsupported (because, frankly, building a complete support organization rarely scales well; you want your support organization to have significant help and after a certain size, what Red Hat charges for support is fairly cheap).
Of course, Apple, Novell, Mandriva, etc. are all open source software vendors (to varying degrees) that offer the same advantages (to varying degrees).
All of the stego I've ever seen is subject to simple analysis. Mind you, this only tells you that "something's up", but it's trivial to do.
Just as an experiment, try using garden-variety stego on a JPEG image by inserting information into the low bits (obviously AFTER you throw away chroma bits and perform the JPEG transforms on each sub-block). The last step of making a JPEG is to Huffman-code the resulting data. You'll find that it now compresses worse than it used to. Why? Because the point of those two major transforms that JPEG goes through is to make the data more compressible. You're adding a new source of signal after that (in this case, something very noisy, since it's encrypted), and the odds of the result staying even equally compressible are astronomically small.
Stego is adding signal. Even when that signal is encrypted, and thus very line-noise-like, it's easy to detect mathematically that it's been added.
You could add it to noise to begin with, but then you find a guy with huge files of white noise on his disk, and you just assume that they're encrypted anyway.
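A toy Python version of the experiment described above. It is a sketch under loud assumptions: the "cover" is a synthetic byte stream standing in for quantized JPEG coefficients (mostly zeros, a few small values), random bytes stand in for an encrypted payload, and zlib's DEFLATE (which includes a Huffman stage) stands in for JPEG's own entropy coder. The helper names are hypothetical.

```python
import random
import zlib

def make_cover(n: int, seed: int = 1) -> bytes:
    """Toy stand-in for quantized JPEG coefficients: mostly zeros, a few small values.

    Real JPEG coefficients are signed and zigzag-ordered; this only mimics the
    skewed, low-entropy distribution that makes them compress well.
    """
    rng = random.Random(seed)
    return bytes(0 if rng.random() < 0.85 else rng.randint(2, 6) for _ in range(n))

def embed_lsb(cover: bytes, payload: bytes) -> bytes:
    """Naive LSB stego: overwrite the low bit of each cover byte with one payload bit.

    Payload bits are taken low bit first; embedding stops when either input runs out.
    """
    bits = [(byte >> i) & 1 for byte in payload for i in range(8)]
    stego = bytearray(cover)
    for idx, bit in enumerate(bits[: len(stego)]):
        stego[idx] = (stego[idx] & 0xFE) | bit
    return bytes(stego)

cover = make_cover(8192)
# Well-encrypted data looks like random bytes, so random bytes play the payload here.
payload_rng = random.Random(2)
payload = bytes(payload_rng.getrandbits(8) for _ in range(len(cover) // 8))
stego = embed_lsb(cover, payload)

cover_size = len(zlib.compress(cover, 9))
stego_size = len(zlib.compress(stego, 9))
print(f"cover: {cover_size} bytes compressed, stego: {stego_size} bytes compressed")
```

The gap shows up because every low bit of the stego stream is now an independent coin flip, roughly one extra bit of entropy per byte that no entropy coder can squeeze out. That is the "easy to detect mathematically" part; recovering the payload is a separate and much harder problem.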
"why the hell to people get some riled up by the obsoleting GNOME and KDE statement, have people completely lost their sense of humour?"
If Gnome were to release 3.0 and declare that, "GORM and KDE are now obsolete," I don't think the reaction from the GORM and KDE camps would be any different (permute the parties to taste).
It was a cheap marketing stunt, divisive, arrogant, and exactly what the proprietary software world expects of Open Source.
Well, someone was working on it in 2001, but thankfully he never finished. As others have pointed out, it's a horrible idea. Of course, Perl's current compiler/interpreter/runtime would need massive work—possibly a full re-implementation—for such a project, but none of that has anything to do with the language.
Let's be clear, here. When someone says, "you can't write foo in language x," they are full of shit, and they can take the matter up with Turing.
If they say, "language x is more suited to problems of the same class as foo than many/most/all other choices," then they might have a point, depending on the specifics.
The first claim is what was made by the original poster, who is... in case I lost you on this point... full of shit.
There are a lot of estimates out there, but generally they hover around 10% of Wikipedia content being encyclopedic enough, complete enough, and well written enough for such an effort. That may sound like a low number, but keep in mind that there are far more "stubs" for things like backwater towns in the U.S. than there are for interesting topics like particle physics.
So, a printed version would have something like 100,000 or fewer articles, not the full content of the site.
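The estimate is a single multiplication. The one-million-article base below is an assumption inferred from the post's own two figures, not a number the post states:

```latex
N_{\text{print}} \approx 0.10 \times N_{\text{total}}
\approx 0.10 \times 1{,}000{,}000 = 100{,}000
```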
No matter what, it would be a monumental effort. Simply proof-reading it would be every bit as large an effort as proof-reading any other encyclopedia. It's a hell of a good head-start on producing the content, though.
This already exists. Many organizations fork Wikipedia content for various reasons. Some just put up raw Wikipedia database content. Some prune for various flavors of "correctness". Some pull out everything but some specific areas that interest them.
You are, of course, free to create your own "Wikipedia stable".
However, the value proposition of Wikipedia is this: with thousands of people editing it constantly, you get more raw information on which to base your research.
Having said that, I've rarely discovered a page on Wikipedia which wasn't at least a very decent start for investigating the topic at hand, and often I find it to be the most valuable single reference online.
"Seems that they would behave more like original Suse did"
That's the context you chopped off. The GP realized that YaST was NOW open source, and was referring to the original release (keep in mind that SuSE kept it proprietary for years).
"Free speech allows the use of the seal -- any seal -- in the production of a satire."
Find me precedent. It's a federal crime to use the seal without explicit authorization or to sell an existing copy of the seal, and there are a great many exceptions to the free speech clause when it comes to the use of Presidential authority and tools of office.
To quote Wikipedia:
Display of any likeness of the US Presidential Seal is restricted by US Federal law under 18 USC 713; however, use in encyclopedias (including Wikipedia) "incident to a description or history of seals, coats of arms, heraldry, or the Presidency or Vice Presidency" is allowed under Executive Order 11649. - http://en.wikipedia.org/wiki/Image:USPresidentialSeal.jpg
"The US government is publicly owned by its citizens."
I recall "of", "for" and "by". I do not recall any reference to "a government belonging to the people."
It's a fine point, but an important one, and the very reason that you're not allowed to take your share of the country with you if you decide to leave, but you are allowed to retain your property.
"Therefore, we own the presidential seal."
So very, very no.
If that were true, then every novelty store in the world would sell U.S. Presidential Seals, and it would be a worthless tool. Seals of state have been strongly protected in every government that I'm aware of, stretching back to the dawn of recorded history.
Because you can't authorize diplomatic action with a photo of your CEO? However, that's exactly what the seal of the President of the United States can be used for (and on a lesser scale, you can imagine that the signature of your CEO is used in the same way).
Honestly, I'm not quite sure why this is such a contentious topic. It's not as if they said that the Onion couldn't parody the seal, or applied a new rule that someone just made up. They're applying the same old rule that everyone has been living under for quite some time, and all the Onion has to do is modify the image so that it's clearly not the real seal.
"Let's say that The Onion put up a story which featured your company's CEO's signature."
"That's a trademark issue"
No, I'm sorry, it's not. The damage that that would cause has nothing to do with trademarks. You simply can't go around splashing people's signatures in public places any more than you can publish their Social Security numbers.
The seal is the signature of the President (whoever that might be). It is arguably one of the most powerful seals in the world; it can literally move armies; any plane with that seal is given special dispensation at most major airports in the world regardless of who is on it; and it is a federal crime to sell goods emblazoned with it. No, that's not trademark law.
It's not a matter of confusion, but of the nature of the seal. This is not a trademark.
This might be hard for most people to understand these days (since we don't use seals the way we used to), but let me use an analogy. Let's say that The Onion put up a story which featured your company's CEO's signature. I'm sure that within a short span of minutes, they'd get some pretty irate calls from your executive management team. Same exact deal here. The President's signature is actually not terribly potent, as he is only the temporary holder of the office. What's important is the seal which represents the office, regardless of who holds it. It's more than a flag or a signature or a logo. It represents the authorization of the President of the United States. This is why you cannot sell any item that contains the seal (for example, someone was sharing cigars with the seal at the office the other day, since he didn't smoke and couldn't do anything else with them).
I'm no banner-waver for this administration, but in this case, I would hope that any executive administration would have come down swiftly on such use of the seal.
It's trivial for The Onion to make a parody of the seal, and they know better. This smells like a grab for headlines to me.
"You don't read one sentence out of the middle then skip to a random one three lines down, then read every fourth word, then go to the top and read only the capitalized words."
I see you're not dyslexic... yeah, I really do. That's what you're not getting. I read in random bursts of non-linear text. If I didn't, I'd read a lot faster, but I'm not sure my comprehension speed would be as good. As it stands, I can "read" a document several times faster than most people that I know because I don't actually read it, I just extract the information I need from it.
"There's no mystical transcendent mode of "uber-literacy" that allows one to absorb information better than the linear, serial way around which our human languages are designed, and for which we have trained ourselves to process since birth."
Excuse me, but this is simply not my experience. Let's just take Wikipedia as an example. I visit the Hypertext article looking for information on what "linking" is all about. I find a link early on to Hyperlink. This gets me to a page that you seem to want to peruse linearly, but I rarely do this. Instead, I glance at the first sentence, then skip down to the TOC, reviewing it for keywords that help me get my bearings. Then I skip to a section that seems to discuss the specifics I'm interested in.
I then tend to cast out through the category link to determine what other topics are related.
In many cases, I'm also scanning pictures first, as they can often help to explain what I want to know faster than the text.
Linear? Good lord, if I had to learn from the Web linearly, I'd still be trying to figure out what a LOL is.
I use a MySQL database to store 2+ GB (~4 million records) of new data per day, on which I run various reports. I had some significant problems early on using a very old copy of RH7.3 (which I think had a flaky driver for the RAID array I'm using, so it's not very fair of me to blame MySQL for that), but since switching to a modern release of Linux, it has been smooth sailing. I use MERGE tables to split the data into 1-week chunks and then query them as one. It works amazingly well.
I'm really drooling over 5.0, but don't want to upgrade until it's out of beta (MySQL's betas are annoyingly long, but given how stable their releases have been, I guess I can't complain).
The only complaint I have with MySQL at this point that isn't addressed by 5.0 (that I know of) concerns the limitations of the optimizer. If they improved their optimizer and took slightly better advantage of RAM than they do now, I'd never use anything else.
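For anyone who hasn't used MERGE tables, here is a hedged Python sketch of the weekly-chunk scheme described above. It only builds SQL strings. The `log` prefix, the column list, and the ISO-week naming scheme are all hypothetical. It assumes a MySQL version that accepts `ENGINE=` (older releases used `TYPE=`), and the MERGE definition must match the underlying MyISAM tables' columns and indexes exactly.

```python
from datetime import date, timedelta

# Hypothetical column/index list; it must be identical to each weekly MyISAM table's.
COLUMNS = "id INT NOT NULL, ts DATETIME NOT NULL, payload TEXT, KEY (ts)"

def weekly_table(day: date, prefix: str = "log") -> str:
    """Name of the weekly table holding rows for `day`, keyed by ISO year and week."""
    iso_year, iso_week, _ = day.isocalendar()
    return f"{prefix}_{iso_year}w{iso_week:02d}"

def merge_ddl(first_day: date, weeks: int, prefix: str = "log") -> str:
    """CREATE TABLE statement for a MERGE table spanning `weeks` consecutive weeks."""
    tables = [weekly_table(first_day + timedelta(weeks=i), prefix) for i in range(weeks)]
    return (
        f"CREATE TABLE {prefix}_all ({COLUMNS})\n"
        f"  ENGINE=MERGE UNION=({', '.join(tables)}) INSERT_METHOD=LAST;"
    )

print(merge_ddl(date(2005, 2, 14), 4))
```

Rolling the window forward is then a matter of creating next week's MyISAM table and issuing `ALTER TABLE log_all UNION=(...)` with the new list; queries against `log_all` see all the weeks as one table.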
"In summary, auto-learn re-evaluates the message using only the static rules - not the bayes rules. Then, if the static rules give an extreme score that differs from the bayes score, and a couple of extra ad hoc conditions hold (number of "hits" exceeds some threshold) the bayes filter is trained."
Hrm... well, no.
First off, "number of hits" is not an "extra ad hoc condition". Number of "hits" is exactly "score". There's no difference; they're just two pieces of terminology for the same thing. "Level" is another thing, but I won't go into that, as it's only there for the benefit of programs like procmail and is not used internally.
Now, on to score with and without Bayes. I understand your initial concern, but I ask you to revisit it. There has been substantial research into Bayes auto-learning under various systems, and what is shown time and time again is that a set of well-balanced static rules (such as a set of tropisms or, in the case of SA, the static rule base) is far superior to any feedback loop. This is why Bayes is discounted when computing auto-learning.
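A deliberately simplified Python model of the point above: the score used to decide whether to auto-learn is computed with the Bayes rules left out, so Bayes can't train itself on its own opinion. This is not SpamAssassin's code. The threshold values are assumptions modelled on SA's `bayes_auto_learn_threshold_nonspam` and `bayes_auto_learn_threshold_spam` settings (check your own configuration), and every rule name other than `BAYES_99` is made up.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Assumed thresholds, modelled on SA's auto-learn settings; not read from any real config.
HAM_THRESHOLD = 0.1
SPAM_THRESHOLD = 12.0

@dataclass
class RuleHit:
    name: str
    score: float
    is_bayes: bool = False

def autolearn_verdict(hits: Iterable[RuleHit]) -> Optional[str]:
    """Return 'ham', 'spam', or None (don't train), ignoring Bayes rules entirely."""
    static_score = sum(h.score for h in hits if not h.is_bayes)
    if static_score < HAM_THRESHOLD:
        return "ham"
    if static_score >= SPAM_THRESHOLD:
        return "spam"
    return None  # ambiguous middle ground: leave the Bayes database alone

# Bayes is confident this is spam, but the static rules alone say ham,
# so the message is learned as ham: Bayes cannot reinforce its own verdict.
print(autolearn_verdict([RuleHit("BAYES_99", 3.5, is_bayes=True)]))
```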
What you're doing is looking at outliers and saying, "see, this 'spam' was trained on as 'ham', and that means SA is broken." In fact, such errors will exist in both directions, but as long as the vast majority of spam trains as spam and the vast majority of ham trains as ham, the Bayes tokens will be correctly scored.
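To see why a minority of mislearned messages doesn't wreck the token scores, here is a toy calculation using a simplified Graham-style per-token estimate (a well-known approximation, not SpamAssassin's actual Bayes combining). The message counts are invented for illustration, and the per-class totals are held fixed at 1,000 for simplicity.

```python
def token_spam_prob(spam_hits: int, ham_hits: int, n_spam: int, n_ham: int) -> float:
    """Simplified Graham-style estimate of P(spam | token) from per-class frequencies."""
    spam_rate = spam_hits / n_spam
    ham_rate = ham_hits / n_ham
    if spam_rate + ham_rate == 0:
        return 0.5  # token never seen: no evidence either way
    return spam_rate / (spam_rate + ham_rate)

# Hypothetical corpus: 1,000 messages learned in each class.
# Clean training: a spammy token appears in 950 messages learned as spam, 10 learned as ham.
clean = token_spam_prob(950, 10, 1000, 1000)
# Noisy training: 50 of those spams were mislearned as ham.
noisy = token_spam_prob(900, 60, 1000, 1000)
print(f"clean: {clean:.3f}, noisy: {noisy:.3f}")
```

The noisy estimate drops a little but stays far above 0.5, which is the "vast majority trains correctly" argument in numbers.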
All that said, you seem uncomfortable with static rules of any kind, so if you don't buy into what I've said above, then I suggest that you stop using SA. Static rules are a giant advantage, but if you are going to defeat most of their value, then you might as well not suffer their overhead.
It seems as if other posters are correct. This is just a ceramic with the usual sorts of ceramic properties, just much harder than most. It's not a useful building material.
Note that, "don't write portable code," and, "don't consider portability," are different statements, and I only made one of them.
I would also say, "don't write optimized code," not, "don't consider optimization."
Different.
Slashdot poster: Greetings, gentlemen. You already know my Execuscripts. Script Alpha, programmed to like things it has seen before.
Alphabot: Information wants to be free!
Slashdot poster: Script Beta, programmed to roll dice to determine preferences.
[Betascript rolls two 20-sided dice.]
Betascript: Imagine a beowulf cluster of petrified torrents!
Slashdot poster: And Script Gamma, programmed to underestimate anyone with a non-technical job.
Gammascript: It's cool, but is it enough to get the network executives to stop suing children for the FBI long enough to give me free stuff?
"Secure steganography is truly undetectable."
EXTRACTING stego is hard, detecting it is not.
"why the hell to people get some riled up by the obsoleting GNOME and KDE statement, have people completely lost their sense of humour?"
If Gnome were to release 3.0 and declare that, "GORM and KDE are now obsolete," I don't think the reaction from the GORM and KDE camps would be any different (permute the parties to taste).
It was a cheap marketting stunt, divisive, arrogant, and exactly what the proprietary software world expects of Open Source.
Well, someone was working on it in 2001, but thankfully he never finished. As others have pointed out, it's a horrible idea. Of course, Perl's current compiler/interpreter/runtime would need massive work—possibly a full re-implementation—for such a project, but none of that has anything to do with the language.
Let's be clear, here. When someone says, "you can't write foo in language x," they are full of shit, and they can take the matter up with Turing.
If they say, "language x is more suited to problems of the same class as foo than many/most/all other choices," then they might have a point, depending on the specifics.
The first claim is what was made by the original poster, who is... in case you lost me on this point... full of shit.
There are a lot of estimates out there, but generally they hover around 10% of Wikipedia content being both encyclopedic enough, complete enough, and well written enough for such an effort. That may sound like a low number, but keep in mind that there are far more "stubs" for things like backwater towns in the U.S. than there are for interesting topics like particle physics.
So, a printed version would have something like 100,000 or fewer articles, not the full content of the site.
No matter what, it would be a monumental effort. Simply proof-reading it would be every bit as large an effort as proof-reading any other encyclopedia. It's a hell of a good head-start on producing the content, though.
This already exists. Many organizations fork Wikipedia content for various reasons. Some just put up raw Wikipedia database content. Some prune for various flavors of "correctness". Some pull out everything but some specific areas that interest them.
You are, of course, free to create your own "Wikipedia stable".
However, the value proposition of Wikipedia is this: with thousands of people editing it constantly, you get more raw information on which to base your research.
Having said that, I've rarely discovered a page on Wikipedia which wasn't at least a very decent start for investigating the topic at hand, and often I find it to be the most valuable single reference online.
"Seems that they would behave more like original Suse did"
That's the context you chopped off. The GP realized that Yast was NOW open source, and was refering to the original release (keep in mind that SuSE kept it proprietary for years).
Find me precident. It's a federal crime to use the seal without explicit authorization or to sell an existing copy of the seal, and there are a great many exceptions to the free speach clause when it comes to the use of Presidential authority and tools of office.
To quote Wikipedia:
"The US government is publicly owned by its citizens."
I recall "of", "for" and "by". I do not recall any reference to "a government belonging to the people."
It's a fine point, but an important one, and the very reason that you're not allowed to take your share of the country with you if you decide to leave, but you are allowed to retain your property.
"Therefore, we own the presidential seal."
So very, very no.
If that were true, then every novelty store in the world would sell U.S. Presidential Seals, and it would be a worthless tool. Seals of state have been strongly protected in every government that I'm aware of, stretching back to the dawn of recorded history.
Because you can't authorize diplomatic action with a photo of your CEO? However, that's exactly what the seal of the President of the United States can be used for (and on a lesser scale, you can imagine that the signature of your CEO is used in the same way).
Honestly, I'm not quite sure why this is such a contentious topic. It's not as if they said that the Onion couldn't parody the seal, or applied a new rule that someone just made up. They're applying the same old rule that everyone has been living under for quite some time, and all the Onion has to do is modify the image so that it's clearly not the real seal.
The seal is the signature of the President (whoever that might be). It is arguably one of the most powerful seals in the world; it can literally move armies; any plane with that seal is given special dispensation at most major airports in the world regardless of who is one it; and it is a federal crime to sell goods emblazoned with it. No, that's not trademark law.
It's not a matter of confusion, but of the nature of the seal. This is not a trademark.
This might be hard for most people to understand these days (since we don't use seals the way we used to), but let me use an analogy. Let's say that The Oninon put up a story which featured your company's CEO's signature. I'm sure that within a short span of minutes, they'd get some pretty irate calls from your executive management team. Same exact deal here. The President's signature is actually not terribly potent, as he is only the temporary holder of the office. What's important is the seal which represents the office, regardless of who holds it. It's more than a flag or a signature or a logo. It's represents the authorization of the President of the United States. This is why you cannot sell any item that contains the seal (for example, someone was sharing cigars with the seal at the office the other day, since he didn't smoke and couldn't do anything else with them).
I'm no banner-waver for this administration, but in this case, I would hope that any executive administration would have come down swiftly on such use of the seal.
It's trivial for The Onion to make a parody of the seal, and they know better. This smells like a grab for headlines to me.
"You don't read one sentence out of the middle then skip to a random one three lines down, then read every fourth word, then go to the top and read only the capitalized words."
I see you're not dyslexic... yeah, I really do. That's what you're not getting. I read in random bursts of non-linear text. If I didn't, I'd read a lot faster, but I'm not sure my comprehension speed would be as good. As it stands, I can "read" a document several times faster than most people that I know because I don't actually read it, I just extract the information I need from it.
"There's no mystical transcendent mode of "uber-literacy" that allows one to absorb information better than the linear, serial way around which our human languages are designed, and for which we have trained ourselves to process since birth."
Excuse me, but this is simply not my experience. Let's just take Wikipedia as an example. I visit hypertext looking for information on what "linking" is all about. I find a link early on to hyperlink. This gets me to a page that you seem to want to peruse linearly, but I rarely do this. Instead, I glance at the first sentence, and then skip down to the TOC, reviewing it for keywords that help me get my bearings. Then I skip to a section that seems to discuss the specifics that I'm interested in.
I then tend to cast out through the category link to determine what other topics are related.
In many cases, I'm also scanning pictures first, as they can often help to explain what I want to know faster than the text.
Linear? Good lord, if I had to learn from the Web linearly, I'd still be trying to figure out what a LOL is.
I use a MySQL database to store 2+GB (~4 million records) of new data per day, on which I run various reports. I had some significant problems early on using a very old copy of RH7.3 (which I think had a flaky driver for the RAID array I'm using, so it's not very fair of me to blame MySQL for that), but having switched to a modern release of Linux, it has been smooth sailing. I use MERGE tables to split up data into 1-week chunks, and then query them as one. It works amazingly well.
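The weekly-chunk scheme can be sketched roughly as follows. This is a minimal illustration, not the actual setup described above: the `events` base name, the ISO-week naming scheme and the column list are hypothetical, and it assumes MyISAM base tables with identical structure (which MERGE requires) and a MySQL version that accepts the `ENGINE=` syntax. The helpers only build SQL strings; executing them is left to whatever client you use.

```python
from datetime import date, timedelta


def weekly_table_name(base: str, day: date) -> str:
    """Name of the (hypothetical) per-week MyISAM table holding `day`'s rows."""
    iso_year, iso_week, _ = day.isocalendar()
    return f"{base}_{iso_year}w{iso_week:02d}"


def recent_week_tables(base: str, today: date, n_weeks: int) -> list[str]:
    """Oldest-first list of the per-week tables covering the last n_weeks."""
    return [weekly_table_name(base, today - timedelta(weeks=i))
            for i in reversed(range(n_weeks))]


def merge_table_ddl(base: str, columns: str, tables: list[str]) -> str:
    """CREATE statement for a MERGE table that lets you query all weeks as one.

    INSERT_METHOD=LAST sends inserts into the last (newest) table in UNION.
    """
    union = ", ".join(tables)
    return (f"CREATE TABLE {base}_all ({columns}) "
            f"ENGINE=MERGE UNION=({union}) INSERT_METHOD=LAST;")


if __name__ == "__main__":
    tables = recent_week_tables("events", date(2005, 2, 14), 2)
    # tables == ["events_2005w06", "events_2005w07"]
    print(merge_table_ddl("events", "ts DATETIME, msg VARCHAR(255)", tables))
```

One way to roll over to a new week is to create the new weekly table and redefine the MERGE table's UNION list to include it; old weeks can then be dropped from the union without touching the data in the remaining tables.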
I'm really drooling over 5.0, but don't want to upgrade until it's out of beta (MySQL's betas are annoyingly long, but given how stable their releases have been, I guess I can't complain).
The only complaint that I have with MySQL at this point, that's not addressed by 5.0 (that I know of) is the limitations on the optimizer. If they improved their optimizer and took slightly better advantage of RAM than they do now, I'd never use anything else.
"In summary, auto-learn re-evaluates the message using only the static rules - not the bayes rules. Then, if the static rules give an extreme score that differs from the bayes score, and a couple of extra ad hoc conditions hold (number of "hits" exceeds some threshold) the bayes filter is trained."
Hrm... well, no.
First off, "number of hits" is not an "extra ad hoc condition". Number of "hits" is exactly "score". There's no difference, just two pieces of terminology for the same thing. "Level" is another thing, but I won't go into that, as it's only there for the benefit of programs like procmail, and is not used internally.
Now, on to score with and without Bayes. I understand your initial concern, but I ask you to re-visit it. There has been substantial research into Bayes auto-learning under various systems, and what is shown time and time again is that a set of well-balanced static rules (such as a set of tropisms or, in the case of SA, the static rule base) is far superior to any feedback loop. This is why Bayes is discounted when computing auto-learning.
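The decision being described can be sketched roughly like this. It is a simplified model, not SpamAssassin's code: the rule names are hypothetical, the two thresholds are assumptions modeled on SA's `bayes_auto_learn_threshold_nonspam` and `bayes_auto_learn_threshold_spam` settings, and SA applies further conditions that are not modeled here. What it shows is the point above: Bayes hits are left out of the score used to pick a training label.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Assumed thresholds, modeled on SA's bayes_auto_learn_threshold_* settings.
HAM_THRESHOLD = 0.1
SPAM_THRESHOLD = 12.0


@dataclass
class RuleHit:
    name: str              # hypothetical rule name
    score: float           # points this rule contributed
    is_bayes: bool = False


def autolearn_label(hits: Iterable[RuleHit]) -> Optional[str]:
    """Return "ham", "spam", or None (don't train), using static rules only."""
    # Bayes hits are discounted so the classifier never trains on its own output.
    static_score = sum(h.score for h in hits if not h.is_bayes)
    if static_score <= HAM_THRESHOLD:
        return "ham"
    if static_score >= SPAM_THRESHOLD:
        return "spam"
    return None  # not extreme enough either way: skip auto-learning


if __name__ == "__main__":
    hits = [RuleHit("RULE_A", 8.0), RuleHit("RULE_B", 5.0),
            RuleHit("BAYES_LOW", -2.0, is_bayes=True)]
    # Static rules total 13.0, so this trains as spam; the Bayes -2.0 is ignored.
    print(autolearn_label(hits))
```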
What you're doing is looking at outliers and saying, "see, this 'spam' was trained on as 'ham', and that means SA is broken." In fact, such errors will exist in both directions, but as long as the vast majority of spam trains as spam and the vast majority of ham trains as ham, the Bayes tokens will be correctly scored.
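A toy calculation makes the "errors in both directions" point concrete. This is a deliberately simplified token estimate (a per-token frequency ratio, assuming equal numbers of spam and ham messages are trained), not SpamAssassin's actual Bayes combining; the counts and the 5% error rate are made-up illustrative numbers.

```python
def learned_counts(true_spam: float, true_ham: float, error_rate: float):
    """Token counts after training when error_rate of each class is mislabeled.

    true_spam / true_ham: how many genuinely spam / ham training messages
    contain the token. Mislabeled messages land in the opposite column.
    """
    spam = true_spam * (1 - error_rate) + true_ham * error_rate
    ham = true_ham * (1 - error_rate) + true_spam * error_rate
    return spam, ham


def token_spam_probability(spam: float, ham: float) -> float:
    """Frequency-ratio estimate, assuming equal numbers of spam and ham trained."""
    return spam / (spam + ham)


if __name__ == "__main__":
    # Hypothetical token seen in 400 spam and 10 ham training messages.
    clean = token_spam_probability(*learned_counts(400, 10, 0.0))
    noisy = token_spam_probability(*learned_counts(400, 10, 0.05))
    print(f"clean: {clean:.3f}  with 5% mistraining: {noisy:.3f}")
```

With these numbers the token drops only from about 0.976 to about 0.928: the mistrained messages nudge it, but it still scores as strongly spammy because the vast majority of training went the right way.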
All that said, you seem uncomfortable with static rules of any kind, so if you don't buy into what I've said above, then I suggest that you stop using SA. Static rules are a giant advantage, but if you are going to defeat most of their value, then you might as well not suffer their overhead.
For further reading, I suggest: http://plg.uwaterloo.ca/~gvcormac/spamcormack.htm
If you want a slightly crunchier text, try:
http://www.prnewswire.com/cgi-bin/micro_stories.pl?ACCT=683194&TICK=RTN4&STORY=/www/story/07-25-2002/0001771721&EDATE=Jul+25,+2002
It seems as if other posters are correct. This is just a ceramic with the usual sorts of ceramic properties, just much harder than most. It's not a useful building material.