"Basically, anything where even a small error will adversely affect the results (i.e. large computations)."
Everything is adversely affected by data corruption. You personally don't care about being adversely affected, which is fine. But stop acting like your personal opinion is correct and everyone else's is wrong. Talk about arrogance.
"By most measures ECC is a non issue, otherwise market forces would have demanded it long ago."
And so anyone who wants it is wrong, because most people (who don't have any clue) don't want it? I guess Linux is "wrong" then, since if it was useful then "market forces would have demanded it long ago"?
"While you're correct that ECC itself doesn't cost any bandwidth you are only looking at the jump from registered to ecc memory, while most people are using unregistered modules, and in such cases there is a ~10% bandwidth hit in addition to a not insignificant latency hit."
Um, no. Registered memory has no bandwidth hit, and the latency hit is one clock. Maybe you shouldn't be parading your opinion around as fact when you don't even know what you are talking about?
Just because you don't care doesn't mean that anyone who does care is wrong. You pretended it's a non-issue, when memory corruption actually causes very big problems, not trivial ones like you made it out to be. There's nothing wrong with wanting a board that supports ECC RAM. Also, there is no performance hit in bandwidth, only latency. And it's so small as to be irrelevant.
First of all, it's not like a cryptographic hash at all, it's like a checksum, because it is a checksum.
Second, do you care if a pointer that used to point to one memory address suddenly and mysteriously points to another, leading to a segfault, or a blue screen if it's kernel memory?
My KT7 stopped working the other day. As soon as you boot the alarm screams at you non-stop, even though nothing is wrong in the health section of the BIOS. And it locks up after about 30 seconds of being on, even when you are just sitting in the BIOS watching the seconds tick by.
Their support of course says "check the CPU temp" and calls it a day. This is despite me clearly telling them everything is fine in the PC health section of the BIOS, and that I had swapped out RAM, CPU and video card to make certain it was the motherboard. The fact that they can't even bother to read support requests, never mind give useful support, is ridiculous.
"It just goes to prove that any popular software worth hacking that has security vulnerabilities will eventually have to deal with live working exploits"
No, it goes to show that firefox is a poorly written, buggy mess of crap. The code is terrible, and the developers do not care. Firefox has always had tons of security problems, popularity just makes people exploit them.
Popular software can be secure if it's written properly by people who care about security. Don't try to pretend all software must suck just because mozilla does.
"The difference here is that in a database you're caching the index - i.e. the information you need for message operations - and not the message itself."
No, that's not a difference, that's exactly how it is with an imap server with indexing. Your indexes can be in RAM if you want, or they can be in files, which will be in the filesystem cache if they are accessed often. The actual message itself could very well be in RAM with either a filesystem or a database if it has been accessed recently.
"With ext3, for example, the minimum block size is 1k (and defaults to 4k). The relevent metadata is much much smaller."
So, if your indexes really are that tiny, keep them in memory, not in files. And I wouldn't suggest using ext3 either.
"The big problem with that approach, though, is that you're now indexing on access, not on delivery, which means you're pushing the machines harder during peak use periods, rather than taking advantage of spare cycles throughout the day."
I'm not sure if you are familiar with maildirs, but there is a "new" directory where all the new mail goes. Once someone reads it, it is moved to the cur directory and indexed. Indexes are updated when the files are altered, just like with a database. If you really need your unread mail indexed as well, you could certainly change your LDA to do that for you.
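For anyone who hasn't looked inside a maildir, here is a rough sketch of that flow in Python. The message name, the function names and the dict standing in for an "index" are all made up for illustration; a real imap server keeps its own index format, and this assumes a Unix filesystem (the ":2,S" suffix is the usual maildir way of flagging a message as seen).

```python
import os
import tempfile

def make_maildir(root):
    """Create the three standard maildir subdirectories."""
    for sub in ("tmp", "new", "cur"):
        os.makedirs(os.path.join(root, sub), exist_ok=True)

def deliver(root, unique_name, body):
    """Delivery: write into tmp/, then rename into new/ so readers never see a partial file."""
    tmp_path = os.path.join(root, "tmp", unique_name)
    with open(tmp_path, "w") as f:
        f.write(body)
    new_path = os.path.join(root, "new", unique_name)
    os.rename(tmp_path, new_path)
    return new_path

def mark_seen(root, unique_name, index):
    """First read: move new/ -> cur/ with the Seen flag and note it in a toy index."""
    cur_name = unique_name + ":2,S"
    os.rename(os.path.join(root, "new", unique_name),
              os.path.join(root, "cur", cur_name))
    index[unique_name] = cur_name  # hypothetical index: original name -> name in cur/
    return cur_name

root = tempfile.mkdtemp()
make_maildir(root)
index = {}
deliver(root, "1100000000.1234.host", "Subject: hi\n\nhello\n")
mark_seen(root, "1100000000.1234.host", index)
new_files = os.listdir(os.path.join(root, "new"))
cur_files = os.listdir(os.path.join(root, "cur"))
print(new_files, cur_files)
```

If you want unread mail indexed too, the place to hook that in would be the equivalent of deliver(), which is the LDA's job.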
The native integer type will stay 32 bit. AMD64 on Unix-like systems uses the LP64 model, so longs and pointers will be 64 bits, ints stay 32 bit. And longs getting bigger really doesn't matter, people using longs for anything besides pointer math are being dumb anyways, and should fix their code. Having 64 bit longs just allows people who need 64 bit integers to have them without resorting to the slow "long long".
So really, it's just pointers doubling in size that should affect your memory usage. This will not do anything remotely close to doubling the memory usage of an OS. We've had 64 bit architectures and OSs for years, you can look at them to see what kind of memory requirement increase to expect.
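If you want to see what your own platform does, something like this should print the C type sizes via ctypes. The struct_bytes helper and the "6 ints and 2 pointers" record are invented purely to show the arithmetic, and it ignores padding.

```python
import ctypes

# Sizes of the C types as seen by the C compiler that built this Python.
sizes = {
    "int": ctypes.sizeof(ctypes.c_int),
    "long": ctypes.sizeof(ctypes.c_long),
    "long long": ctypes.sizeof(ctypes.c_longlong),
    "pointer": ctypes.sizeof(ctypes.c_void_p),
}
for name, size in sizes.items():
    print(f"{name}: {size * 8} bits")

def struct_bytes(n_ints, n_pointers, ptr_size):
    """Rough size of a record with 4-byte ints and ptr_size-byte pointers (padding ignored)."""
    return n_ints * 4 + n_pointers * ptr_size

# Hypothetical record: 6 ints and 2 pointers.
small = struct_bytes(6, 2, 4)   # 32-bit pointers
large = struct_bytes(6, 2, 8)   # 64-bit pointers
print(small, large, f"{(large - small) / small:.0%} bigger")
```

For that made-up record the growth is 32 to 40 bytes, so a quarter more, not double. Only data that is mostly pointers gets anywhere near doubling.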
If the files are not in the buffer cache using fs storage, then they would also not be in the DB's cache using a db for storage. You will have LESS RAM available for caching data if you use a database, since you now have all the other stuff you don't need from a database using up RAM too.
If a fileserver is going to "choke on a flood of... disk ops" then so will a database server, or does it have magic powers to avoid accessing disks? Or are you trying to compare a fileserver with 512MB of RAM vs a database server with 16GB of RAM?
And the majority of mysql installations out there might well be used to provide an SQL frontend to simple, non-relational data, but that is definitely not the case with real databases. As I have explained repeatedly now, the only thing a database will get you is faster searches (SEARCHES, not ACCESS), and that is entirely because of the indexing. Use an imap server that does indexing and suddenly the database is offering you nothing.
If car dealers have associated data, and cars have associated data, and they are related, then it's relational data. Email is not like this. "Messages sent to xyz" isn't relational because who it's sent to is an attribute of the mail, there is no separate "table" containing "people" objects and their associated data.
Unread mail is in the "new" directory. All you would need to do is scan the new directory's index to find messages sent to xyz, exactly the same as what a database would have to do, without the additional overhead of handling connections, parsing SQL, dealing with locking and contention from so many queries on the same table at the same time, updating transaction logs and flushing them to disk (making more disk access instead of less), etc.
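To make "scan the index" concrete, here is a toy version using Python's standard email module. The build_index name, the in-memory dict and the two sample messages are assumptions for the sake of the example, not how any particular imap server stores its index.

```python
from email.parser import HeaderParser
from email.utils import getaddresses

def build_index(messages):
    """messages: dict of filename -> raw message text. Returns recipient -> [filenames]."""
    parser = HeaderParser()  # parses headers only, never touches the body
    index = {}
    for filename, raw in messages.items():
        headers = parser.parsestr(raw)
        for _name, addr in getaddresses(headers.get_all("To", [])):
            index.setdefault(addr.lower(), []).append(filename)
    return index

# Made-up messages standing in for files in new/.
messages = {
    "msg1": "To: xyz@example.com\nSubject: a\n\nbody\n",
    "msg2": "To: Bob <bob@example.com>, XYZ@example.com\nSubject: b\n\nbody\n",
}
index = build_index(messages)
print(index["xyz@example.com"])
```

The lookup is a single dict access. No connection handling, no SQL parsing, no transaction log.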
Feel free to ignore reality and pretend you need to build a database. Those "data structures" are called files, and the filesystem is already written, and it takes care of them for me. Pretty handy, huh? And decent imap servers already have indexing, so that's taken care of too. Oops, you suddenly get all the benefits a database would give you, without the huge overhead.
First of all, no, your RAM will not be enough to cache everything. Just like it won't with a database. You will end up with more RAM dedicated to caching with just a filesystem, since a database takes up a bunch of RAM for other things.
And your last part makes no sense at all. Of course doing find will be slow. Just like select * would be slow on a table with a million rows. You shouldn't be trying to access everything at the same time no matter where you stored it.
And your mkdir problem is likely just because you don't have enough inodes. If you are creating a filesystem to store a lot of files and directories, you will create one with enough inodes to have them all.
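Checking this is easy on any Unix box. A minimal sketch with os.statvfs follows; the function names are mine, and some filesystems report zero total inodes because they allocate them dynamically, so that case is treated as "no fixed limit".

```python
import os

def inode_report(path):
    """Return total and free inode counts for the filesystem holding path."""
    st = os.statvfs(path)
    return {"total": st.f_files, "free": st.f_ffree}

def can_hold(path, needed):
    """True if the filesystem reports at least `needed` free inodes."""
    report = inode_report(path)
    if report["total"] == 0:
        return True  # no fixed inode count reported by this filesystem
    return report["free"] >= needed

report = inode_report(".")
print(report, can_hold(".", 1_000_000))
```

If can_hold comes back False for the number of maildirs and messages you expect, the fix is at filesystem creation time, not in the mail software.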
And of course, you don't want to make a million directories in the same dir. Make a few thousand directories (one for each domain) and then have each of those contain a few hundred of thousands maildirs (the users for that domain).
"All the other tests" are 2 other tests, searching and selecting all headers. This is not indicative of actual use, and doesn't demonstrate that mail storage should be a database. As I said previously, it's just because they are comparing mysql with indices to imap servers without indices. Throw dovecot in there with its indexing and all of a sudden mysql isn't faster at searching.
I didn't say a database wouldn't work, I said it's slower and provides no benefits compared to a filesystem. You never need any of the power of SQL to figure out what data to fetch, it's not even relational data. It's a simple "read this file in that dir".
I am not sure what planet you live on where filesystems don't have HA and can't be backed up. Believe it or not, files can in fact be stored in multiple places at the same time, much like data in a database can. And if you want really good HA file storage, get a Netapp.
When using the openbsd machines for load balancing and firewalling, you can run spamd on them. It's openbsd's very small and very efficient greylisting daemon. Doesn't matter what MTA you use, uses almost no resources, and cuts out the vast majority of spam before it even touches the mail servers.
No, finding a row is not at least as fast as reading a file. Lying will not change anything. Databases have additional overhead: handling the connections, parsing the SQL, scanning the indexes to find the row, fetching and formatting the row, and returning it. This even involves multiple trips across the kernel/userland boundary as many syscalls happen. Try tracing a database that is otherwise idle, and see the tons of calls made just to do a single simple query. Then do the same thing with a program that just reads the contents of a file, and notice how far fewer calls are made.
The simple, undeniable fact is that every RDBMS has more overhead than the filesystem, and offers NOTHING in return when dealing with simple file storage which needs none of the features of a database. A database is great for being a database, it is not great for being a filesystem, that's not what it was designed for, but it is what the filesystem was designed for (surprise!).
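The "program that just reads the contents of a file" can be about this small. This is only a sketch: read_file is my name for it, and the Python interpreter makes plenty of its own syscalls at startup, so when tracing you would want to look only at the open, read and close for the message file itself.

```python
import os
import tempfile

def read_file(path, chunk=65536):
    """Read a whole file using the thin os-level wrappers: open, read until EOF, close."""
    fd = os.open(path, os.O_RDONLY)
    try:
        parts = []
        while True:
            data = os.read(fd, chunk)
            if not data:  # empty result means end of file
                break
            parts.append(data)
        return b"".join(parts)
    finally:
        os.close(fd)

# Made-up message file for the demonstration.
fd, path = tempfile.mkstemp()
os.write(fd, b"Subject: hi\n\nhello\n")
os.close(fd)
content = read_file(path)
print(len(content))
```

For a typical small message that is one open, a couple of reads and one close.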
No need to get all defensive, it's just "a few comments" like I said. Because the OP wanted opinions, and your post was close to what I found works well, it's faster for me to just point out how he can change your recipe than repeat everything.
And actually, I did bake almost exactly "your" cake, sans exim. Hence my comments about how to improve it and make it taste better. It's not a perfect cake, it has room to be improved. In future iterations, I refined my original recipe. This is a good thing, not something to be afraid of or angry about.
Courier was a bit of a pig, so I tried dovecot and found it worked better. SA was an enormous pig, and several statistical filters are faster, use less RAM, and are more effective. If you've only made one cake, and have been looking at it for 3 years, how do you know you couldn't bake a better cake?
You won't know if your cake can be improved until you try. Don't be hostile to constructive criticism, you might improve your baking skills. You didn't sound happy with courier, have you tried dovecot? I am not telling you that you must change your mail servers or else, I am just offering advice, you don't have to take it, nor does the OP.
Uh, did you miss the part where I said "a couple" and mentioned CARP? OpenBSD comes out of the box with redundant, synchronized state, failover firewall functionality. CARP is like VRRP or HSRP, only free. You run two, or three, or four, etc openbsd machines, if one dies, or you want to take it down, it doesn't matter at all.
And openbsd doesn't have to be on a PC, it runs on real hardware too, sparc64 and alpha work fine. You can also get no moving parts i386 hardware, not everything with an intel chip is a PC.
Again, you are completely and totally wrong. Your experience is clearly lacking, anyone with ANY experience with filesystems and with RDBMSs will tell you, you are wrong. Trying to pretend I am making this up for some reason only makes you look even less knowledgeable.
You don't care what kind of cache the OS implements? Maybe that's why you have no idea what you are talking about? Start caring, and start learning. Filesystem reads DO NOT go to disk, that's the fucking point, that's what a cache is, it's storing it IN RAM. Filesystem reads go to the buffer cache, if something is not in the cache, the kernel reads it into the cache. Those tens of thousands of users checking every minute are NO LOAD ON THE DISKS AT ALL. The database would cache this data in RAM to prevent disk access, and so does the filesystem. I suggest you read a book on filesystem or even just general OS internals so you understand that what you are saying is total nonsense.
And again, filesystems are much, much, much, a huge fucking enormous order of magnitude faster at performing data updates than a database, especially since the database also has to update its transaction log and indices, and flush them to disk. And like I said, FFS + softupdates makes deletes even faster. Mail servers don't have anything in common with file servers in terms of resource usage? Maybe you should tell that to all the hundreds of huge mail system admins out there who will tell you that you are completely and totally full of shit. It's creating, reading and deleting files, sounds pretty fucking fileserver-like to me. Sticking it in a database makes it database server-like, sure. It also makes it much, much slower.
And for your second post, maybe you use windows for your mail servers or something insane like that? But in the real world, the filesystem's buffer cache can and will be as big as it can. Do a top on a linux machine with a 2.4 or later kernel, and notice how free is almost none? It's using all the spare RAM for the buffer cache. So quit talking complete bullshit and then lying more when you get called out on it.
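If top isn't handy, the same numbers are in /proc/meminfo on linux. Here is a small parser as a sketch; the sample text and its figures are invented so the example runs anywhere, and on a real box you would feed it the contents of /proc/meminfo instead.

```python
def parse_meminfo(text):
    """Parse 'Key:   value kB' lines into a dict of integers (in kB)."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])
    return values

# Made-up sample in the /proc/meminfo layout, not real measurements.
SAMPLE = """MemTotal:  1030000 kB
MemFree:     12000 kB
Buffers:     90000 kB
Cached:     700000 kB
"""

info = parse_meminfo(SAMPLE)
cache_kb = info["Buffers"] + info["Cached"]
print(f"free: {info['MemFree']} kB, cache: {cache_kb} kB")
```

A tiny MemFree next to a big Buffers + Cached figure is the kernel using spare RAM for caching, which is exactly the behaviour described above.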
Sendmail or postfix or even qmail will do the job just as well as exim. Just say "use whatever MTA you like" instead of trying to pretend your MTA of choice is the only way to go.
I found dovecot to be faster than courier, and use less RAM. It also does ldap, ssl, maildir, etc, etc.
Making a mess of ugly directories is not needed if you are using a decent filesystem. I know the BSD's FFS has dirhash to make handling tens of thousands of files/dirs in a single directory work just fine, Solaris has something too. I'm sure one of the dozens of linux filesystems has this dealt with. And don't bother with a linux or BSD NFS server for something this size, just go netapp.
Spamassassin!?!? Good lord man, you will need dozens of servers just to run that. It's incredibly resource intensive, it needs its own server just for a few thousand users. Perl and tons of string mangling is not a resource efficient spam filtering solution. Use a statistical filter written in C.
Don't make all the boxes the same. It's a much more effective use of resources to dedicate these X boxes as MTAs, these as POP, these as IMAP, LDAP over there, etc, etc. You don't want all that stuff on every server, or you are wasting lots of RAM with identical processes on separate servers that can't share resources. It's also easier to tune the OS to fit exactly what the box is doing, which doesn't work so well when it's doing everything.
Hardware load balancers are not at all needed. Throw a couple OpenBSD machines running CARP and PF in front of the servers. It will be cheaper, and gives you firewall + load balancer in one.
There is absolutely no reason at all to leave 80% free space, 15% is more than enough to ensure you don't have fragmentation problems (I am assuming you are using a reasonable filesystem of course).
Second, people with ridiculously frequent mail check times are not any more of a problem. Modern operating systems use file system caches. You do not have to touch the disk subsystem in any way, frequently accessed data will be in RAM.
And finally, a database has a lot of extra overhead, and there are a lot of deletes going on. Sure, such a select statement would work, but reading the files in one directory is an order of magnitude faster. And the deletes will really hammer your database. FFS+softupdates makes file deletion extremely fast. A relational database is not the answer for everything, stop trying to pretend it is. Use the right tool for the job, and for storing files, a filesystem is the right tool. It's not relational data, it doesn't need to be queried in arbitrary, complex ways, so it doesn't belong in a relational database.
GNUtards complaining about other systems not being posix compliant? That's pretty funny. The -f option to ps is an XSI extension, and doesn't need to exist since it gives you a subset of what other options give, and you can get precisely what you want/need with the -o option. Adding every random option under the sun, and creating bloated, broken utilities is not a good thing, although it does seem to be GNU policy. Then again, it takes some special people to make "man" exploitable.
From all the posts on the lists I've seen, it would appear that most people coming from linux complain because they can't be bothered to read the man pages to see what the options they should be using are, and just complain that everything isn't the same as the last linux box they used. Hell, half the complaints are dumbasses who think that the BSDs have a broken rm because they do rm * and it doesn't ask them what files they want to delete.
Extensions like what? I spend lots of time at a command line too, and that's why I can't stand linux machines, the GNU tools are awful compared to the BSD ones. I'd really be interested in hearing what is "missing" from the BSDs grep and ls, besides ls displaying everything in color.
The SFU utils are the openbsd userland, which is not GPL. Run strings on the binaries, the copyright notice is pretty obvious.
Blech, that should say "contain a few hundred or thousand maildirs", not "few hundred of thousands maildirs". Time for a coffee.
Well, that does show pretty clearly that mysql is slower, WAY slower for deletes. It doesn't even do normal operations like reads unfortunately. All it really did show was that searching is faster if you have indices to search. This didn't really need to be shown, everyone already knows this. However, you don't need a database to get an index, any imap server could implement indexing if they wanted to, and in fact dovecot has:
http://www.dovecot.org/
I didn't say a database wouldn't work, I said its slower and provides no benefits compared to a filesystem. You never need any of the power of SQL to figure out what data to fetch, its not even relational data. Its a simple "read this file in that dir".
I am not sure what planet you live on where filesystems don't have HA and can't be backed up. Believe it or not, files can in fact be stored in multiple places at the same time, much like data in a database can. And if you want really good HA file storage, get a Netapp.
When using the openbsd machines for loading balancing and firewalling, you can run spamd on them. Its openbsd's very small and very effecient greylisting daemon. Doesn't matter what MTA you use, uses almost no resources, and cuts out the vast majority of spam before it even touches the mail servers.
No, finding a row is not at least as fast as reading a file. Lying will not change anything. Databases have additional overhead, handling the connections, parsing the SQL, scanning the indexes to find the row, fetching and formatting the row, and returning it. This even involves multiple trips across the kernel/userland boundry as many syscalls happen. Try tracing a database that is doing nothing, and see the tons of calls made just to do a single simple query. Then do the same thing with a program that just reads the contents of a file, and notice how its far fewer calls being made.
The simple, undeniable fact is that every RDBMS has more overhead than the filesystem, and offers NOTHING in return when dealing with simple file storage which needs none of the features of a database. A database is great for being a database, it is not great for being a filesystem, that's not what it was designed for, but it is what the filesystem was designed for (suprise!).
No need to get all defensive, its just "a few comments" like I said. Because the OP wanted opinions, and your post was close to what I found works well, its faster for me to just point out how he can change your recipe than repeat everything.
And actually, I did bake almost exactly "your" cake, sans exim. Hence my comments about how to improve it and make it taste better. Its not a perfect cake, it has room to be improved. In future iterations, I refined my original recipe. This is a good thing, not something to be afraid of or angry about.
Courier was a bit of a pig, so I tried dovecot and found it worked better. SA was an enormous pig, and several statistical filters are both faster, use less RAM, and are more effective. If you've only made one cake, and have been looking at it for 3 years, how do you know you couldn't bake a better cake?
You won't know if your cake can be improved until you try. Don't be hostile to constructive critisism, you might improve your baking skills. You didn't sound happy with courier, have you tried dovecot? I am not telling you that you must change your mail servers or else, I am just offering advice, you don't have to take it, nor does the OP.
Uh, did you miss the part where I said "a couple" and mentioned CARP? OpenBSD comes out of the box with redundant, syncronized state, failover firewall functionality. CARP is like VRRP or HSRP, only free. You run two, or three, or four, etc openbsd machines, if ones dies, or you want to take it down, it doesn't matter at all.
And openbsd doesn't have to be on a PC, it runs on real hardware too, sparc64 and alpha work fine. You can also get no moving parts i386 hardware, not everything with an intel chip is a PC.
Again, you are completely and totally wrong. Your experience is clearly lacking, anyone with ANY experience with filesystems and with RDBMSs will tell you, you are wrong. Trying to pretend I am making this up for some reason only makes you look even less knowledgable.
You don't care what kind of cache the OS impliments? Maybe that's why you have no idea what you are talking about? Start caring, and start learning. Filesystem reads DO NOT go to disk, that's the fucking point, that's what a cache is, its storing it IN RAM. Filesystem reads go to the buffer cache, if something is not in the cache, the kernel reads it into the cache. Those tens of thousands of users checking every minute are NO LOAD ON THE DISKS AT ALL. The database would cache this data in RAM to prevent disk access, and so does the filesystem. I suggest you read a book on filesystem or even just general OS internals so you understand that what you are saying is total nonsense.
And again, filesystems are much, much, much, a huge fucking enormous order of magnitude faster at performing data updates than a database, especially since the database also has to update its transaction log and indices, and flush them to disk. And like I said, FFS + softupdates makes deletes even faster. Mail servers don't have anything in common with file servers in terms of resource usage? Maybe you should tell that to all the hundreds of huge mail system admins out there who will tell you that you are completely and totally full of shit. Its creating, reading and deleting files, sounds pretty fucking fileserver-like to me. Sticking it in a database makes database server-like, sure. It also makes it much, much slower.
And for your second post, maybe you use Windows for your mail servers or something insane like that? But in the real world, the filesystem's buffer cache can and will be as big as it can. Run top on a Linux machine with a 2.4 or later kernel, and notice how free memory is almost none? It's using all the spare RAM for the buffer cache. So quit talking complete bullshit and then lying more when you get called out on it.
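If you'd rather check the numbers than eyeball top, here is a rough Python sketch that works out how much RAM is free versus sitting in the cache from /proc/meminfo-style text. The function names are mine, and the SAMPLE figures are invented purely so the example runs anywhere; they are not measurements from a real box.

```python
def parse_meminfo(text):
    """Return {field name: value in kB} from /proc/meminfo-style text."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[name.strip()] = int(fields[0])
    return info


def cache_summary(info):
    """Percent of total RAM that is free, and percent used as buffers/cache."""
    total = info["MemTotal"]
    cached = info.get("Buffers", 0) + info.get("Cached", 0)
    return {
        "free_pct": 100.0 * info["MemFree"] / total,
        "cache_pct": 100.0 * cached / total,
    }


# Made-up sample figures (assumption, for illustration only).
SAMPLE = """\
MemTotal:  1024000 kB
MemFree:     20480 kB
Buffers:    102400 kB
Cached:     716800 kB
"""

summary = cache_summary(parse_meminfo(SAMPLE))
print(summary)
```

On a Linux machine you can feed it the real thing with open("/proc/meminfo").read() in place of SAMPLE.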
Sendmail or postfix or even qmail will do the job just as well as exim. Just say "use whatever MTA you like" instead of trying to pretend your MTA of choice is the only way to go.
I found Dovecot to be faster than Courier, and it uses less RAM. It also does LDAP, SSL, Maildir, etc.
Making a mess of ugly directories is not needed if you are using a decent filesystem. I know the BSDs' FFS has dirhash to make handling tens of thousands of files/dirs in a single directory work just fine, and Solaris has something too. I'm sure one of the dozens of Linux filesystems has this dealt with. And don't bother with a Linux or BSD NFS server for something this size, just go NetApp.
Spamassassin!?!? Good lord man, you will need dozens of servers just to run that. It's incredibly resource intensive; it needs its own server just for a few thousand users. Perl and tons of string mangling is not a resource-efficient spam filtering solution. Use a statistical filter written in C.
Don't make all the boxes the same. It's a much more effective use of resources to dedicate these X boxes as MTAs, these as POP, these as IMAP, LDAP over there, etc. You don't want all that stuff on every server, or you are wasting lots of RAM with identical processes on separate servers that can't share resources. It's also easier to tune the OS to fit exactly what the box is doing, which doesn't work so well when it's doing everything.
Hardware load balancers are not at all needed. Throw a couple OpenBSD machines running CARP and PF in front of the servers. It will be cheaper, and gives you firewall + load balancer in one.
There is absolutely no reason at all to leave 80% free space, 15% is more than enough to ensure you don't have fragmentation problems (I am assuming you are using a reasonable filesystem of course).
Second, people with ridiculously frequent mail check times are not any more of a problem. Modern operating systems use filesystem caches. You do not have to touch the disk subsystem in any way, because frequently accessed data will be in RAM.
And finally, a database has a lot of extra overhead, and there are a lot of deletes going on. Sure, such a select statement would work, but reading the files in one directory is an order of magnitude faster. And the deletes will really hammer your database. FFS+softupdates makes file deletion extremely fast. A relational database is not the answer for everything, so stop trying to pretend it is. Use the right tool for the job, and for storing files, a filesystem is the right tool. It's not relational data and it doesn't need to be queried in arbitrary, complex ways, so it doesn't belong in a relational database.
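To show what "reading the files in one directory" and "deletes" actually amount to, here is a small Python sketch of a Maildir-style mailbox. It is an illustration under simplifying assumptions, not a benchmark and not a full Maildir implementation: the helper names are mine, and the message names (msg0, msg1, ...) are placeholders, where a real Maildir uses unique names.

```python
import os
import tempfile


def make_maildir(root):
    # A Maildir is just three subdirectories.
    for sub in ("tmp", "new", "cur"):
        os.makedirs(os.path.join(root, sub), exist_ok=True)


def deliver(root, name, body):
    # Write into tmp/, then rename into new/, so a reader never sees
    # a half-written message.
    tmp_path = os.path.join(root, "tmp", name)
    with open(tmp_path, "wb") as f:
        f.write(body)
    os.rename(tmp_path, os.path.join(root, "new", name))


def list_new(root):
    # The "select": a single directory scan.
    with os.scandir(os.path.join(root, "new")) as entries:
        return sorted(e.name for e in entries if e.is_file())


def delete(root, name):
    # The "delete": a single unlink.
    os.unlink(os.path.join(root, "new", name))


with tempfile.TemporaryDirectory() as box:
    make_maildir(box)
    for i in range(3):
        deliver(box, f"msg{i}", b"Subject: test\n\nhello\n")
    before = list_new(box)
    delete(box, "msg1")
    after = list_new(box)

print(before)  # ['msg0', 'msg1', 'msg2']
print(after)   # ['msg0', 'msg2']
```

There is no transaction log or index to maintain here, which is the overhead I'm talking about. How fast the scan and the unlink are is up to the filesystem.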
GNUtards complaining about other systems not being POSIX compliant? That's pretty funny. The -f option to ps is an XSI extension, and doesn't need to exist since it gives you a subset of what other options give, and you can get precisely what you want/need with the -o option. Adding every random option under the sun, and creating bloated, broken utilities is not a good thing, although it does seem to be GNU policy. Then again, it takes some special people to make "man" exploitable.
From all the posts on the lists I've seen, it would appear that most people coming from linux complain because they can't be bothered to read the man pages to see what the options they should be using are, and just complain that everything isn't the same as the last linux box they used. Hell, half the complaints are dumbasses who think that the BSDs have a broken rm because they do rm * and it doesn't ask them what files they want to delete.
Extensions like what? I spend lots of time at a command line too, and that's why I can't stand linux machines, the GNU tools are awful compared to the BSD ones. I'd really be interested in hearing what is "missing" from the BSDs grep and ls, besides ls displaying everything in color.