3) replicate your databases to all machines so db access is always LOCAL
This is probably a bad idea. Accessing the database over a socket is going to be much less resource intensive than accessing it locally. With the database locally, the database server uses up CPU time and disk I/O time. Disk I/O on a web server is very important. If the entire database isn't cached in memory, then it is going to be hitting the disk. The memory used up caching the database cannot be used by the OS to cache web content. A separate database server with a lot of RAM will almost always work better than a local one with less RAM.
This Apache nonsense of cramming everything into the webserver is very bad engineering practice. A web server should serve web content. A web application should generate web content. A database server should serve data. These are all separate processes that should not be combined.
Well, by using the same brilliant skills of analysis you do, this article is running on Apache, and the webserver is dead. That must mean that Apache is the Taco Bell of the webserver world, right?
That would be about right. It's cheap, lots of people use it, but it's certainly not the best.
Any web server can be good enough as long as you spread the load over enough boxes. Apache is much more flexible than Zeus.
Sure, but if you need 2+ Apache boxes to handle the load of one Zeus box, wouldn't it make more sense to buy Zeus in the first place?
I would like you to qualify your statement about Apache being more flexible. Zeus is a lot easier to configure than Apache. In what aspects is Apache more flexible?
When it comes to mass virtual hosting, Zeus beats the pants off Apache. Zeus' configuration is fully scriptable out of the box. Apache's is not. Zeus can do wildcard subservers. Apache cannot. Zeus does not require restarting to make configuration changes or add sites. Apache does. Sites can only be added in Apache if using the very limited mass vhost module.
Yep. fnord is probably the fastest small web server available. There are basically two ways to engineer a fast web server: make it as small as possible to incur the least overhead or make it complicated and use every possible trick to make it fast.
If you need features that a small web server like fnord can't provide and speed is a must, then Zeus is probably the best choice. Zeus beats the pants off every other UNIX web server. Its "tricks" include non-blocking I/O, linear scalability with regard to number of CPUs, platform-specific system calls and mechanisms (acceptx(), poll(), sendpath, /dev/poll, etc.), sendfile() and sendfile() cache, memory and mmap() file cache, DNS cache, stat() cache, multiple accept() per I/O event notification, tuning the socket buffers, disabling Nagle, tuning the listen queue, SSL disk cache, log file cache, etc.
Which design is better? Depends on your needs. It is quite interesting that the only way to beat a really small web server is to make one really big that includes everything but the kitchen sink.
Now, to the other purpose of my message - you mention awk/sed scripts to run across a mail spool, do you happen to know of any that would run across a spool and remove messages by age? I maintain several (RFC822) spools for use in my IMAP clients at all my various locations, mostly mailing lists, digests, etc. and have searched Google in vain for a script that will parse out old messages. The only other viable solution I've found is to simply bulk-archive the entire spool at xxx interval, which is, to say the least, an imperfect solution. I'd write it myself, but I'm not quite comfortable enough with sed/awk to prune entire messages, and I'd likely wind up going through a hundred test spools before I got it right. :) Any pointers would be greatly appreciated.
Do yourself a favor and stop using mbox format. It sucks. You should be using maildir. With maildir, every message is a separate file. This means no locking, no corruption, no crazy message scanning, etc. Want to delete every message over 180 days old? Easy:
find /home/user/Maildir/ -type f -atime +180 -exec rm -f {} \;
There are scripts to convert mbox to maildir and vice versa.
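If you would rather not trust a one-liner with rm, here is a rough Python sketch of the same pruning idea. It is only an illustration: the function name prune_maildir and its parameters are made up for this example, it only looks at the cur/ and new/ subdirectories, and it goes by modification time (which usually matches delivery time in a maildir, but that is an assumption) rather than access time.

```python
import time
from pathlib import Path


def prune_maildir(maildir, max_age_days, now=None, dry_run=False):
    """Delete messages in cur/ and new/ older than max_age_days (by mtime).

    Returns the list of paths that were removed, or would be removed
    when dry_run is True.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    removed = []
    for sub in ("cur", "new"):
        folder = Path(maildir) / sub
        if not folder.is_dir():
            continue
        for entry in folder.iterdir():
            # Only regular files are messages; skip anything else.
            if entry.is_file() and entry.stat().st_mtime < cutoff:
                if not dry_run:
                    entry.unlink()
                removed.append(entry)
    return removed
```

Run it with dry_run=True first and look at the returned list before letting it delete anything.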
Does a two year stint at the ISC maintaining the BIND 8 resolver and tree propagation code count? Moreover, I'd like to think that there are those who are perhaps younger and smarter than me who might be able to "fuck with" and actually do something new with the given software. That's what open source is all about.
Oh, I get it now. You are spreading FUD about Dan's software because he can write secure DNS software and you can't?
The fact is, the licensing of qmail makes it a legal liability to distribute, and is avoided by groups like Debian and RedHat. I have no hate for qmail but let's get our terminology right.
Wrong. Again, more FUD. It is perfectly legal to distribute qmail if the binaries match his. He does this to promote compatibility. Debian does not distribute qmail because it is not Free Software, not because it isn't Open Source. Find out exactly why qmail is not in Debian as a binary package. qmail is in Debian non-free as a source package.
Anyone notice how the author spends the bulk of the article talking about a mail setup using the proprietary qmail MTA (which has a look-but-don't-touch license [infoave.net] that's in many ways more restrictive than Microsoft's Shared Source) and then goes ahead and praises it as being Open Source in the last paragraph?
qmail is not open source? It is distributed as source code, not binary code. I don't see how that is anything but open source. It is not, however, free software.
e.g. say you have a rule that when MyApp::toolbar_visible == true then the toolbar must be shown. But most programming languages don't allow this kind of declarative specification. Instead, you have to track down every line of code that modifies MyApp::toolbar_visible, and tack on extra code to hide/show the toolbar depending on its new value... Or, if you have a slider control that is supposed to reflect the value of some variable. You again have to track down every point where that variable can be modified, and insert code to update the scrollbar.
Delphi has properties which do exactly that. Setting the property value executes code, which can update the GUI.
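The same idea exists outside Delphi. As a sketch in Python (not Delphi syntax), a property setter runs code on every assignment, so the toolbar cannot get out of sync with the flag. The Toolbar class here is a hypothetical stand-in for a real widget, and MyApp just borrows the name from the example above.

```python
class Toolbar:
    """Stand-in for a GUI widget; it only records whether it is shown."""

    def __init__(self):
        self.shown = False

    def show(self):
        self.shown = True

    def hide(self):
        self.shown = False


class MyApp:
    def __init__(self, toolbar):
        self._toolbar = toolbar
        self._toolbar_visible = False

    @property
    def toolbar_visible(self):
        return self._toolbar_visible

    @toolbar_visible.setter
    def toolbar_visible(self, value):
        # Every assignment passes through here, so callers never have to
        # remember to update the widget themselves.
        self._toolbar_visible = bool(value)
        if self._toolbar_visible:
            self._toolbar.show()
        else:
            self._toolbar.hide()
```

Any code that does app.toolbar_visible = True gets the show/hide behaviour for free; there is no need to track down every place the flag is modified.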
How exactly did you go about setting this up? Can you go into details?
It's really not that difficult, if you have a good hosting setup. Our hosting system is designed to be 100% automated, all controlled from our custom-written control panel.
We use Zeus' virtual server feature. Every customer has their own virtual server. In Zeus, everything is a virtual server. There is no concept of a main server like in Apache. Thus each virtual server can have its own configuration and be independently controlled.
Each virtual server can have subservers. Subservers are similar to Apache's mass virtual hosting (but better). A virtual server has a subserver directory, which is a directory full of symlinks with the name of the site hostname, pointing to the site content directory. When a customer runs out of bandwidth, we simply place a .htaccess file (Zeus mostly supports these) with a redirect line in the root of the subserver directory. Due to the directory layout, customers do not have access to this directory.
If I had to set this up under Apache (ick), I would use the mass virtual hosting module. The virtual host directory would be full of symlinks with the name of the site hostname, pointing to the site content, as with Zeus. The difference with Apache is that all sites for all customers would be in one directory. Apache configuration is much more messy and not easily scriptable. To redirect the disabled sites, the symlinks would be changed to point to a different directory.
Accumulating bandwidth statistics for this can be done in different ways. We use a custom ISAPI module which sends logging info through a socket to a daemon which accumulates statistics and periodically saves them to a MySQL database. Doing a SQL query for every web request would not work.
For Apache, a similar module could probably be written. Or server logs could be processed. Logs could be written to a pipe and the process on the other end could accumulate bandwidth and other statistics (or write out logs for something like Analog to process).
A script runs from cron that processes the bandwidth statistics in the database. It updates the bandwidth counters in the database for the user, making it easy to see how much bandwidth is left from the control panel. When a customer is low or runs out, the script sends a notification email. When the customer is out of bandwidth, the script updates a table which tells the backend hosting daemon to actually disable the sites. After the customer buys more bandwidth, the script updates the table telling the daemon to enable them again.
(Hint: SQL databases are a very nice way to let daemons on different servers communicate with each other.)
if your md5 is x length long and your file is 10x length, then there are 9x as many other possibilities for the content of the file to give the same md5 sum. in other words, md5 can be Spoofed by adding random bits to the end(replacing legit bits however) until one gives the same md5 as another.
Sure, but how do you figure out what random bits to add? MD5 is cryptographically secure. See RFC 1321:
It is conjectured that the difficulty of coming up with two messages having the same message digest is on the order of 2^64 operations, and that the difficulty of coming up with any message having a given message digest is on the order of 2^128 operations.
We gave up on throttling. It just doesn't work. And not from a technical standpoint, either. If you are going to do throttling, then you need a web server that does real throttling. The only one I know of is Zeus. It does real throttling, letting you limit the total bandwidth for all of a user's sites to a bytes/sec value. No Apache modules do this. Even thttpd doesn't seem to get this right.
I assume that you want to limit bandwidth that your customers use because the bandwidth to your server(s) is limited (i.e. you don't have a 100mbit connect to the internet). In this case, Apache modules will not do what you want. Someone puts up a 10mb file. That file gets downloaded and uses up all of your outgoing bandwidth. While it is being downloaded, the Apache throttle module refuses other requests. This is obviously not what you want.
So suppose you end up using Zeus, or find some other way to do real throttling. Now what do you set the throttle speed to? 5gb over a month averages a little less than 2k/sec. Say you set it higher, like 20k/sec, and there are ten connections downloading that user's files (which can easily happen with certain browsers). What does the average clueless user or webmaster think? They don't understand throttling. They just think that the website is slow and that your service sucks.
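To put numbers on that, here is the arithmetic as a small Python sketch. It assumes a 30-day month and decimal units (1 gb = 10^9 bytes, 20k = 20,000 bytes); the function names are only for illustration.

```python
SECONDS_PER_DAY = 86400


def average_rate(total_bytes, days=30):
    """Bytes per second needed to spread total_bytes evenly over `days`."""
    return total_bytes / (days * SECONDS_PER_DAY)


def days_until_exhausted(total_bytes, rate_bytes_per_sec):
    """How long a quota lasts if the throttle is saturated the whole time."""
    return total_bytes / rate_bytes_per_sec / SECONDS_PER_DAY


quota = 5 * 10**9                  # 5 gb monthly allowance
even_rate = average_rate(quota)    # about 1929 bytes/sec, a bit under 2k/sec

throttle = 20_000                  # 20k/sec cap for the whole account
per_connection = throttle / 10     # ten simultaneous downloads: 2000 bytes/sec each
lasts = days_until_exhausted(quota, throttle)  # just under 3 days if saturated
```

So a throttle low enough to make 5gb last the month feels like a dead site, and one high enough to feel usable can burn the whole allowance in a few days.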
We would throttle down users' sites when their bandwidth ran out. Customers did not understand that they had run out of bandwidth, even though they were notified via email. They just thought their sites were slow. We received a lot of complaints about that.
We found that the best thing to do is to not throttle and to presell bandwidth cheap. Our different packages come with different amounts of bandwidth, ranging from 5gb to 180gb. After that, customers can purchase extra bandwidth (for $0.50/gb). Customers receive a notification via email when their bandwidth is running low and again when it is completely gone. When their bandwidth is gone, we redirect their sites to a page stating that they used up all their bandwidth.
This solution is simple and it works. Customers always know how much bandwidth they have left and can buy more at any time. We never have to worry about users running up a huge bill and not paying it, since everything is prepaid.
Actually, it would be kind of cool if this was part of the default install in Gentoo - along with some P2P program for finding others online who are running the same app. Then you could download the source code and distributedly (is that a word) compile it... As long as your network is fast enough, you could significantly reduce the amount of time compiling, etc. on slow machines.
Wonderful. Then you can get rooted when others running the P2P app have modified the compiler to insert trojans into the generated binaries.
Windows NT 3.51(3.1?) through XP share drive C as C$.
Yes, but they are hidden administrative shares. You need an administrator password to access them. The article is about public (i.e. no password) shares.
This guide is excellent:
http://www.ecst.csuchico.edu/~beej/guide/net/
but if you can handle strings in C as easily as in java, please post a link to the libraries you are using. strings suck so much in C, I have to use C++. C++ sucks ass for strings too, so I'm left with java and perl.
http://cr.yp.to/lib/stralloc.html
If a server has to run as root to access a privileged port (i.e. http) have your firewall redirect all packets sent to port 80 to port 8080.
This is not necessary. After a server bind()s a socket to a privileged port and does other necessary tasks (opening log files, etc.) it can drop root privileges using setuid() / setgid(). This is standard practice and almost all servers do this.
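For anyone who has not seen it done, the sequence looks roughly like this. It is a minimal Python sketch using the os module's wrappers around the same system calls, not code from any particular server; the function name bind_then_drop is made up, and the uid/gid to switch to are whatever unprivileged account you choose.

```python
import os
import socket


def bind_then_drop(port, uid, gid, host=""):
    """Bind a listening socket, then give up root privileges if we have them."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))   # needs root when port < 1024
    sock.listen(128)
    if os.getuid() == 0:
        os.setgroups([])      # drop supplementary groups first
        os.setgid(gid)        # group before user: once the uid is dropped
        os.setuid(uid)        # we are no longer allowed to change the gid
    return sock
```

The already-bound socket stays usable after the switch, so the server keeps accepting connections on port 80 while running as an ordinary user.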
The key word is "or". He would not have the license for all of the above, just one of the above.
He is only running one at a time.
Special extra: Proxy support (SOCKS & HTTP)
Try the development snapshot of PuTTY. It has proxy support for both of those.
This is bullshit, and he knows it, but he has to exaggerate and distort the truth in order to highlight his fashionable Bounty idea.
Obviously, you've never had to deal with SPEWS. It is almost impossible to get off their list, regardless of the circumstances.
You block by IP at router level, so even when you store the URLs for reference, you're actually blocking the whole server? What about other content on the server? What about virtual hosts?
If your hosting provider is hosting child porn, then you had better get another provider. It's the same thing as spam blacklists. An ISP that allows spam gets blacklisted. So an ISP either has to prevent spamming or risk losing all non spamming customers.
In other words, can you say if you're doing overbroad blocking? Any innocent bystanders getting -j REJECTed?
That's half the point.
(This is how it works. I'm certainly not saying I agree with it.)
How do they know the origin without examining the headers? Headers are part of the content.
The purpose of most blacklists is to stop the spam during the SMTP session. The IP that is connected to the SMTP server is the one that is checked. This doesn't block outputs (secure servers that are relaying for insecure inputs) but not all mail from those should be blocked.
How about just blanking their DNS entry.
And what happens when I am running my own DNS resolver locally, as I do on my cable modem? Or what happens if I know the IP address? Then I can still access it and they aren't blocking it.
Advertisers want to know about the demographics of the people who will be visiting the site. It would be difficult (although not impossible) to develop this information for a honeypot.
In this case, you wouldn't have demographics, but you would know every site that people using the network visit. This is arguably much more valuable to advertisers.
There is no need to do it yourself. Use a network traffic analyzer like Ethereal. The Win32 version works quite well.