How are Your SMTP Timeouts Configured?
Asprin asks: "One of the employees at work had a major headache because a very important email was undeliverable for more than 24 hours. Sure, he got a warning from our server about it, but only after an entire day had passed, and by then the email was no longer timely. Therefore, I shortened the message handling timeouts to send 'delivery delayed' warnings after 30 minutes and to cancel the message as undeliverable after four hours. Now, I don't expect any of the other mail administrators here to bless these timeouts, because they're way too short. HOWEVER, the truth is that my users rely on email to be as reliable as telephone messages, and if a message can't be delivered immediately, it is better to reject it outright and alert the user so that other communication channels, such as fax or FedEx, can be used. Is anyone else doing this? Are there any non-obvious ramifications lurking? Pros? Cons? Comments? Should we all reduce these timeouts on our SMTP servers?"
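The policy described above boils down to two thresholds on a queued message's age: one for the "delivery delayed" warning and one for giving up. A minimal sketch, assuming a hypothetical `queue_action` helper (not any particular MTA's API) and using the question's 30-minute and 4-hour values as defaults:

```python
from datetime import timedelta

def queue_action(age: timedelta,
                 warn_after: timedelta = timedelta(minutes=30),
                 return_after: timedelta = timedelta(hours=4),
                 warned: bool = False) -> str:
    """Decide what a (hypothetical) queue runner does with a deferred message."""
    if age >= return_after:
        return "bounce"  # give up and return the message as undeliverable
    if age >= warn_after and not warned:
        return "warn"    # send the one-time 'delivery delayed' notice
    return "retry"       # leave the message queued and try again later

print(queue_action(timedelta(minutes=45)))  # prints warn
print(queue_action(timedelta(hours=5)))     # prints bounce
```

The whole debate in this thread is about where those two thresholds should sit; the mechanism itself is this simple.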
That said, even if your e-mail server doesn't send you a delivery warning, that doesn't mean the e-mail actually got there. It could have been accepted by a secondary MX rather than the primary one that actually delivers it.
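Why a secondary MX can hide a delay: RFC 5321 has a sending server try a domain's MX hosts in ascending preference order, so when the primary is unreachable a backup MX can accept the message and simply hold it in its own queue. A minimal sketch; the host names and the `reachable` set are made up for illustration, and real MTAs also shuffle hosts that share a preference value:

```python
from typing import Optional

def pick_mx(mx_records: list[tuple[int, str]], reachable: set[str]) -> Optional[str]:
    """Return the first reachable MX host, trying lower preference values first."""
    for _pref, host in sorted(mx_records):
        if host in reachable:
            return host  # this host accepts the message, though it may only queue it
    return None  # nothing reachable: the sending server keeps the message queued

# Hypothetical records: a primary and a backup MX for one domain.
mx = [(10, "mail.example.com"), (20, "backup-mx.example.net")]
print(pick_mx(mx, {"backup-mx.example.net"}))  # primary down: prints backup-mx.example.net
```

From the sender's point of view the hand-off succeeded, so no warning is generated, yet the message sits on the backup host until the primary comes back.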
I'm sure everybody and their brother will mention that read receipts and delivery receipts are a good idea in this case (even those aren't reliable, but they're better than nothing). Oh, and that if the message was this important, it deserved at the very least a confirmation call. You might look like a character out of a Dilbert strip, but it sounds like confirmation would have been worth the embarrassment in this case.
Kirby
When I was a kid, I used to think that a 2400 modem was really fast. You could download a 300KB game in twenty minutes or so. And I could store dozens of them on my 20MB hard drive.
When I hear newbies complaining about their slow 300KB/s connections and too small 100GB storage units, I feel anger inside. They just can't appreciate the value of technology.
When e-mail was introduced 30 years ago, it was an amazing feat: you could send messages across the country faster than the regular postal service could. Wow.
Now we're complaining about the limitations of a 30-year-old technology that works as intended. Come on. It's still amazing. There are IM services and IRC channels all over the place for these "urgent needs".
Don't blame the hammer if it doesn't mow your lawn.
For God's sake, yes!
Problem number one: for the most part, email is perfectly reliable. If it isn't delivered within half an hour, 99% of the time it's because I screwed up the address. I'd like to know after 5 minutes, but I'd settle for half an hour. And I don't want the computer trying for four freaking days to send an email that I messed up.
Problem number two: let's say there was a legitimate problem with the network. A router was taken down for maintenance, for instance. These days, people grumble if it's down for more than 10 minutes, and few outages last more than a couple of hours. For the retry period, 12 hours is probably sufficient, but 24 will cover an overnight outage and its subsequent fix with time to spare. Heck, how many outages last for more than a day? In the rare event that one does, it may last a week, or a permanent change may have occurred that keeps the mail from ever being deliverable.
So I have no advice for you other than this: please, please make everyone you know configure their systems this way.
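Whether a message survives an outage depends on how long it stays in the queue, not just on how often it is retried. A rough sketch of the 12-versus-24-hour argument, assuming a hypothetical `survives_outage` helper, a fixed retry interval (real MTAs often back off between attempts), and an outage that begins the moment the message is first queued:

```python
def survives_outage(outage_hours: float, retry_minutes: float, give_up_hours: float) -> bool:
    """Return True if a retry lands after the outage ends and before the give-up deadline.

    Assumes the outage begins when the message is first queued (time 0) and that
    the queue is retried at a fixed interval.
    """
    if retry_minutes <= 0:
        raise ValueError("retry_minutes must be positive")
    outage_end = outage_hours * 60
    deadline = give_up_hours * 60
    attempt = 0.0
    while attempt <= deadline:
        if attempt >= outage_end:
            return True  # destination is reachable again; this attempt succeeds
        attempt += retry_minutes
    return False  # every attempt before the deadline hit the outage

# A hypothetical 10-hour overnight outage, retried every 30 minutes:
print(survives_outage(10, 30, 4))   # prints False: bounced after 4 hours
print(survives_outage(10, 30, 24))  # prints True: delivered on a later retry
```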
(Flames -- err, I mean opposing viewpoints -- welcome.)
We use dogsleds.
If the dogs come back wanting food but without the sled
Then the driver was eaten by a bear, the
message did not get through, and the sled
sank below the ice.
Resend message.
else if the dogs come back with the sled but
without the driver and without a reply
Then the driver was eaten by a bear and
the dogs were hungry, so they came home.
Resend message.
else if the dogs come back with the sled and
the reply but without the driver
Then the team made it to the destination and
got the reply, but the driver was eaten
by a bear on the way home. However,
the dogs were hungry, so they returned.
else if the driver returns with the team and
the reply
Then the reply is a fake. The driver hung out
at the brothel down the road for a few
weeks and faked the reply.
Resend message.
134340: I am not a number. I am a free planet!
Uhm... NO!!!
The timeouts are there to handle cases where a remote server is off the net for whatever reason. While I can see shortening the time before the warning goes out, you're not helping yourself if you shorten the period during which the server attempts delivery.
Sendmail (I'm not sure which MTA is being used in this case, but I would hope that's irrelevant) can apply different queue timeouts based on the priority of the message. With this, you could have high-priority mail fail after 12 hours while normal mail gets the usual amount of time.
When mail runs well, it's smooth and very timely. But when it breaks, it can go down hard.
In my experience, recovering a mail server isn't what takes the most time; it's processing the backlog from other servers' queues.
If you run at 50% capacity, then an outage of 2 hours will basically take you 2 more hours to catch up from, and that's assuming your server is running optimally. The best way to find bottlenecks in your mail servers is to shut them down for an hour and see what stops them from working once they come back up (syslog is great for this if you have unbuffered logging).
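As a sketch of how per-priority timeouts might be written down, the Python below renders `define()` lines in sendmail.mc's m4 syntax for the `confTO_QUEUEWARN_*` and `confTO_QUEUERETURN_*` macros. The `render_mc` helper is hypothetical, and the values (a 12-hour return for urgent mail, with 4-hour warnings and a 5-day return for normal mail) are illustrative only; check your sendmail version's documentation before relying on them:

```python
# Illustrative per-priority policy: (warn after, return after), in sendmail's time syntax.
POLICY: dict[str, tuple[str, str]] = {
    "URGENT": ("4h", "12h"),
    "NORMAL": ("4h", "5d"),
}

def render_mc(policy: dict[str, tuple[str, str]]) -> str:
    """Render sendmail.mc define() lines for per-priority queue timeouts."""
    lines = []
    for priority, (warn, ret) in policy.items():
        lines.append(f"define(`confTO_QUEUEWARN_{priority}', `{warn}')dnl")
        lines.append(f"define(`confTO_QUEUERETURN_{priority}', `{ret}')dnl")
    return "\n".join(lines)

if __name__ == "__main__":
    print(render_mc(POLICY))
```

Keeping the policy in one table makes it obvious which priority classes differ from the default, which is the whole point of the per-priority timeouts.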
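The 50% figure follows from simple rate arithmetic: mail keeps arriving during the outage, and afterwards only the spare capacity is left to clear the backlog. A back-of-the-envelope sketch, assuming a constant arrival rate and ignoring retry back-off on the sending servers (`catch_up_hours` is a hypothetical name):

```python
def catch_up_hours(outage_hours: float, load: float) -> float:
    """Hours needed after an outage to clear the backlog.

    load is the fraction of capacity normally in use (0 <= load < 1).
    The backlog is outage_hours * load capacity-hours, and it drains at
    the spare rate (1 - load).
    """
    if not 0 <= load < 1:
        raise ValueError("load must be in [0, 1)")
    return outage_hours * load / (1 - load)

print(catch_up_hours(2, 0.5))  # prints 2.0
```

The same 2-hour outage at 80% load works out to roughly 8 hours of catch-up, which is why headroom matters so much during recovery.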
Timeouts are there to help the system recover when something goes wrong. Use the priorities to change the timeouts, but dropping mail too quickly is just doing everyone a disservice.
From personal experience: I once saw a system set up to drop all messages that were not delivered within 45 minutes. I was floored when I saw this. They had a problem with the machine, and their system took almost a week to stabilize and catch up. (The systems were underpowered and running too many competing services. Running DNS on the mail servers is not good: when you handle a lot of mail, your lookups steal CPU from your mail servers, and the problem gets amplified when you're processing the backlog.)
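Why a 45-minute cutoff backfires during recovery can be sketched with a simple fluid model: service stops for the outage, then the backlog drains first-in, first-out using the spare capacity. Under those assumptions (constant arrivals, one FIFO queue, no retry back-off; both helper names are hypothetical), a message queued t minutes after the outage starts waits about `outage - t * (1 - load)` minutes:

```python
def fifo_delay_minutes(arrival_min: float, outage_min: float, load: float) -> float:
    """Approximate queueing delay for a message arriving arrival_min minutes into an outage.

    Capacity is normalised to 1 and mail arrives at rate `load`. Service resumes at
    outage_min, and the message leaves once everything ahead of it has been cleared,
    at outage_min + load * arrival_min.
    """
    if not 0 <= load < 1:
        raise ValueError("load must be in [0, 1)")
    departure = outage_min + load * arrival_min
    return max(departure - arrival_min, 0.0)

def bounced_minutes_of_mail(outage_min: float, load: float, cutoff_min: float) -> float:
    """Length of the arrival window whose mail would wait longer than cutoff_min."""
    if not 0 <= load < 1:
        raise ValueError("load must be in [0, 1)")
    # Delay falls linearly from outage_min and exceeds the cutoff while
    # arrival_min < (outage_min - cutoff_min) / (1 - load).
    return max((outage_min - cutoff_min) / (1 - load), 0.0)

# Hypothetical 1-hour outage at 50% load with the 45-minute cutoff from the post:
print(bounced_minutes_of_mail(60, 0.5, 45))  # prints 30.0
```

In this illustration, everything sent during the first half hour of the outage would be bounced even though the server recovered on its own.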
In sendmail.mc, take a look at:
- confTO_QUEUERETURN_NORMAL
- confTO_QUEUERETURN_URGENT
- confTO_QUEUERETURN_NONURGENT
and the same again with "WARN" in place of "RETURN". The best thing is that if you set these a certain way, it *really* causes grief for those pricks who like to set the urgent flag on all their emails, because they get inundated with warnings. It's like a LART, only without the lawsuit.
UNIX? They're not even circumcised! Savages!
After the Sobig virus hit, some of our email was taking 3 hours to get through, and a lot of our users were asking us why it took so long to send an email to someone less than 20 meters away (our ISP still handles our email, as I haven't had time to set up a mail server in-house). After close to 20 people had asked me the same question, I sent out an email giving everyone a quick idea of what happens under the hood and how it was a miracle that they got email at all.
It went something like this (short version):
When you click Send, the message is sent to our ISP. Our ISP then sends it to another ISP (our old ISP, which still hosts our mail), which then sends it back to us. At each ISP, it goes into a queue with thousands of other messages. For your email to get from you to the person 20 meters away, it has to travel 6000+ km (Australia is a big country, and our current ISP is in Perth), and it normally does this in less than 5 minutes.
Also, there are currently two viruses on the internet that have slowed down the entire internet: Sobig and Slammer.
After I sent the email out explaining how email worked and why everything was so slow, lots of users told me that they never knew so much happened in the background. I haven't had anyone complain about email since.
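On a trip like the one described in that email, where does the time actually go? A rough store-and-forward sketch, assuming signals travel through optical fibre at about 200,000 km/s and using made-up queue waits for the two ISP hops (the helper names are hypothetical):

```python
FIBRE_KM_PER_S = 200_000  # approximate signal speed in optical fibre (about 2/3 of c)

def propagation_seconds(distance_km: float) -> float:
    """Time the signal itself spends in transit over distance_km of fibre."""
    return distance_km / FIBRE_KM_PER_S

def end_to_end_seconds(queue_waits_s: list[float], distance_km: float) -> float:
    """Store-and-forward delay: the wait in each hop's queue plus propagation."""
    return sum(queue_waits_s) + propagation_seconds(distance_km)

# Hypothetical queue waits at the two ISPs the message passes through.
hops = [120.0, 120.0]
print(propagation_seconds(6000))  # prints 0.03
print(end_to_end_seconds(hops, 6000))
```

Even over 6,000 km the signal is in transit for only about 0.03 seconds; with these illustrative numbers, nearly all of the delay is time spent sitting in queues.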