You've Got Mail -- Tons Of It
Daniel Goldman writes "The Baltimore Sun has an article about the City of Baltimore's email problem." A snippet: "Millions of old e-mail messages are clogging Baltimore's municipal computers, so the city is going to start automatically deleting any messages older than 90 days.
A common practice in private business, the move raises questions when made by a municipality, which has a responsibility to retain certain public records." Goldman points out, "Just think about all the potential lawsuits; 'if it's not there, they can't subpoena it.'"
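The 90-day rule in the article boils down to a cutoff comparison on each message's received time. Here is a minimal Python sketch, assuming every message carries a received timestamp. The `Message` type and `split_by_age` helper are hypothetical and are not any mail server's API. The sketch only *selects* expired messages. Whether those messages are then deleted or moved to an archive is exactly the policy question raised above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Message:
    msg_id: str
    received: datetime


def split_by_age(messages, now, max_age_days=90):
    """Split messages into (kept, expired).

    A message expires once it is strictly older than max_age_days;
    one received exactly at the cutoff is still kept.
    """
    cutoff = now - timedelta(days=max_age_days)
    kept = [m for m in messages if m.received >= cutoff]
    expired = [m for m in messages if m.received < cutoff]
    return kept, expired


if __name__ == "__main__":
    now = datetime(2024, 6, 1)
    inbox = [
        Message("a", now - timedelta(days=10)),
        Message("b", now - timedelta(days=120)),
    ]
    kept, expired = split_by_age(inbox, now)
    print([m.msg_id for m in kept], [m.msg_id for m in expired])  # ['a'] ['b']
```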
1. On-line storage.
Actually, storing messages on users' local computers is about the worst thing an organization can do: few, if any, user computers are backed up the way the servers are.
Some organizations are legally required to maintain various backups. Just because the active mailstore does not keep messages older than X days does not mean that the data is gone forever; it may still exist in backups, and thus still be subpoena-able.
To do this right, the City first needs to create a policy stating that active e-mail messages will not be retained in the "inboxes" for more than 30 days. It should also set up mailstores for everyone in a separate area on the same or a different server, but NOT on user PCs. It needs a policy against that too, because user computers can be subpoenaed: if a user has been keeping e-mail messages on his own computer, that can undermine the overall policy (aka the "Smoking Gun").
HTML/Rich-text e-mail messages
No argument there!
It is LEGAL not to retain e-mail messages past a reasonable amount of time, as long as an organization-wide POLICY is in place and reasonably applied across the entire organization. But the policy has to be in place first.
There is already lots of information on the net about this. Try googling "email retention policy"...
In Texas, email to any state or local public official (either elected or appointed), and to certain categories of state and local government employees, constitutes both a "public meeting" and a "public record". The state's record retention laws say that the email must be kept for a minimum of three years, and depending upon content for up to 7 or 10 years, or perhaps even forever. If an email is deleted prematurely, state law provides various levels of punishment for different degrees of tampering with or destroying public records, which can be as severe as state jail felony hard time if you've destroyed any email that could be construed as evidence in a criminal court case.
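A tiered schedule like the Texas one (a three-year floor, longer periods depending on content, some records kept forever) can be modeled as a lookup from a content category to a retention period, plus a hold flag for anything that might be evidence. The Python below is a minimal sketch under those assumptions. The category names, `RETENTION_YEARS`, and all helper functions are hypothetical, and the sketch makes no claim about how any agency actually classifies its mail.

```python
from datetime import date
from typing import Dict, Optional

# Hypothetical retention classes, in years; None means "keep forever".
# The 3 / 7 / 10 / forever tiers come from the comment above,
# but the category names are invented for illustration.
RETENTION_YEARS: Dict[str, Optional[int]] = {
    "routine": 3,
    "financial": 7,
    "contract": 10,
    "permanent": None,
}


def add_years(d: date, years: int) -> date:
    """Same calendar day `years` later; Feb 29 maps to Feb 28."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def earliest_deletion(received: date, category: str) -> Optional[date]:
    """First date this schedule allows destruction, or None for never.

    An unknown category raises KeyError rather than silently defaulting
    to the shortest period.
    """
    years = RETENTION_YEARS[category]
    if years is None:
        return None
    return add_years(received, years)


def may_delete(received: date, category: str, today: date,
               on_hold: bool = False) -> bool:
    """Always False while the message is on hold (e.g. possible evidence)."""
    if on_hold:
        return False
    deadline = earliest_deletion(received, category)
    return deadline is not None and today >= deadline
```

Raising on an unknown category is a deliberate choice. Given the penalties described above, it seems safer for a purge job to stop than to guess the shortest period.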
Though I don't work in the auditor's office in my state, here is what they implemented. Any document (digital or not) older than 30 days must be made public. Their solution: any e-mail older than 30 days is deleted. That way they don't have to keep all e-mail until the end of time, and they don't have to worry about making e-mail public. A great solution in that scenario.
Working at a law firm, we have to keep everything for 7 years. We have a system in place that takes all mail over 90 days old, pulls it out of Exchange, and moves it to the SAN. As a plus, it puts a link back into the information store so the message still appears to be there. If a user wants an old message, he can still get it himself, without an IT person having to dig up a tape, restore the file, and then e-mail it to him (thus creating MORE mail). The messages are still searchable, and retrieval is a snap when needed.
Mind you, we are only a 700-user shop. But nothing gets deleted: if it gets past the spam filter, it gets saved.
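The law-firm setup above moves old mail out of the live store and leaves a link behind, so users can still open it themselves. Here is a minimal Python sketch of that stub-and-archive pattern. Two in-memory dicts stand in for the Exchange information store and the SAN. `Stub`, `Stores`, `archive_old`, and `open_message` are hypothetical names, and the comment does not say which product the firm actually uses or how that product works internally.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Stub:
    """Placeholder left in the live store, pointing at the archived copy."""
    msg_id: str
    subject: str
    archive_key: str


@dataclass
class Stores:
    # msg_id -> (received, subject, body) tuple, or a Stub once archived
    live: dict = field(default_factory=dict)
    # archive_key -> (received, subject, body); stands in for the SAN
    archive: dict = field(default_factory=dict)


def archive_old(stores: Stores, now: datetime, max_age_days: int = 90) -> int:
    """Move messages older than max_age_days into the archive.

    Each moved message is replaced by a Stub that keeps its subject, so it
    still shows up in the mailbox. Nothing is deleted. Returns the count moved.
    """
    cutoff = now - timedelta(days=max_age_days)
    moved = 0
    for msg_id, item in list(stores.live.items()):
        if isinstance(item, Stub):
            continue  # already archived on an earlier run
        received, subject, _body = item
        if received < cutoff:
            key = f"archive/{msg_id}"
            stores.archive[key] = item
            stores.live[msg_id] = Stub(msg_id, subject, key)
            moved += 1
    return moved


def open_message(stores: Stores, msg_id: str) -> str:
    """Return the body, transparently following a stub into the archive."""
    item = stores.live[msg_id]
    if isinstance(item, Stub):
        item = stores.archive[item.archive_key]
    return item[2]
```

Because `open_message` follows the stub itself, nobody has to restore a tape and e-mail the result back, which is the benefit the comment describes.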
Check out the Cyrus project. I haven't used it for large installations, but I notice it does support distributing mailboxes across multiple backend servers (the Murder stuff).
Try CommuniGate Pro: it's not open source, but it's extremely reliable and flexible, can handle huge volumes of mail, and can do clustering over multiple servers too. I've had experience with it on a couple of large-scale installations, 50K+ accounts with millions of messages per day.
We have to spend a lot of time telling people **NOT** to save to local drives. If something is important or confidential, or may become so in the future, it should not be saved locally, unless you want to lose it, or to explain to an enquiry why it turned up for sale at a car boot sale after a break-in. This is what a network is for.
The answer to the problem in the article is quotas. *nix has them, Novell has them, and even Windows has them. Our email quota works as follows:
Limit 1 - once per day, send the user a high-importance email warning that they are getting close.
Limit 2 - disable sending and continue with the (2k) warning message.
Limit 3 - disable receiving, apart from one final message saying that everything will start working again when the user clears some space.
When users can't send, they get a dialogue box reminding them whenever they try; when they can't receive, the sender gets a message.
This does make for support calls like...
"Why does my computer tell me that the email is full up and I can't send any more?"
"Because your email is full up. You have a message explaining this to you."
"X tried to send me an email and it bounced saying that my mailbox was full up. Why?"
"Because your mailbox is full up."
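The three-limit quota scheme described above is essentially a threshold ladder: each limit switches off one more capability. Here is a minimal Python sketch. The threshold values are left to the caller because the comment gives no numbers. `QuotaState`, `quota_state`, `can_send`, and `can_receive` are hypothetical names, not the quota API of *nix, Novell, or Windows. The "one final message" exemption at Limit 3 is not modeled.

```python
from enum import Enum


class QuotaState(Enum):
    OK = 0
    WARN = 1        # Limit 1: daily high-importance warning
    NO_SEND = 2     # Limit 2: sending disabled, warnings continue
    NO_RECEIVE = 3  # Limit 3: receiving disabled as well


def quota_state(used_mb: float, warn_mb: float,
                send_mb: float, receive_mb: float) -> QuotaState:
    """Map mailbox usage onto the highest limit it has reached."""
    if not (warn_mb <= send_mb <= receive_mb):
        raise ValueError("limits must be ordered warn <= send <= receive")
    if used_mb >= receive_mb:
        return QuotaState.NO_RECEIVE
    if used_mb >= send_mb:
        return QuotaState.NO_SEND
    if used_mb >= warn_mb:
        return QuotaState.WARN
    return QuotaState.OK


def can_send(state: QuotaState) -> bool:
    return state in (QuotaState.OK, QuotaState.WARN)


def can_receive(state: QuotaState) -> bool:
    return state is not QuotaState.NO_RECEIVE
```

Ordering the checks from the highest limit down means a mailbox over all three limits reports only the most restrictive state.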
There comes a point where that, too, gets very expensive. At my company (a large US healthcare provider with both governmental and private contracts, HMO and PPO), after saying 3, 5, and 7 years, our lawyers have told us we have to archive, potentially forever, all email that the end user doesn't specifically delete. They may do an end-run around the deletion and archive those too, but I don't know. Anyway, our email system (Lotus Notes, which is an extreme HOG) eats somewhere between 100GB and 1TB/week. I was told it was well over 1TB, but I don't believe them. This is of course due to older Notes versions' inability to store attachments in public directories, so they simply send a copy to each and every recipient (plus the stupidity of no size limits on internal email). There is a limit to how many drives you can add to a SAN; after that you have to buy a whole extra chassis, which is where the expensive part comes in. Buying new SAN units every 6 months or so, plus the hard drives to put in them (and the maintenance contracts, 24/7 support, etc.), could easily add up to $1 million/year or more, which is definitely more costly than 10 average low-to-mid-level administrators' salaries.
Only if your MTA sucks. A rational one will recognize that the same email is going to 3000 people and store only one copy, with a reference count of 3000. When they've all deleted it, it gets punted; until then, it uses 2MB.
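The single-copy-with-reference-count idea above can be sketched in a few lines. Here is a minimal Python version, assuming message bodies are keyed by a SHA-256 digest. `SingleInstanceStore` and its methods are hypothetical, and the comment does not say how any particular MTA or message store implements this. Because the sketch keys on content, two separate deliveries of an identical body also share one copy. A per-delivery reference count would not give that, so this is a design choice rather than the only way to do it.

```python
import hashlib


class SingleInstanceStore:
    """One stored copy per distinct body, shared by every mailbox holding it."""

    def __init__(self):
        self.blobs = {}      # digest -> body bytes
        self.refcount = {}   # digest -> number of mailbox references
        self.mailboxes = {}  # user -> list of digests in that mailbox

    def deliver(self, body: bytes, recipients) -> str:
        """Store the body once, then add a reference for each recipient."""
        digest = hashlib.sha256(body).hexdigest()
        if digest not in self.blobs:
            self.blobs[digest] = body
            self.refcount[digest] = 0
        for user in recipients:
            self.mailboxes.setdefault(user, []).append(digest)
            self.refcount[digest] += 1
        return digest

    def delete(self, user: str, digest: str) -> None:
        """Drop one reference; raises ValueError if the user lacks it."""
        self.mailboxes[user].remove(digest)
        self.refcount[digest] -= 1
        if self.refcount[digest] == 0:  # last holder deleted it: purge
            del self.blobs[digest]
            del self.refcount[digest]

    def bytes_stored(self) -> int:
        return sum(len(b) for b in self.blobs.values())
```

With one copy per recipient, a 2MB message sent to 3000 people costs about 3000 × 2MB = 6000MB. With the shared copy, it stays at 2MB until the last recipient deletes it.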