You've Got Mail -- Tons Of It
Daniel Goldman writes "The Baltimore Sun has an article about the City of Baltimore's email problem." A snippet: "Millions of old e-mail messages are clogging Baltimore's municipal computers, so the city is going to start automatically deleting any messages older than 90 days.
A common practice in private business, the move raises questions when made by a municipality, which has a responsibility to retain certain public records." Goldman points out "Just think about all the potential lawsuits; 'if it's not there, they can't subpoena it.'"
Since they need to delete tons of old messages, spam included, but want to save official email, why don't they train a Bayesian filter to sort through the messages and save as much of the official mail as possible? They certainly can't rely on their employees to save every official message to their hard drives.
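For anyone curious what "train a Bayesian filter" would mean here, below is a minimal multinomial naive Bayes sketch with Laplace smoothing, using only the standard library. The two labels ("official" and "other"), the tokenizer, and the training snippets are all invented for illustration; a real deployment would need thousands of hand-labelled messages, not four.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens; deliberately simple."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesSorter:
    """Two-label multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self) -> None:
        self.word_counts = {"official": Counter(), "other": Counter()}
        self.doc_counts = Counter()

    def train(self, text: str, label: str) -> None:
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def _log_score(self, text: str, label: str, vocab_size: int) -> float:
        total_docs = sum(self.doc_counts.values())
        # Smoothed prior so an untrained label never produces log(0).
        score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
        total_words = sum(self.word_counts[label].values())
        for word in tokenize(text):
            count = self.word_counts[label][word]
            score += math.log((count + 1) / (total_words + vocab_size))
        return score

    def classify(self, text: str) -> str:
        vocab = set(self.word_counts["official"]) | set(self.word_counts["other"])
        # max() returns the first label on a tie, and "official" comes first,
        # so an undecided message is kept rather than deleted.
        return max(self.word_counts,
                   key=lambda label: self._log_score(text, label, len(vocab)))


# Made-up training data for illustration only.
sorter = NaiveBayesSorter()
for text, label in [
    ("Agenda for Tuesday's council budget hearing", "official"),
    ("Zoning permit application for 12 Main St", "official"),
    ("Cheap pills, buy now", "other"),
    ("You could win a free prize now", "other"),
]:
    sorter.train(text, label)

print(sorter.classify("Budget hearing agenda"))
print(sorter.classify("Free pills now"))
```

Breaking ties toward "official" reflects the retention concern in the article: a false "keep" only costs disk space, while a false "delete" could destroy a public record.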
~ there are 10 types of people in this world, those that can read binary and those that can't
Since these are public employees working on publicly owned systems, this sounds like a job for a Google appliance. Of course, the existing email would likely have to be processed in some fashion, possibly manually.
Henceforth, though, any email correspondence, unless explicitly marked as "personal" or "sensitive" should be placed in an archive searchable by the public. The "sensitive" messages should be placed in another archive not open to searching by the public. How "personal" messages are handled, and what constitutes "sensitive" should be based on policy established by legislation (of course, open to public debate).
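A minimal sketch of the routing rule proposed above: everything goes to the public archive unless it is explicitly marked "personal" or "sensitive". The comment doesn't say how a message gets marked, so the `X-Record-Class` header used here is a hypothetical convention, and what happens to each archive afterwards is left to policy, as the comment suggests.

```python
from email.message import EmailMessage

# Hypothetical marking header; the proposal does not specify a mechanism.
MARK_HEADER = "X-Record-Class"


def archive_for(msg: EmailMessage) -> str:
    """Route to "public" unless the sender explicitly marked the message otherwise."""
    mark = str(msg.get(MARK_HEADER, "")).strip().lower()
    if mark in ("personal", "sensitive"):
        return mark  # how these are handled is a matter of legislated policy
    return "public"


routine = EmailMessage()
routine["Subject"] = "Pothole repair schedule"

flagged = EmailMessage()
flagged["Subject"] = "Personnel matter"
flagged[MARK_HEADER] = "Sensitive"

print(archive_for(routine), archive_for(flagged))
```

Defaulting to "public" puts the burden of justification on withholding a message, which matches the spirit of the proposal.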
There are two technical culprits here:
1. On-line storage. There's no reason to keep all of everyone's mail on-line on the server (à la IMAP or proprietary MS Exchange) instead of offline on their PCs (à la POP, most often seen with Eudora for non-techies). With offline storage, the servers don't clog, and you can keep as much mail as you like.
The biggest rap against offline storage is that you can't control what people do with their mail or how they store it. My old job had a neat solution for this: Eudora downloaded your mail, but stored it on a file server. Each employee had 100 GB or something very large. It worked great; the SMTP/POP servers were never full, and everyone could keep their email.
2. Ridiculous stupid bullshit HTML rich-text mail crap. Can you tell I have a bias here? Aside from being annoying, HTML mail can take up to ten times the size of plain old text. Some of the HTML generated by common email programs is just terrible: it's filled with tags repeated on every line, wasting an incredible amount of space for absolutely zero benefit. (Outlook is bad, but there are others that are just as bad.)
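To see the overhead the comment is complaining about, here is a small sketch that strips the tags with the standard-library `HTMLParser` and compares the byte size of the HTML body with the text it actually carries. The sample body is made up, loosely imitating the per-paragraph `MsoNormal` markup that Word-based Outlook mail tends to produce; real messages will give different ratios.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects only the character data between tags."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def size_ratio(html_body: str) -> float:
    """Bytes of HTML per byte of visible text (1.0 means no markup overhead)."""
    extractor = TextExtractor()
    extractor.feed(html_body)
    extractor.close()  # flush any buffered text
    text = "".join(extractor.parts)
    return len(html_body.encode("utf-8")) / max(1, len(text.encode("utf-8")))


# Illustrative body: the same formatting repeated on every line.
line = ('<p class=MsoNormal><span style="font-size:10.0pt;font-family:Arial">'
        'Meeting moved to 3pm.</span></p>\n')
html_body = "<html><body>" + line * 5 + "</body></html>"
print(f"HTML is {size_ratio(html_body):.1f}x the size of its text")
```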
There's no excuse for not fixing these problems. Someday someone's going to tell a court they had to delete mail for these reasons, and someone else is going to explain exactly why they're wrong. Until then, people who want to delete mail for legal reasons will hide behind false technical reasons.
The spam problem is unlikely to go away until people start treating it like the attack on the Internet that it is.
I've noticed an annoying trend lately: e-mail sent to businesses is frequently just ignored. It certainly seems much more common this year than in the past. I've wondered if this is simply because so many mailboxes are filling up as fast as the spammers can send.
I'd suspect that the city of Baltimore wouldn't be having any problems if spam weren't such a problem. If the number of messages they had to deal with dropped by a factor of 5 to 20 (depending on which estimates of current spam levels you believe), they could probably just leave the mail where it is.
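The "5 to 20 times" range translates directly into a spam share: if removing spam shrinks the volume by a factor f, legitimate mail is 1/f of the total, so spam is 1 - 1/f of it, i.e. 80% at f = 5 and 95% at f = 20. A quick check, with the function name chosen purely for illustration:

```python
def spam_fraction(reduction_factor: float) -> float:
    """Share of incoming mail that is spam, given how much removing spam shrinks the volume."""
    # Legitimate mail is 1/factor of the total; everything else is spam.
    return 1 - 1 / reduction_factor


for factor in (5, 20):
    print(f"{factor}x reduction -> {spam_fraction(factor):.0%} spam")
```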
This is something I've been struggling with as a small business owner doing business on the net. My company of 5 people gets between 4,000 and 20,000 borderline spams per day. By borderline, I mean that we throw away obvious viruses and anything scoring above a certain threshold in SpamAssassin (I think it's 9), so those numbers don't include the super spammy messages.
If it weren't for our fairly strict and complicated spam blocker setup, and a very powerful machine, we couldn't get the few hundred messages per day that are of interest to us. Spam is killing e-mail. I'm not sure why more people aren't treating it as an attack, but it's really hard to get anyone interested in taking action. Canceling accounts doesn't even begin to solve the problem.
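The "throw away anything above a score" step can be sketched by reading the score SpamAssassin writes into the `X-Spam-Status` header (e.g. `Yes, score=12.3 required=5.0 tests=...`; some older releases wrote `hits=` instead of `score=`). The threshold of 9 is only the commenter's recollection, and the function names are hypothetical; in a real setup this decision usually lives in procmail, an MTA milter, or similar, not in a standalone script.

```python
import re
from email import message_from_string
from typing import Optional

# The commenter recalls roughly 9; treat it as a tunable, not a recommendation.
DISCARD_THRESHOLD = 9.0

# Matches "score=12.3" (or the older "hits=12.3") inside X-Spam-Status.
_SCORE_RE = re.compile(r"\b(?:score|hits)=(-?\d+(?:\.\d+)?)")


def spam_score(raw_message: str) -> Optional[float]:
    """Return the SpamAssassin score from X-Spam-Status, or None if absent."""
    status = message_from_string(raw_message).get("X-Spam-Status")
    if status is None:
        return None
    match = _SCORE_RE.search(status)
    return float(match.group(1)) if match else None


def should_discard(raw_message: str) -> bool:
    """Discard only mail scored strictly above the threshold; unscored mail is kept."""
    score = spam_score(raw_message)
    return score is not None and score > DISCARD_THRESHOLD
```

Anything between SpamAssassin's own `required` score and this discard threshold is the "borderline" pile the comment describes: tagged, but still delivered for a human to skim.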
In the meantime, the City of Baltimore is suffering...
Sean
to dump it off to tape and then just store the tapes instead of just deleting it. Though they are probably running an Exchange server, so offloading data stores wouldn't be the easiest thing to do. If they were using something with a simple mbox store, they could easily run it through a date filter and dump anything older than 90 days to tape. At least then it could be retrieved at a later date.
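Here is roughly what that mbox date filter could look like with Python's standard `mailbox` module: messages whose `Date` header is older than the cutoff are moved into a separate archive mbox, which could then be written to tape. The function names and the decision to leave undated or unparseable messages in place are assumptions for this sketch, not part of the comment.

```python
import mailbox
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def message_date(msg: mailbox.mboxMessage) -> Optional[datetime]:
    """Parse the Date header; return None if it is missing or unparseable."""
    header = msg["Date"]
    if header is None:
        return None
    try:
        when = parsedate_to_datetime(header)
    except (TypeError, ValueError):
        return None
    # "-0000" dates come back naive; treat them as UTC so comparisons work.
    return when if when.tzinfo else when.replace(tzinfo=timezone.utc)


def move_old_mail(src_path: str, archive_path: str, days: int = 90,
                  now: Optional[datetime] = None) -> int:
    """Move messages older than `days` from the src mbox to the archive mbox."""
    cutoff = (now or datetime.now(timezone.utc)) - timedelta(days=days)
    src = mailbox.mbox(src_path)
    archive = mailbox.mbox(archive_path)
    moved = 0
    try:
        src.lock()
        archive.lock()
        for key, msg in list(src.items()):
            when = message_date(msg)
            # Undated mail stays put rather than being archived by guesswork.
            if when is not None and when < cutoff:
                archive.add(msg)
                src.remove(key)
                moved += 1
        # Write the archive out before shrinking the source.
        archive.flush()
        src.flush()
    finally:
        archive.close()  # close() flushes pending changes and unlocks
        src.close()
    return moved
```

Flushing the archive before the source means a crash in between leaves a message duplicated rather than lost, which is the safer failure for records retention.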
Oh, wait, let me guess, they aren't using tape backups...
Don't Ask Questions. I don't know the answers and even if I did I wouldn't tell you.
OMFG, we nearly had a lynch mob attack us when we began deleting mail older than *two years* -- it eventually took the intervention of the CFO and a faked mail system "crash" to make 2-year max retention work, and even then there are people still pissed about it, or who claim that "the client" requires them to retain all correspondence (nope, sorry, we checked the contract).
90 days seems both unrealistic to implement and way too much reliance on .PST files, which often max out at 2 gig and can get corrupted way too easily, not to mention being fdisked into eternity by clueless helpdesk people.
We don't want someone to be able to request something from backups that the user thinks is gone.
This way it's up to the user to decide if they want their data archived. And the onus is on the user to comply with however long the data is supposed to be kept before being destroyed.
There's another problem with keeping so much e-mail.
When the agency *DOES* get sued (and it will sometime in the future; everyone gets sued), those users will have to sort through *ALL* those messages. Who's going to decide what's relevant and what's not?
Who's going to decide what's covered by attorney/client privilege? I don't think any public agency can have a bunch of *lawyers* go through hundreds of thousands of e-mails.
What happens the next time they're sued? They have to go thru them again!
The problem with most government agencies (and I work for one) is that the records retention policy/schedule doesn't specifically address e-mail, and never envisioned the sheer quantity that it could generate.
Our agency is in the process of adopting a similar plan. If a message is relevant, it gets printed and put in the paper file. If there's an electronic folder for the project, it gets saved to disk there (which would be a lot easier to turn over in case of a lawsuit). All e-mail is deleted from the server after 90 days.
As far as I've been able to figure out, this arose from a lawsuit against the county in which an e-mail retrieved from two years earlier proved that a county commissioner was taking bribes on a zoning issue.
Rather than fix the corruption, just ensure that it's covered up more efficiently. Gotta love local governments.
No boom today. Boom tomorrow. There's always a boom tomorrow. - Cmdr. Susan Ivanova
What about CDs as archival media?
I've heard something about them breaking down, but is that real?
Then there are DVDs and magneto-optical (my personal favourite).
Just dump the old email to DVD-R and archive it somewhere. If someone wants to subpoena it, burn off copies and wish 'em luck. Even if the city is getting a million pieces of spam a day, at 5 KB each after data compression, that's about 5 GB, or roughly one DVD-R per day at a buck or so each, peanuts compared to what the city must already spend xeroxing memos for records retention purposes.
Any of these so-called "important government documents" shouldn't be stored in an email archive anyway. They should be on a network drive that gets backed up.
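Checking the DVD-R arithmetic: a million messages at 5 KB each is 5 GB per day, slightly more than the 4.7 GB nominal capacity of a single-layer DVD-R. So "one disc a day" is close, but strictly it's a bit over one disc, or two if a day's mail can't be split across discs. The sketch reads "5kb" as 5,000 bytes; that interpretation is an assumption.

```python
import math

MESSAGES_PER_DAY = 1_000_000    # the comment's worst-case spam volume
BYTES_PER_MESSAGE = 5 * 1000    # "5kb each after data compression", read as 5,000 bytes
DVD_R_BYTES = 4_700_000_000     # nominal single-layer DVD-R capacity (4.7 GB)

daily_bytes = MESSAGES_PER_DAY * BYTES_PER_MESSAGE
discs_per_day = daily_bytes / DVD_R_BYTES
print(f"{daily_bytes / 1e9:.1f} GB/day = {discs_per_day:.2f} discs "
      f"({math.ceil(discs_per_day)} if a day can't span discs)")
```

Either way, the cost conclusion in the comment still holds: a couple of dollars a day in media.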
My point is that a better solution is to put the email storage in the end user's hands. Set file size limits on their accounts and have them move all important mail off of their server mailbox and into a Microsoft PST file (a.k.a. a Personal Folders file).
I work for a Fortune 50 company running in an Exchange environment, and this is the method we use for about 4,000 corporate employees. Mailboxes are limited to 10 MB: once an account reaches 10 MB, the user can no longer send email. We then shut the account off when it reaches 50 MB and bounce messages back to the sender.
Users who have important email set up Personal Folders in Outlook and move messages from their inbox to their PST file. This file is stored locally on laptops (for travel purposes) and on the user's network home drive for desktop PCs.
We run standard incremental backups daily and full backups once a week. The only problem we have with this is that MS's PST files tend to get corrupted once they grow past 1 GB, so some of the legal and credit employees have three or four personal folders, normally sorted by year: one file for 2004, one for 2003, and so on. It works for us, so it has to work for the government.
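The two-tier quota described above boils down to a simple size-to-state mapping. Exchange has its own per-mailbox storage limits for this (issue warning, prohibit send, prohibit send and receive), so the sketch below only models the decision described in the comment; the constant and enum names are illustrative, not Exchange settings.

```python
from enum import Enum

# Limits quoted in the comment; names are illustrative, not Exchange settings.
SEND_BLOCK_MB = 10   # at this size the user can no longer send
SHUT_OFF_MB = 50     # at this size the account is shut off and inbound mail bounces


class MailboxState(Enum):
    NORMAL = "normal"
    SEND_BLOCKED = "cannot send"
    SHUT_OFF = "shut off; inbound mail bounced to sender"


def mailbox_state(size_mb: float) -> MailboxState:
    """Map a mailbox size in MB to the policy state described in the comment."""
    if size_mb >= SHUT_OFF_MB:
        return MailboxState.SHUT_OFF
    if size_mb >= SEND_BLOCK_MB:
        return MailboxState.SEND_BLOCKED
    return MailboxState.NORMAL


for size in (4.2, 10, 37.5, 50):
    print(f"{size} MB -> {mailbox_state(size).value}")
```

The gap between the two limits is what gives users time to move mail into their PST files before anything bounces.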
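The one-file-per-year workaround starts with bucketing messages by the year in their `Date` header. Writing actual PST files is out of scope for a standard-library sketch, so this only shows the grouping step; the `"undated"` bucket and the sample messages are assumptions for illustration.

```python
from collections import defaultdict
from email.message import EmailMessage
from email.utils import parsedate_to_datetime
from typing import Dict, Iterable, List, Union

YearKey = Union[int, str]


def group_by_year(messages: Iterable[EmailMessage]) -> Dict[YearKey, List[EmailMessage]]:
    """Bucket messages by Date-header year; missing or bad dates go to "undated"."""
    buckets: Dict[YearKey, List[EmailMessage]] = defaultdict(list)
    for msg in messages:
        key: YearKey = "undated"
        date_header = msg["Date"]
        if date_header is not None:
            try:
                key = parsedate_to_datetime(str(date_header)).year
            except (TypeError, ValueError):
                pass
        buckets[key].append(msg)
    return dict(buckets)


def make(date: str = "") -> EmailMessage:
    """Build a tiny sample message, optionally with a Date header."""
    msg = EmailMessage()
    msg["Subject"] = "status"
    if date:
        msg["Date"] = date
    return msg


inbox = [
    make("Mon, 03 Mar 2003 09:00:00 -0500"),
    make("Fri, 12 Dec 2003 16:30:00 -0500"),
    make("Tue, 06 Jan 2004 08:15:00 -0500"),
    make(),
]
for year, msgs in group_by_year(inbox).items():
    print(year, len(msgs))
```

Each bucket would then become its own archive file, keeping every file well under the size where the comment reports corruption.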
http://jayceecorder.blogspot.com
At one company that I worked for, they got the brilliant idea to delete all email older than 30 days. They also didn't want employees to make backups of their personal mailboxes. They intentionally wanted all traces of old email to disappear. While I'm sure that it made the lawyers happy, it caused a lot of grief for the people actually doing work for the customer. Many design decisions, bug reports and other important things were only documented in email messages. This is supposed to be the age of the paperless office, right? When you are involved in a multi-year project, you often need to refer to old messages. It also had the effect of making old policy memos disappear, whose existence had proved to be very inconvenient to management on several notable occasions.
Mea navis aericumbens anguillis abundat (My hovercraft is full of eels)
Right. Our company originally tried to institute size limits when we moved to Notes (only 3 years ago), but then the lawyers said we needed to keep everything anyway (HIPAA requirements). So even with the exorbitant expense of the system, it's probably still cheaper to keep expanding it every couple of months than to pay people to sit there and sort through their own email. Anything from an external party must be kept, and anything even remotely regarding a customer must be kept as well. It's a huge pain, and they took the easy way out by archiving every single email, but neither option is very cost-effective. There are four people I know of in my department alone who have mailboxes (extensively categorized, with dozens or even hundreds of folders) of up to 20 GB each. It's crazy. But even without the ever-looming threat of a lawsuit, they claim they've been able to disprove what other people were badmouthing them about by producing an email from that person, sent a year or two earlier, stating the exact opposite. I've witnessed it once, and it's pretty funny watching somebody turn beet red in a room with 25 supervisors and above.
Probably too late in the thread for this to catch much attention...
However, I work for a local government office close to Charlotte, NC. Our stated policy is to remove all e-mail older than two weeks except e-mail that is crucial to job performance. This is less to save space than it is to keep the news media from finding dirt. We really don't care that our e-mail is public record. We really have nothing to hide. However, the local newspaper (in Charlotte) is constantly asking for _ALL_ of the County Manager's e-mail. They aren't looking for anything specific; they are just on a fishing trip, trying to see what trouble they can stir up. They rely on the Freedom of Information Act to, hopefully, generate some news instead of doing some real investigative work. *sigh*
Anonymity enabled for self-protection. Wouldn't want the powers that be see my e-mail...