Slashdot Mirror


You've Got Mail -- Tons Of It

Daniel Goldman writes "The Baltimore Sun has an article about the City of Baltimore's email problem." A snippet: "Millions of old e-mail messages are clogging Baltimore's municipal computers, so the city is going to start automatically deleting any messages older than 90 days. A common practice in private business, the move raises questions when made by a municipality, which has a responsibility to retain certain public records." Goldman points out "Just think about all the potential lawsuits; 'if it's not there, they can't subpoena it.'"

26 of 249 comments

  1. Simple... by Anonymous Coward · · Score: 5, Funny

    Outsource each employee's email to GMail. Problem solved.

    1. Re:Simple... by devilspgd · · Score: 4, Insightful

      With a properly designed mail system, only one copy of the message would be stored on disk, with pointers from each mailbox to the single central copy.
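A rough sketch of the single-copy idea, in case it helps. Everything here (the `SingleInstanceStore` class, keying bodies by SHA-256 digest, the sample memo) is made up for illustration and is not how any particular mail server does it:

```python
import hashlib


class SingleInstanceStore:
    """Toy mail store: each message body is kept once, mailboxes hold pointers."""

    def __init__(self):
        self.bodies = {}      # digest -> message bytes (stored once)
        self.mailboxes = {}   # user -> list of digests (the "pointers")

    def deliver(self, recipients, message: bytes) -> str:
        # The digest acts as the pointer to the single central copy.
        digest = hashlib.sha256(message).hexdigest()
        self.bodies.setdefault(digest, message)
        for user in recipients:
            self.mailboxes.setdefault(user, []).append(digest)
        return digest

    def read(self, user):
        # Follow each pointer back to the shared body.
        return [self.bodies[d] for d in self.mailboxes.get(user, [])]

    def stored_bytes(self):
        return sum(len(body) for body in self.bodies.values())


store = SingleInstanceStore()
memo = b"Subject: Budget meeting\n\nMoved to 3pm."
store.deliver(["alice", "bob", "carol"], memo)
```

Three recipients, one copy on disk: `store.stored_bytes()` stays at the size of a single memo no matter how long the recipient list gets.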

      *shrugs*

      --
      Give a man a fish, he'll eat for a day, but teach a man to phish...
  2. Bayesian Filter to Identify Official Mail by Dave419 · · Score: 5, Interesting

    Since they need to delete tons of old messages, spam included, but want to save official email, why don't they train a Bayesian filter to sort through it all and save as much as possible? They can't rely on their employees actually saving every official message to their hard drives.
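For anyone wondering what "train a Bayesian filter" would look like in the small, here is a toy naive Bayes classifier with add-one smoothing. The labels ("official", "junk") and the four training messages are invented; a real filter would need far more data and better tokenizing:

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


class NaiveBayes:
    def __init__(self):
        self.word_counts = {}        # label -> Counter of words seen
        self.doc_counts = Counter()  # label -> number of training messages
        self.vocab = set()

    def train(self, text, label):
        words = tokenize(text)
        self.word_counts.setdefault(label, Counter()).update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def classify(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            # log prior + sum of log likelihoods, with add-one smoothing
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for word in tokenize(text):
                score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


nb = NaiveBayes()
nb.train("zoning permit hearing schedule council", "official")
nb.train("budget contract approval council vote", "official")
nb.train("cheap pills win money now", "junk")
nb.train("win free money click now", "junk")
```

Calling `nb.classify(...)` on a new message returns whichever label scores higher. Nothing here addresses misclassified official mail, which is the real risk.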

    --
    ~ there are 10 types of people in this world, those that can read binary and those that can't
    1. Re:Bayesian Filter to Identify Official Mail by Kenja · · Score: 5, Insightful

      Because even one false positive can get them in trouble?

      --

      "Have you ever thought about just turning off the TV, sitting down with your kids, and hitting them?"
    2. Re:Bayesian Filter to Identify Official Mail by Raven42rac · · Score: 4, Insightful

      Five points for excellent use of buzzwords. I would say compress messages older than 90 days and save them. The government is not supposed to just willy nilly throw things away. I would invest in more hard drive space to hedge against lawsuits.

      --
      I hate sigs.
  3. Why not... by Izago909 · · Score: 4, Insightful

    figure out what percentage is spam, and sue spammers to recover damages for lost resources.

    1. Re:Why not... by AvantLegion · · Score: 4, Funny
      Maybe just figure out what percentage is spam, and delete that percentage of mail. Ehh, that was probably the right 30% to delete....

  4. so? buy some storage, stick them in there. by Anonymous Coward · · Score: 4, Insightful

    can't they, like, just buy a big hard drive and stuff?

    If the average message is 10 kB (10,000 bytes, to make the math easier) and compresses down to 3 kB (probably even better if you compress a bunch together), then you'd need roughly 30 GB to store 10 million of them. Can you even buy hard drives that small any more?
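The back-of-the-envelope math, spelled out. The message count and sizes are the guesses from the paragraph above, in decimal units (1 GB = 10^9 bytes); the variable names are just for illustration:

```python
MESSAGES = 10_000_000        # 10 million messages
RAW_BYTES = 10_000           # assumed average size before compression
COMPRESSED_BYTES = 3_000     # assumed average size after compression

raw_total_gb = MESSAGES * RAW_BYTES / 1e9
compressed_total_gb = MESSAGES * COMPRESSED_BYTES / 1e9

print(f"uncompressed: {raw_total_gb:.0f} GB")
print(f"compressed:   {compressed_total_gb:.0f} GB")
```

That is 100 GB raw and 30 GB compressed, before any index or redundancy overhead.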

    Add some search index, throw a crappy web interface on it, and call it a day. Never delete an email again!

  5. IMHO this sounds perfectly reasonable by Richard_L_James · · Score: 5, Insightful
    You wouldn't expect a public office to hang onto every piece of paper, so why should they be expected to hang onto every email they have ever received?

    There are always going to be things like replies to an original question and subsequent follow up questions going back and forth, so normally hanging onto the latest/final reply would be sufficient (providing it had the previous history - clearly showed the conclusion).

    Now if they were to use this as an excuse to accidentally lose records that would be a different matter. This however is where auditors should be playing a role to ensure that they are keeping the right records and discarding the rubbish.

  6. incremental backup by Anonymous Coward · · Score: 5, Insightful

    "Baltimore officials, who approved the new e-mail policy at a Board of Estimates meeting last month, say they have no choice but to delete old messages, which are slowing city computers to a crawl. They say the system is so overburdened that creating a daily backup has become impossible; there is so much data that it takes more than 24 hours to copy it."

    What?!? What's wrong with an incremental backup? Surely all those millions of messages aren't *changing* every day?!?
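To make the point concrete, here is a minimal sketch of an incremental pass: copy only files whose modification time is newer than the last backup. The function name, the one-file-per-message layout and the demo file names are all assumptions; real backup tools track much more than mtime:

```python
import os
import shutil
import tempfile
import time


def incremental_backup(source_dir, backup_dir, last_backup_time):
    """Copy only files modified after last_backup_time; return their relative paths."""
    copied = []
    for root, _dirs, files in os.walk(source_dir):
        for name in files:
            src = os.path.join(root, name)
            if os.path.getmtime(src) > last_backup_time:
                rel = os.path.relpath(src, source_dir)
                dst = os.path.join(backup_dir, rel)
                os.makedirs(os.path.dirname(dst), exist_ok=True)
                shutil.copy2(src, dst)  # copy2 also preserves timestamps
                copied.append(rel)
    return sorted(copied)


# Demo: one message untouched since yesterday, one written just now.
with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
    now = time.time()
    for name in ("old.eml", "new.eml"):
        with open(os.path.join(src, name), "w") as f:
            f.write("message body")
    yesterday = now - 86400
    os.utime(os.path.join(src, "old.eml"), (yesterday, yesterday))
    copied = incremental_backup(src, dst, last_backup_time=now - 3600)
```

Only the file changed since the last run gets copied, so the nightly job scales with the day's new mail rather than the whole store.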

    Think of all the children that will suffer from this!!!

  7. Blame On-Line Storage by buelba · · Score: 5, Interesting

    There are two technical culprits here:

    1. On-line storage. There's no reason to keep all of everyone's mail on-line on the server (a la IMAP or proprietary MS Exchange) instead of offline on their PCs (a la POP, most often seen with Eudora for non-techies). With offline storage, the servers don't clog, and you can keep as much mail as you like.

    The biggest rap against off-line storage is that you can't control what people do with their mail or how they store it. My old job had a neat solution for this: Eudora downloaded your mail, but stored it on a file server. Each employee had 100 GB or something very large. It worked great; the SMTP/POP servers were never full, and everyone could keep their email.

    2. Ridiculous stupid bullshit HTML rich-text mail crap. Can you tell I have a bias here? Aside from being annoying, HTML mail can take up to ten times the size of plain old text. Some of the HTML generated by common email programs is just terrible; filled with repeating tags for every line, and just wasting an incredible amount of space for absolutely zero benefit. (Outlook is bad, but there are others that are just as bad.)

    There's no excuse for not fixing these problems. Someday someone's going to tell a court they had to delete mail for these reasons, and someone else is going to explain exactly why they're wrong. Until then, people who want to delete mail for legal reasons will hide behind false technical reasons.

    1. Re:Blame On-Line Storage by Anonymous Coward · · Score: 4, Informative

      1. On-line storage.
      Actually, storing the messages on local computers in an organization is about the worst thing to do. Most/all user computers are not backed up the way the servers are.

      For legal requirements for some organizations, various backups must be maintained. Just because the active mailstore does not maintain messages older than X days in it does not mean that the data is lost forever (and thus it is still subpoena-able).

      To do this right, first, the City needs to create a policy that establishes that active e-mail messages will not be retained in the "inboxes" more than 30 days. They should also set up mailstores for everyone in a different area on the same or a different server (but NOT on user PCs; they need to define a policy against this, also, because user computers can be subpoenaed, so if a user has been retaining e-mail messages on their own computer, this could undermine the overriding policy, aka "Smoking Gun").

      HTML/Rich-text e-mail messages
      No argument there!

      It is LEGAL to not retain e-mail messages past a reasonable amount of time as long as there is an organization-wide POLICY in place and reasonably applied over the entire organization, but the policy has to be in place first.

      There is lots of information on the net about this already. I would maybe google for "email retention policy"...

  8. wrong approach by yppiz · · Score: 5, Insightful

    This has to be the stupidest approach to the problem. Their networks are too slow, so instead, they're going to have each employee go through their old email and save individually important messages to their local hard disk? Not only are they going to tie up employees with this manual effort, they're also going to lose key documents and a key service - the ability to centrally search and reply to requests for information. In the future, each department will have to search their local hard drives for this information.

    They've taken a simple problem of old or improperly specced equipment and turned it into a manual labor solution instead. That's an insane waste of time and salary. They should just upgrade their network and storage. If I can build a 4 terabyte RAIDed PC for a few thousand dollars, they can centralize their mailserver and back it up for say a hundred thousand, even with extra redundancy and inefficiencies and admin costs.

    By contrast, forcing every current employee to perform a task that would eat up weeks of time per employee per year, in a city of Baltimore's size, will cost tens of millions of dollars.

    Dumb, dumb, dumb.

    --Pat / zippy@cs.brandeis.edu

  9. Temporary Fix by Rie+Beam · · Score: 5, Insightful

    Back up all e-mails from the last 4½ years into permanent storage, and then from there, get organized. Put spam filters on, force people to sort any important mail or else it gets deleted after, say, two weeks. People always seem to want to "start from scratch" without looking at the situation rationally. Five years of documents, gone overnight. How can anyone not be at least outraged by that?

  10. Millions of old spam, most likely. by jafo · · Score: 5, Interesting

    The spam problem is unlikely to go away until people start treating it like the attack on the Internet that it is.

    I've noticed an annoying trend lately that e-mail sent to businesses is frequently getting just ignored. Certainly it seems much more frequent this year than in the past. I've wondered if this is simply because so many e-mail boxes are getting filled up as fast as the spammers can send.

    I'd suspect that the city of Baltimore wouldn't be having any problems if spam weren't such a problem. If the number of messages they had to deal with dropped by 5 to 20 times (depending on which estimates of current spam levels you believe), they could probably just leave the mail where it is.

    This is all something I've been struggling with, being a small business owner doing business on the net. My company of 5 people gets between 4,000 and 20,000 borderline spams per day. By borderline, I mean that we throw away obvious viruses and things which score above a certain score in SpamAssassin (I think it's 9). So, that doesn't count the super spammy messages.

    If it weren't for our fairly strict and complicated spam blocker setup, and a very powerful machine, we couldn't get the few hundred messages per day that are of interest to us. Spam is killing e-mail. I'm not sure why more people aren't treating it as an attack, but it's really hard to get anyone's interest to take some action. Canceling accounts doesn't even begin to solve the problem.

    In the mean time, the City of Baltimore is suffering...

    Sean

  11. Re:Client-side storage is not a good solution by Donny+Smith · · Score: 4, Insightful

    >A better idea would be to write a script to go through each user's mailboxes every month, export any old emails to text, store the files on a server that uses a journaling filesystem, index the emails, and compress them.

    No file system will save you from multiple HDD failures; they should save old (>12 months) data to DVD burners and/or tapes or cheap SATA storage. One can buy 1TB of external SATA space for a couple thousand dollars.

    >One or two XServe G5s could do the trick quite well.

    What do XServe boxes have to do with a generic application like email? Besides, they're more expensive than comparable Intel+Linux servers (especially considering the fact that CPU performance is unimportant for most mail servers).

  12. Did they even look for offline solutions? by Ohm2k · · Score: 5, Informative

    I work at a law firm, and we have to keep everything for 7 years. We have a system in place that takes all mail over 90 days old, pulls it out of Exchange and moves it to the SAN. As a plus, it puts a link back into the information store to make it look like the message is still there. If a user wants an old message, he can still get it himself w/o an IT person having to dig up a tape, restore the file and then e-mail it to him (thus creating MORE mail). The messages are still searchable, and it makes retrieval when needed a snap.

    Mind you, we are only a 700 user shop. But nothing gets deleted. If it gets by the spam filter it gets saved.
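A toy model of the move-and-leave-a-link scheme described above, for the curious. Plain dicts stand in for the mail store and the SAN; the `Message` class, the `("stub", id)` marker and the function names are invented and have nothing to do with how Exchange or any archiving product actually works:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Message:
    msg_id: str
    received: datetime
    body: str


def archive_old(mailbox, archive, now, max_age_days=90):
    """Move messages older than max_age_days to the archive, leaving a stub behind."""
    cutoff = now - timedelta(days=max_age_days)
    for msg_id, item in list(mailbox.items()):
        if isinstance(item, Message) and item.received < cutoff:
            archive[msg_id] = item
            mailbox[msg_id] = ("stub", msg_id)  # link back to the archived copy


def open_message(mailbox, archive, msg_id):
    """Return the message, following the stub to the archive if needed."""
    item = mailbox[msg_id]
    if isinstance(item, tuple) and item[0] == "stub":
        return archive[item[1]]
    return item


now = datetime(2004, 10, 1)
mailbox = {
    "old": Message("old", datetime(2004, 1, 15), "Deposition notes"),
    "recent": Message("recent", datetime(2004, 9, 20), "Lunch?"),
}
archive = {}
archive_old(mailbox, archive, now)
```

From the user's side nothing changes: `open_message(mailbox, archive, "old")` still hands back the full message, it just comes from the archive.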

    --
    People find it strange that I don't know how to juggle or tap dance.
  13. This issue isn't limited to the City of Baltimore by Flounder · · Score: 4, Interesting
    I work in the IT department of a county close to Baltimore. Our server can retain e-mail indefinitely (there is a space limit per mailbox, but not a time limit). However, our backups only go back 30 days. This is stipulated by the county lawyers.

    As far as I've been able to figure out, this arose from a lawsuit against the county where an e-mail retrieved from two years previous proved a county commissioner to be taking bribes in a zoning issue.

    Rather than fix the corruption, just ensure that it's covered up more efficiently. Gotta love local governments.

    --

    No boom today. Boom tomorrow. There's always a boom tomorrow. - Cmdr. Susan Ivanova

  14. A mark or procedure for official business by jhines · · Score: 4, Insightful

    Once an actual human person has read and acted on the mail, they should be able to mark it "official business" and/or move the email into an "official business" folder which does get kept as required.

    Better procedures and training go a long way here. These same folks have no problems with snail mail.

  15. Re:Great way to ignore your customers by No.+24601 · · Score: 4, Insightful
    If they haven't read it in 90 days, they've already ignored it.

    I don't know what business you work in, but if they haven't read it in 3 days, they've lost my business.

  16. Removing old messages isn't the best option by crimoid · · Score: 5, Insightful

    A better option would be to archive old messages rather than remove them entirely. From the article it sounds like they are keeping ALL messages active all the time. For example:

    "They say the system is so overburdened that creating a daily backup has become impossible; there is so much data that it takes more than 24 hours to copy it."

    So, it seems like the solution would be to periodically lop off old messages to offline storage (tape, spare drives, whatever). In the event of a lawsuit the old messages could be reasonably recovered and the cost for such a system would be extremely minimal.
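Something like the following would do the "lop off" step: split messages at an age cutoff and pack the old ones into a compressed blob that can go to tape or a spare drive. The message dict layout, the 90-day default and the gzip-plus-JSON format are all assumptions for the sake of a short example:

```python
import gzip
import json
from datetime import datetime, timedelta


def split_by_age(messages, now, max_age_days=90):
    """Return (keep, old), where old holds messages received before the cutoff."""
    cutoff = now - timedelta(days=max_age_days)
    keep = [m for m in messages if m["received"] >= cutoff]
    old = [m for m in messages if m["received"] < cutoff]
    return keep, old


def pack(messages):
    """Serialize messages to gzip-compressed JSON for offline storage."""
    payload = [{**m, "received": m["received"].isoformat()} for m in messages]
    return gzip.compress(json.dumps(payload).encode("utf-8"))


def unpack(blob):
    """Restore messages from a blob produced by pack()."""
    payload = json.loads(gzip.decompress(blob).decode("utf-8"))
    return [{**m, "received": datetime.fromisoformat(m["received"])} for m in payload]


now = datetime(2004, 10, 1)
messages = [
    {"id": 1, "received": datetime(2003, 5, 2), "body": "Permit approved."},
    {"id": 2, "received": datetime(2004, 2, 9), "body": "Minutes attached."},
    {"id": 3, "received": datetime(2004, 9, 28), "body": "See you Monday."},
]
keep, old = split_by_age(messages, now)
blob = pack(old)
```

The live system only has to back up `keep`, and `unpack(blob)` gets the old messages back if a lawsuit ever calls for them.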

  17. Complying with Public Records Acts by EconomyGuy · · Score: 4, Insightful

    Unlike a legal office where communications are governed by extensive regulation, governments are really only required to keep records of official documents and decisions. The myriad of e-mails leading up to a decision are not generally protected under such an act, nor are snail mail or phone conversations. In fact, the whole idea of there being a digital trail to follow for governmental decision making is really very new. Does it make sense to change that practice? Do we really think our government officials should be so closely watched that EVERY e-mail/phone conversation/smoke signal should be recorded and exposed to public scrutiny? Talk about making an unattractive job even less enticing.

    In response to the poster's question about all those subpoenas: welcome to the world of civil litigation, where the first one to destroy the evidence wins!

    --
    Only 120 characters... who can summarize their entire world understanding in 120 characters?!
  18. Re:HTML in email? Forbidden! by Hatta · · Score: 4, Insightful

    I'll handle these in reverse order.

    Word attachments are acceptable when they are just a means of moving files around, and not the entire content of the email. What is not acceptable is expecting me to load a large word processor just so you can use the company letterhead. In my experience the latter type is far more common. Besides the security implications (macro viruses, etc), I do not have a gui on the computer I read my email. Nor should I need one.

    As for HTML email, I'm simply not going to render strange IMG tags. They could lead to goatse, or back to a spammer's site, and now they know my email is active. HTML email generally looks like it was designed by an 8 year old with downs syndrome anyway. Plain text is just more readable for nearly every email. Check out HTML email is STILL evil!!! for more.

    --
    Give me Classic Slashdot or give me death!
  19. Saving to local drives? by Gonoff · · Score: 4, Informative

    We have to spend a lot of time telling people to **NOT** save to local drives. If it is important or confidential, or may be in the future, this should not be saved locally unless you want to lose it or explain to an enquiry why it was found on sale in a car boot sale after a break in. This is what a network is for.

    The answer to the problem in the article is quotas. *nix has them, Novell has them and even Windows has them. Our email quota works as follows:
    Limit 1 - email user once per day marked high importance that they are getting close.
    Limit 2 - disable sending and continue with (2k) warning message.
    Limit 3 - disable receiving apart from one final message saying that it would all start working again when the user clears some space

    When they can't send, they get a dialogue box reminding them each time they try, and when they can't receive, the sender gets a message.
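The three limits boil down to a small decision function. A sketch, with made-up thresholds in MB (the post does not say what the real limits are) and a hypothetical `quota_state` name:

```python
def quota_state(used_mb, warn_mb=80, send_limit_mb=100, receive_limit_mb=120):
    """Map mailbox usage onto the three quota limits (thresholds are illustrative)."""
    if used_mb >= receive_limit_mb:
        # Limit 3: nothing in or out until the user clears space.
        return {"level": 3, "can_send": False, "can_receive": False}
    if used_mb >= send_limit_mb:
        # Limit 2: sending disabled, warnings keep arriving.
        return {"level": 2, "can_send": False, "can_receive": True}
    if used_mb >= warn_mb:
        # Limit 1: daily high-importance warning only.
        return {"level": 1, "can_send": True, "can_receive": True}
    return {"level": 0, "can_send": True, "can_receive": True}


states = [quota_state(mb)["level"] for mb in (50, 85, 105, 130)]
```

Checking the hardest limit first keeps the function to one pass with no overlapping cases.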

    This does make for support calls like...

    "Why does my computer tell me that the email is full up and I can't send any more?"
    "Because your email is full up. You have a message explaining this to you."

    "X tried to send me an email and it bounced saying that my mailbox was full up. Why?"
    "Because your mailbox is full up."

    --
    I'll see your Constitution and raise you a Queen.
  20. Information Lifecycle Management by bolix · · Score: 4, Insightful

    ILM is the next big thing. It's the logical extension to the ever increasing SAN/NAS Server/Workstation exponentially-increasing-data problem (go google for pretenders to the law).

    You can't oversee growing data storage without a parallel increase in administration costs. Instead, the idea is to build automatic archiving into your storage architecture.

    In practice this means you build tiers of storage/archive methods. Tier 1 is a high tkt Shark SAN etc, Tier 2 is lower priced SATA RAID and Tier 3 is a DAS Tape Library. Build retention guidelines into the storage management platform (Tivoli etc). Older items are automatically moved to the Tier corresponding to that retention/access policy. Really old items "live" on Tape. Frequently accessed data lives on the high speed boxes near to the users/application. You snapshot updates to a DR replica offsite or burn periodic Tape sets etc. It's a good idea to team this with storage virtualization (virtual LUNS/ Metadata directory servers) and you can add/rotate/modify the storage tiers when necessary without any downtime.
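Stripped of the vendor gear, the tiering rule is just a lookup on how long an item has sat idle. A sketch under invented assumptions: the 30-day and 365-day cutoffs and the tier names are placeholders, not anything a real storage management platform prescribes:

```python
from datetime import datetime, timedelta

# (maximum idle time, tier name), checked in order; cutoffs are illustrative.
TIERS = [
    (timedelta(days=30), "tier1-san"),
    (timedelta(days=365), "tier2-sata-raid"),
]
DEFAULT_TIER = "tier3-tape"


def pick_tier(last_accessed, now):
    """Choose a storage tier from how long the item has gone unaccessed."""
    idle = now - last_accessed
    for max_idle, tier in TIERS:
        if idle <= max_idle:
            return tier
    return DEFAULT_TIER


now = datetime(2004, 10, 1)
placements = {
    "last week": pick_tier(datetime(2004, 9, 24), now),
    "last spring": pick_tier(datetime(2004, 3, 1), now),
    "three years ago": pick_tier(datetime(2001, 10, 1), now),
}
```

A policy engine would run something like this on a schedule and migrate whatever lands in a different tier than it currently occupies.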

    From a user perspective, you click on the link and if applicable, get notified the item is being retrieved from media x (it's mostly transparent). Worst case - access times are in the minutes.

    Of course, all this comes with a high price. Enterprise Storage systems are not cheap. Recent legislated policy (Sarbanes Oxley etc) enforces the retention of some media (e.g. email). You cannot rely on end users to enforce data retention. This lets you mandate tiers of protection and is highly configurable to support per application monitoring.

    Nothing is foolproof. It's still being finessed but if you can afford it - it's truly a thing of beauty.

  21. Re:Simple... (not) by Zen · · Score: 4, Informative

    There comes a point where that, too, gets very expensive. At my company (large US healthcare provider, with governmental and private contracts both HMO and PPO), after saying 3, 5, and 7 years, our lawyers have told us we have to archive all email potentially forever that the end user doesn't specifically delete. They may do an end-run around the deletion and archive those, too, but I don't know. Anyway, our email system (Lotus Notes, which is an extreme HOG) eats somewhere between 100GB - 1TB/week. I was told it was well over 1TB, but I don't believe them. This is of course due to older Notes versions' inability to store attachments in public directories, so they simply send a copy to each and every recipient (and the stupidity of no size limits on internal email). There is a limit to how many drives you can add to a SAN, and then you have to get a whole extra chassis, which is where the expensive part comes in. To keep buying new SAN units every 6 months or so, as well as the hard drives to put in them (plus the maintenance contracts, 24/7 support, etc) could easily add up to $1 million/year or more. Which is definitely more costly than 10 average low-mid level administrators' salaries.