Slashdot Mirror


You've Got Mail -- Tons Of It

Daniel Goldman writes "The Baltimore Sun has an article about the City of Baltimore's email problem." A snippet: "Millions of old e-mail messages are clogging Baltimore's municipal computers, so the city is going to start automatically deleting any messages older than 90 days. A common practice in private business, the move raises questions when made by a municipality, which has a responsibility to retain certain public records." Goldman points out "Just think about all the potential lawsuits; 'if it's not there, they can't subpoena it.'"

37 of 249 comments

  1. Beowulf cluster by b0lt · · Score: 2, Insightful

    This might be a practical use of one: determine which emails are valid and which aren't, like a spam filter. Allow users to flag 25% or so of their emails as important, and archive those.

    --
    got sig?
  2. Fun! by Anonymous Coward · · Score: 1, Insightful

    Now we get to subpoena entire hard drives so we can run data-recovery software on them. It would be smart of any operation, public or private, to wait out the statute of limitations (which I realize may vary) of any states with which they have substantial contacts before they start deleting data.

  3. Why not... by Izago909 · · Score: 4, Insightful

    figure out what percentage is spam, and sue spammers to recover damages for lost resources.

  4. so? buy some storage, stick them in there. by Anonymous Coward · · Score: 4, Insightful

    can't they, like, just buy a big hard drive and stuff?

    If the average message is 10 KB (10,000 bytes, to make the math easier) and compresses down to 3 KB (probably even better if you compress a bunch together), then you'd need roughly 30 GB to store 10 million of them. Can you even buy hard drives that small any more?

    Add some search index, throw a crappy web interface on it, and call it a day. Never delete an email again!
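
    The estimate above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes decimal units (1 KB = 1,000 bytes, 1 GB = 1,000,000,000 bytes), and the function name is made up for illustration.

```python
# Back-of-the-envelope storage estimate using the figures from the post.
# Assumption: decimal units (1 KB = 1,000 bytes, 1 GB = 1,000,000,000 bytes).

def archive_size_gb(message_count: int, compressed_bytes_per_message: int) -> float:
    """Return the total archive size in decimal gigabytes."""
    return message_count * compressed_bytes_per_message / 1_000_000_000


if __name__ == "__main__":
    # 10 million messages at 3,000 compressed bytes each.
    print(archive_size_gb(10_000_000, 3_000))
```

    Attachments, indexes and filesystem overhead are ignored here, so real usage would be higher.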

    1. Re:so? buy some storage, stick them in there. by ThisIsFred · · Score: 3, Insightful
      can't they, like, just buy a big hard drive and stuff?
      Here's the problem with this: The longer the stuff is retained, the more expensive it gets to hold on to it. IT is usually a very low budget priority for government agencies, so it's going to be hard to purchase high-reliability mass storage devices every couple of years. Since the goal is permanent archival, cheap, high-cap ATAPI fixed disks are going to be the last thing you want to store the stuff on. The other issue is that the user of the mailbox has complete control over the contents, so retaining everything is going to be really difficult to do, and accidental deletion will be a very credible alibi.

      There are rumblings about FOI and permanent archival among my Governmental Overlords, so I'm thinking hard about potential solutions to the problem. Trust me, it's a very complicated issue, more so than I care to illustrate here (especially considering my habit of rambling on).

      The simplest solution is responsibility. If it's official policy, it's on dead-trees and filed away.
      --
      Fred

      "A fool and his freedom are soon parted"
      -RMS
    2. Re:so? buy some storage, stick them in there. by wired_parrot · · Score: 3, Insightful

      The problem is, the average e-mail is not necessarily 10 KB. While HTML can be part of the problem by making e-mails several times bigger than need be, my experience is that large attachments are generally the biggest culprits. A 20 MB PowerPoint presentation sent by a pointy-haired manager to all his minions can easily swamp the system. And trust me, there are plenty of clueless managers out there sending out Very Large Attachments. I once received a 50 MB Excel spreadsheet that contained nothing but a single image of a chart scanned at a ridiculously high resolution. It's crap like this that swamps mail servers, not the two-paragraph responses.

  5. IMHO this sounds perfectly reasonable by Richard_L_James · · Score: 5, Insightful
    You wouldn't expect a public office to hang onto every piece of paper, so why should they be expected to hang onto every email they have ever received?

    There are always going to be things like replies to an original question and subsequent follow-up questions going back and forth, so normally hanging onto the latest/final reply would be sufficient (provided it included the previous history and clearly showed the conclusion).

    Now if they were to use this as an excuse to accidentally lose records, that would be a different matter. This, however, is where auditors should be playing a role to ensure that they are keeping the right records and discarding the rubbish.

  6. incremental backup by Anonymous Coward · · Score: 5, Insightful

    "Baltimore officials, who approved the new e-mail policy at a Board of Estimates meeting last month, say they have no choice but to delete old messages, which are slowing city computers to a crawl. They say the system is so overburdened that creating a daily backup has become impossible; there is so much data that it takes more than 24 hours to copy it."

    What?!? What's wrong with an incremental backup? Surely all those millions of messages aren't *changing* every day?!?

    Think of all the children that will suffer from this!!!

    1. Re:incremental backup by Piquan · · Score: 3, Insightful

      What?!? What's wrong with an incremental backup? Surely all those millions of messages aren't *changing* every day?!?

      That depends on how their email system works. If it stores each user in a single file, then that file is changing every day. If they're using a file-based backup system...
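
      To make that concrete, here is a minimal Python sketch of a file-level incremental backup that picks files by modification time. It is not how any particular backup product works; the directory layout, file names and timestamps are all hypothetical. It compares one big file per user with one file per message after a single new mail arrives.

```python
import os
import tempfile


def changed_since(root: str, last_backup: float) -> list[str]:
    """Return paths under root modified after last_backup (file-level incremental)."""
    changed = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) > last_backup:
                changed.append(path)
    return sorted(changed)


def bytes_to_copy(root: str, last_backup: float) -> int:
    """Total size of the files an incremental run would have to copy."""
    return sum(os.path.getsize(p) for p in changed_since(root, last_backup))


def demo() -> tuple[int, int]:
    """Return bytes to copy for (single-file layout, per-message layout)."""
    old, now = 1_000_000_000, 1_000_086_400  # hypothetical timestamps a day apart
    last_backup = old + 1
    with tempfile.TemporaryDirectory() as tmp:
        single = os.path.join(tmp, "single")
        per_msg = os.path.join(tmp, "per_message")
        os.makedirs(single)
        os.makedirs(per_msg)

        # Layout A: one growing file per user; appending one message touches all of it.
        inbox = os.path.join(single, "alice.mbox")
        with open(inbox, "wb") as f:
            f.write(b"x" * 100 * 1000)  # 100 old messages of 1,000 bytes each
            f.write(b"y" * 1000)        # today's new message
        os.utime(inbox, (now, now))

        # Layout B: one file per message; only the new file has a fresh mtime.
        for i in range(100):
            path = os.path.join(per_msg, f"{i:04d}.eml")
            with open(path, "wb") as f:
                f.write(b"x" * 1000)
            os.utime(path, (old, old))
        new_path = os.path.join(per_msg, "0100.eml")
        with open(new_path, "wb") as f:
            f.write(b"y" * 1000)
        os.utime(new_path, (now, now))

        return bytes_to_copy(single, last_backup), bytes_to_copy(per_msg, last_backup)
```

      With the single-file layout the whole mailbox is recopied every night; with one file per message only the new message is.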

    2. Re:incremental backup by HermanAB · · Score: 2, Insightful

      Sure, but saving the user's copies of the e-mail is friggen retarded. It is much easier to store the incoming and outgoing streams of e-mail on the server and use logrotate to create a new file every week, or every day even. As soon as logrotate moves a file to backup, it won't change anymore - ever.
      With Postfix, use always_bcc to forward all outgoing mail to a user called outlog, then use procmail to save all outgoing mail to a log file.
      Likewise, procmail can save all incoming mail - after the crap filters - to an incoming log file called inlog.
      Finally, use logrotate to archive the logs periodically.
      Then you don't have to worry about the users and they can do with their copies of the mail whatever they damnwell please.
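
      The same idea can be sketched in Python. To be clear, this is not the Postfix/procmail/logrotate setup described above, just an illustration of why week-stamped logs stop changing once the week is over. The function names, addresses and archive layout are assumptions.

```python
import datetime
import mailbox
import os
import tempfile
from email.message import EmailMessage


def weekly_log_path(archive_dir: str, stream: str, when: datetime.date) -> str:
    """Name the log after the ISO week, so a closed week's file is never written again."""
    year, week, _weekday = when.isocalendar()
    return os.path.join(archive_dir, f"{stream}-{year}-W{week:02d}.mbox")


def archive_copy(archive_dir: str, stream: str, message: EmailMessage,
                 when: datetime.date) -> str:
    """Append a copy of the message to the current week's log for that stream."""
    path = weekly_log_path(archive_dir, stream, when)
    box = mailbox.mbox(path)  # created on first use
    try:
        box.add(message)
    finally:
        box.close()
    return path


def demo() -> list[str]:
    """Archive three outgoing messages and return the log files created."""
    with tempfile.TemporaryDirectory() as archive_dir:
        for day, subject in [
            (datetime.date(2004, 8, 30), "Monday memo"),
            (datetime.date(2004, 9, 1), "Wednesday memo"),
            (datetime.date(2004, 9, 6), "Next week's memo"),
        ]:
            msg = EmailMessage()
            msg["From"] = "clerk@example.org"      # hypothetical addresses
            msg["To"] = "resident@example.org"
            msg["Subject"] = subject
            msg.set_content("Body text")
            archive_copy(archive_dir, "outlog", msg, day)
        return sorted(os.listdir(archive_dir))
```

      The first two messages land in the same weekly file and the third starts a new one, so an incremental backup only ever has to pick up the current week's log.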

      --
      Oh well, what the hell...
  7. wrong approach by yppiz · · Score: 5, Insightful

    This has to be the stupidest approach to the problem. Their networks are too slow, so instead, they're going to have each employee go through their old email and save individually important messages to their local hard disk? Not only are they going to tie up employees with this manual effort, they're also going to lose key documents and a key service - the ability to centrally search and reply to requests for information. In the future, each department will have to search their local hard drives for this information.

    They've taken a simple problem of old or improperly specced equipment and turned it into a manual labor solution instead. That's an insane waste of time and salary. They should just upgrade their network and storage. If I can build a 4 terabyte RAIDed PC for a few thousand dollars, they can centralize their mailserver and back it up for, say, a hundred thousand, even with extra redundancy and inefficiencies and admin costs.

    By contrast, forcing every current employee to perform a task that would eat up weeks of time per employee per year, in a city of Baltimore's size, will cost tens of millions of dollars.

    Dumb, dumb, dumb.

    --Pat / zippy@cs.brandeis.edu

  8. Temporary Fix by Rie+Beam · · Score: 5, Insightful

    Back up all e-mails from the last 4½ years into permanent storage, and then from there, get organized. Put spam filters on, force people to sort any important mail or else it gets deleted after, say, two weeks. People always seem to want to "start from scratch" without looking at the situation rationally. Five years of documents, gone overnight. How can anyone not be at least outraged by that?

  9. Re:Bayesian Filter to Identify Official Mail by Kenja · · Score: 5, Insightful

    because even one false positive can get them in trouble?

    --

    "Have you ever thought about just turning off the TV, sitting down with your kids, and hitting them?"
  10. Re:Bayesian Filter to Identify Official Mail by Anonymous Coward · · Score: 1, Insightful

    Deleting all the mail... or delete a few false positives. Hrm, tough call...

  11. Re:Bayesian Filter to Identify Official Mail by llamaguy · · Score: 2, Insightful

    You could have a manual meta-check on all the positives to make sure they aren't anything vital. Computers don't make mistakes, they just don't think like we do so sometimes it's necessary to sort through it all.

    --
    HAH! I just wasted a second of your life making you read this, but I wasted a minute of mine thinking it up. DAMN.
  12. Re:Bayesian Filter to Identify Official Mail by abhisarda · · Score: 2, Insightful

    "because even one false positive can get them in trouble?"

    You should probably go take a class on probability. When you're dealing with millions of emails, there are going to be some false positives.
    What's the alternative, hand sort them?
    Yeah, that's a good idea, right? But with Bayesian filtering, you can do a lot of refining when you're dealing with millions of emails.
    And who says that you need to use the same filters for the health dept and the transport dept?

    Jesus christ, there are lots of companies that already do this. It's not like Baltimore's the only city with millions of old emails.
    This is not a Mars mission, this is judicious use of existing technology: Bayesian filters (or whatever fits the profile) and enterprise storage solutions.
    It's better to spend a few hundred thousand (or less) on archiving the mails than to spend a million or two on lawyers and court five years later, defending the decision to delete the data after some citizen sues them for records etc.
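
    For anyone who hasn't seen one, a Bayesian filter is not much code. Below is a toy multinomial naive Bayes classifier in Python with add-one smoothing. It is a sketch, not a production filter: the labels "record" and "discard" and the training phrases are invented, and a real deployment would need far more training data and a human check on anything it proposes to throw away.

```python
import math
from collections import Counter


class NaiveBayes:
    """Tiny multinomial naive Bayes text classifier with add-one smoothing."""

    def __init__(self) -> None:
        self.word_counts: dict[str, Counter] = {}  # label -> word frequencies
        self.doc_counts: Counter = Counter()        # label -> training messages seen
        self.vocabulary: set[str] = set()

    def train(self, text: str, label: str) -> None:
        words = text.lower().split()
        self.word_counts.setdefault(label, Counter()).update(words)
        self.doc_counts[label] += 1
        self.vocabulary.update(words)

    def score(self, text: str, label: str) -> float:
        """Log of P(label) times the product of P(word | label)."""
        counts = self.word_counts[label]
        total = sum(counts.values()) + len(self.vocabulary)
        log_prob = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for word in text.lower().split():
            log_prob += math.log((counts[word] + 1) / total)
        return log_prob

    def classify(self, text: str) -> str:
        return max(self.doc_counts, key=lambda label: self.score(text, label))


def build_demo_filter() -> NaiveBayes:
    """Train on a handful of made-up subject lines."""
    nb = NaiveBayes()
    nb.train("council vote on budget contract", "record")
    nb.train("permit approval signed contract", "record")
    nb.train("lunch on friday anyone", "discard")
    nb.train("cheap pills buy now", "discard")
    return nb
```

    Per-department filters would simply be separate `NaiveBayes` instances, each trained on that department's mail.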

  13. Problem with email by sydbarrett74 · · Score: 2, Insightful

    This highlights a fundamental problem with email -- many people pass documents as attachments, or in the body of the email, instead of using email as a sort of metadata describing their works in progress. Documents shouldn't be passed around in email; they should be stored on a network share, where proper controls for mutual exclusion and such can be employed.

    --
    'He who has to break a thing to find out what it is, has left the path of wisdom.' -- Gandalf to Saruman
  14. Email == official documents? by drgonzo59 · · Score: 2, Insightful

    I am not sure if they can use email as official communication? There would be problems with repudiation ("we never received it"), privacy ("someone intercepted it who was not supposed to") and authentication ("it wasn't me who sent it, it was my dog"). Can they use an email in court then? What would have to be done is to have all the messages signed and encrypted with a public key, and perhaps have some way for the sender to get a receipt back when the receiver reads the message.

  15. Re:Simple... by Simon+Lyngshede · · Score: 3, Insightful

    Gmail wouldn't solve problems like this; they only offer 1 GB. I work with secretaries who would use up 1 GB of storage a year if they didn't delete any emails. The organisation I do systems administration for isn't even that big, so I could easily imagine other people running into problems earlier.

  16. Re:Client-side storage is not a good solution by Donny+Smith · · Score: 4, Insightful

    >A better idea would be to write a script to go through each user's mailboxes every month, export any old emails to text, store the files on a server that uses a journaling filesystem, index the emails, and compress them.

    No file system will save you from multiple HDD failures; they should save old (>12 months) data to DVDs and/or tapes or cheap SATA storage. One can buy 1 TB of external SATA space for a couple thousand dollars.

    >One or two XServe G5s could do the trick quite well.

    What do XServe boxes have to do with a generic application like email? Besides, they're more expensive than comparable Intel+Linux servers (especially considering the fact that CPU performance is unimportant for most mail servers).

  17. A mark or procedure for official business by jhines · · Score: 4, Insightful

    Once an actual human person has read and acted on the mail, they should be able to mark it "official business" and/or move the email into an "official business" folder which does get kept as required.

    Better procedures and training go a long way here. These same folks have no problems with snail mail.

  18. Re:Great way to ignore your customers by No.+24601 · · Score: 4, Insightful
    If they haven't read it in 90 days, they've already ignored it.

    I don't know what business you work in, but if they haven't read it in 3 days, they've lost my business.

  19. Removing old messages isn't the best option by crimoid · · Score: 5, Insightful

    A better option would be to archive old messages rather than remove them entirely. From the article it sounds like they are keeping ALL messages active all the time. For example:

    "They say the system is so overburdened that creating a daily backup has become impossible; there is so much data that it takes more than 24 hours to copy it."

    So, it seems like the solution would be to periodically lop off old messages to offline storage (tape, spare drives, whatever). In the event of a lawsuit the old messages could be reasonably recovered and the cost for such a system would be extremely minimal.

  20. Complying with Public Records Acts by EconomyGuy · · Score: 4, Insightful

    Unlike a legal office where communications are governed by extensive regulation, governments are really only required to keep records of official documents and decisions. The myriad of e-mails leading up to a decision are not generally protected under such an act, nor are snail mail or phone conversations. In fact, the whole idea of there being a digital trail to follow for governmental decision making is really very new. Does it make sense to change that practice? Do we really think our government officials should be so closely watched that EVERY e-mail/phone conversation/smoke signal should be recorded and exposed to public scrutiny? Talk about making an unattractive job even less enticing.

    In response to the poster's question about all those subpoenas: welcome to the world of civil litigation, where the first one to destroy the evidence wins!

    --
    Only 120 characters... who can summarize their entire world understanding in 120 characters?!
    1. Re:Complying with Public Records Acts by gerardrj · · Score: 2, Insightful

      Yes. When a government official is supposed to be acting in the best interest of the people they should be subject to scrutiny at any level that is reasonably available.
      Storing older emails is a rather trivial issue of collecting, compressing and copying to an inexpensive tape or hard drive which can be archived. A 250GB IDE drive is quite inexpensive and could probably archive several hundred million emails, many more than the city is claiming it will delete.

      In a time when the government is fading further from the ideal of a democracy, or even the republic which it's supposed to be, I think that accountability of the elected and non-elected government workers is critical.

      In this particular case, it seems that all of the city's email is in one central location, otherwise how could they just delete it without putting that responsibility in the hands of users, or sending support people to each and every desk.

      Pick a day, halt all email access inbound and outbound (government usually doesn't work on Sunday), copy all the existing email to a drive, then start the deleting process.

      --
      Article X: The powers not delegated... by the Constitution...are reserved...to the people
  21. Re:HTML in email? Forbidden! by Hatta · · Score: 4, Insightful

    I'll handle these in reverse order.

    Word attachments are acceptable when they are just a means of moving files around, and not the entire content of the email. What is not acceptable is expecting me to load a large word processor just so you can use the company letterhead. In my experience the latter type is far more common. Besides the security implications (macro viruses, etc), I do not have a gui on the computer I read my email. Nor should I need one.

    As for HTML email, I'm simply not going to render strange IMG tags. They could lead to goatse, or back to a spammer's site, and now they know my email is active. HTML email generally looks like it was designed by an 8 year old with downs syndrome anyway. Plain text is just more readable for nearly every email. Check out HTML email is STILL evil!!! for more.

    --
    Give me Classic Slashdot or give me death!
  22. Re:Millons of old spam, most likely. by Tony+Hoyle · · Score: 2, Insightful

    Spam is easily recognized by the subject line? Boy I wish I was getting your spam instead of mine!

    Mine's full of:

    hi
    how are you?
    Please Complete and Return
    I miss you
    Fwd: I need your help
    Re: Your Account

    etc... etc...

    Any one of these could be legitimate (occasionally you get a headline that's so innocuous I think the spam filter has got it wrong... until I actually read the email).

  23. Re:Bayesian Filter to Identify Official Mail by Raven42rac · · Score: 4, Insightful

    Five points for excellent use of buzzwords. I would say compress messages older than 90 days and save them. The government is not supposed to just willy nilly throw things away. I would invest in more hard drive space to hedge against lawsuits.

    --
    I hate sigs.
  24. Information Lifecycle Management by bolix · · Score: 4, Insightful

    ILM is the next big thing. It's the logical extension to the ever increasing SAN/NAS Server/Workstation exponentially-increasing-data problem (go google for pretenders to the law).

    You can't oversee growing data storage without a parallel increase in administration costs. Instead, the idea is to build automatic archiving into your storage architecture.

    In practice this means you build tiers of storage/archive methods. Tier 1 is a high-ticket Shark SAN etc., Tier 2 is lower-priced SATA RAID and Tier 3 is a DAS tape library. Build retention guidelines into the storage management platform (Tivoli etc.). Older items are automatically moved to the tier corresponding to that retention/access policy. Really old items "live" on tape. Frequently accessed data lives on the high-speed boxes near the users/application. You snapshot updates to a DR replica offsite or burn periodic tape sets etc. It's a good idea to team this with storage virtualization (virtual LUNs/metadata directory servers) so you can add/rotate/modify the storage tiers when necessary without any downtime.

    From a user perspective, you click on the link and, if applicable, get notified that the item is being retrieved from media x (it's mostly transparent). Worst case, access times are in the minutes.

    Of course, all this comes with a high price. Enterprise storage systems are not cheap. Recent legislated policy (Sarbanes-Oxley etc.) enforces the retention of some media (e.g. email). You cannot rely on end users to enforce data retention. This lets you mandate tiers of protection and is highly configurable to support per-application monitoring.

    Nothing is foolproof. It's still being finessed, but if you can afford it, it's truly a thing of beauty.
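
    The tier-selection rule itself is simple; the expense is in the hardware and the management software. A minimal Python sketch of an age-based policy follows. The 90-day and 365-day thresholds and the tier names are assumptions for illustration, not anything a particular product ships with.

```python
from datetime import date

# Hypothetical retention policy: (maximum age in days, tier name), checked in order.
TIERS = [
    (90, "tier1-san"),
    (365, "tier2-sata-raid"),
]
ARCHIVE_TIER = "tier3-tape"  # anything older than the last threshold


def tier_for(last_accessed: date, today: date) -> str:
    """Pick the storage tier for an item based on days since it was last accessed."""
    age_days = (today - last_accessed).days
    for max_age, tier in TIERS:
        if age_days <= max_age:
            return tier
    return ARCHIVE_TIER
```

    A scheduled job would walk the store, call something like `tier_for` on each item, and migrate whatever is on the wrong tier.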

  25. Re:Simple... by BasilBrush · · Score: 3, Insightful

    That's fine. Disk storage is cheap. Certainly cheaper than paying hundreds of staff for the time taken to go through all their old mail sorting the wheat from the chaff. The right solution to running out of disk space for email is to add more disks.

  26. Re:Simple... (not) by BasilBrush · · Score: 2, Insightful

    The 10 secretaries in question were only using 1 GB each per year. 10 GB per year in total. If your company is as large as you imply, the amount of work hours involved in sorting through old emails will be larger than that. Each person (or their PA) would need to do their own. That's a lot of hours.

  27. Easy solution if you have management's support... by Anonymous Coward · · Score: 1, Insightful

    There is an easy enough solution to this if you have management's support. (Assuming I understand the problem, which is apparently that the pop server is overburdened.)

    The first step is to solve the steady-state problem. This is easy enough: you make it very well known that they are not to leave messages older than 90 days in their mailbox. But because the messages may contain official stuff and can't be deleted, you don't delete old messages. Instead, you test every mailbox periodically to see if it contains old stuff, and if it does, you block delivery of new messages to the mailbox. You can leave them with POP access to it so they can clean it out. Of course, you make this policy well-known. And you put an automated message into their mailbox that notifies them they've been blocked too.

    By doing this, you've set up a give and take situation: as long as they do their part to keep their mailbox generally clean, you do your part to deliver messages. Presumably managers will encourage their employees to keep up on the maintenance because they don't want employees to be unable to be reached by e-mail.
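
    The periodic check could be as small as the following Python sketch. It assumes you can already get the received date of every message in a mailbox; how you do that depends on the mail server, and the function names and sample data are hypothetical.

```python
from datetime import date, timedelta

MAX_AGE = timedelta(days=90)  # the 90-day limit from the policy above


def delivery_blocked(message_dates: list[date], today: date) -> bool:
    """True if any message in the mailbox is older than the limit."""
    return any(today - received > MAX_AGE for received in message_dates)


def mailboxes_to_block(mailboxes: dict[str, list[date]], today: date) -> list[str]:
    """Return the users whose new mail should be held until they clean up."""
    return sorted(
        user for user, dates in mailboxes.items() if delivery_blocked(dates, today)
    )
```

    Note that nothing is deleted here; the check only decides who gets the automated "you've been blocked" notice.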

    Second part is to solve the problem of too much data already on the server. To do this, you announce the policy above and put it into place. Send people advance notice (two weeks, one week, two days, one day, etc.) that their mailbox is going to be locked if they don't clean it. For those who don't clean it, go ahead and lock it. Leave it that way for a short while (until you get some complaints) and then announce a one-week extension.

    Then, for those who *still* don't do anything about it, take all the messages that are older than 60 days, remove them from the user's mailbox, then put them aside. Burn a CD of the mailbox and send it (interoffice mail, or whatever) to the user's manager. Then make your own archive of all such messages, and delete them from the server.

    Now the recalcitrant people will have to go see their manager to get their old messages, and the managers will know why and will know that they've been given several warnings and an extension and still didn't bother to do anything about it. Maybe the manager won't care, but I can't imagine they'll have a positive feeling about their employee having found a way to waste their time.

  28. Re:Simple... by Doppler00 · · Score: 2, Insightful

    How to save 90% of disk space:

    Sort all users' e-mail received by size for a given year.

    Delete 5% of the largest e-mails. These will probably account for around 90% of all disk usage. They probably represent file attachments which should have been stored on a server instead of in an e-mail account anyway.

    Just think, when you mail a 2MB attachment to 3,000 people in a division, that could use quite a bit of disk space.
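
    Whether the largest 5% really hold 90% of the space depends on the mailbox, so it is worth measuring before deleting anything. A small Python sketch, with made-up message sizes and function names:

```python
def share_held_by_largest(sizes: list[int], percent: int = 5) -> float:
    """Fraction of total bytes held by the largest `percent` percent of messages."""
    if not sizes:
        return 0.0
    ordered = sorted(sizes, reverse=True)
    top_n = max(1, len(ordered) * percent // 100)
    return sum(ordered[:top_n]) / sum(ordered)


def duplicated_bytes(attachment_bytes: int, recipients: int) -> int:
    """Space used if every recipient's mailbox stores its own copy."""
    return attachment_bytes * recipients


if __name__ == "__main__":
    # Hypothetical mailbox: 95 small messages and 5 with large attachments.
    sizes = [10_000] * 95 + [2_000_000] * 5
    print(round(share_held_by_largest(sizes), 2))
    # The 2 MB attachment sent to 3,000 people, in decimal bytes.
    print(duplicated_bytes(2_000_000, 3_000))
```

    In this invented mailbox the top 5% hold about 91% of the bytes, and the mass-mailed attachment costs 6 GB if each mailbox keeps its own copy.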

  29. Re:Simple... by devilspgd · · Score: 4, Insightful

    With a properly designed mail system, only one copy of the message would be stored on disk, with pointers from each mailbox to the single central copy.

    *shrugs*
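
    A minimal Python sketch of single-instance storage, for the curious. Using a SHA-256 digest of the message as the key is an assumption made here for illustration; real mail servers that do this have their own schemes, and the class and method names are invented.

```python
import hashlib


class SingleInstanceStore:
    """Keep one copy of each distinct message; mailboxes hold only references."""

    def __init__(self) -> None:
        self.bodies: dict[str, bytes] = {}         # digest -> message bytes, stored once
        self.mailboxes: dict[str, list[str]] = {}  # user -> digests (the "pointers")

    def deliver(self, recipients: list[str], body: bytes) -> str:
        """Store the body once and add a pointer to every recipient's mailbox."""
        digest = hashlib.sha256(body).hexdigest()
        self.bodies.setdefault(digest, body)
        for user in recipients:
            self.mailboxes.setdefault(user, []).append(digest)
        return digest

    def read(self, user: str) -> list[bytes]:
        """Resolve a user's pointers back to message bodies."""
        return [self.bodies[d] for d in self.mailboxes.get(user, [])]

    def stored_bytes(self) -> int:
        """Bytes actually held on disk, however many mailboxes reference them."""
        return sum(len(body) for body in self.bodies.values())
```

    Deleting a message then means dropping a pointer, and the body can only be removed once no mailbox references it, which this sketch does not implement.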

    --
    Give a man a fish, he'll eat for a day, but teach a man to phish...
  30. Re:Great way to ignore your customers by NetGyver · · Score: 2, Insightful

    I hate it when people associate taxpayers + government with customers + business. The two relationships are very different.

    There are no laws I know of that tell me I have to pay Company X for products. If I don't want any products from Company X, I won't buy anything from them. I'm not going to be breaking any laws because of it. However, if I don't pay my taxes I'll get hounded to death, with the possibility of being tossed in jail.

    See the difference?

    --
    A Penny for my thoughts? Here's my two cents. I got ripped off!
  31. Re:Great way to ignore your customers by Kjella · · Score: 2, Insightful

    I don't know what business you work in, but if they haven't read it in 3 days, they've lost my business.

    Let me guess.... you're emigrating a lot, yes? Otherwise you might have to have "business" with the government. Good luck getting a reply in three days there.

    Kjella

    --
    Live today, because you never know what tomorrow brings
  32. Bitch? by Anonymous Coward · · Score: 1, Insightful

    She doesn't sound like a bitch. She sounds like someone who wanted to share the experiences of the company picnic with everyone. That doesn't sound like a bitch to me. She may not have known about the problems her post would cause. If she had done something like this in the past, and did it again, that might make her "stupid", but not a bitch. A bitch is someone who complains when another employee has a family picture on her desk, because personal decorations are against company policy. A bitch is someone who expects people to drop everything to help him/her, but won't lift a finger to help others. A bitch is, generally, a person who is unpleasant to be around, a person whom almost no one likes. A bitch is not someone who would pass around pictures of the company picnic. A person who calls a woman a "stupid bitch" because she made a simple mistake sounds like a sexist asshole to me, and not someone that I'd like to know.