You've Got Mail -- Tons Of It
Daniel Goldman writes "The Baltimore Sun has an article about the City of Baltimore's email problem." A snippet: "Millions of old e-mail messages are clogging Baltimore's municipal computers, so the city is going to start automatically deleting any messages older than 90 days.
A common practice in private business, the move raises questions when made by a municipality, which has a responsibility to retain certain public records." Goldman points out "Just think about all the potential lawsuits; 'if it's not there, they can't subpoena it.'"
The taxpayers.
Outsource each employee's email to GMail. Problem solved.
This might be a practical use of one: determine which emails are valid and which aren't, like a spam filter. Allow users to flag 25% or so of their emails as important, and archive those.
got sig?
Either Google or the Internet Archive would be happy to archive that data for the City of Baltimore and keep it available for public reference.
Since they need to delete tons of old messages, spam included, but want to save official email, why don't they train a Bayesian filter to sort through and save as much as possible? After all, they can't rely on their employees actually saving each official message to their hard drives.
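For anyone wondering what "train a Bayesian filter" would look like, here is a toy naive Bayes sketch in Python. The class name, the two labels ("official" and "junk") and the whitespace tokenizer are all made up for illustration; a real filter would need far more training data and better tokenizing, and nothing here reflects what the city actually runs.

```python
import math
from collections import Counter


class TinyBayes:
    """Toy two-class naive Bayes text classifier (illustrative only)."""

    def __init__(self):
        # Hypothetical labels: mail worth keeping vs. mail safe to drop.
        self.word_counts = {"official": Counter(), "junk": Counter()}
        self.doc_counts = Counter()

    def train(self, label, text):
        self.doc_counts[label] += 1
        self.word_counts[label].update(text.lower().split())

    def classify(self, text):
        # Assumes each label has been trained at least once.
        vocab = set(self.word_counts["official"]) | set(self.word_counts["junk"])
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label, counts in self.word_counts.items():
            # Log prior plus log likelihoods, with add-one smoothing.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(vocab)
            for word in text.lower().split():
                score += math.log((counts[word] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)
```

Users flagging a sample of their mail, as suggested elsewhere in the thread, would be the training step; everything unflagged gets classified and only the "official" pile is archived.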
~ there are 10 types of people in this world, those that can read binary and those that can't
Now we get to subpoena entire hard drives so we can run data-recovery software on them. It would be smart of any operation, public or private, to wait out the statute of limitations (which I realize may vary) of any states with which they have substantial contacts before they start deleting data.
figure out what percentage is spam, and sue spammers to recover damages for lost resources.
can't they, like, just buy a big hard drive and stuff?
If the average message is 10 KB (10,000 bytes, to make the math easier), and compresses down to 3 KB (probably even better if you compress a bunch together), then you'd need roughly 30 GB to store 10 million of them. Can you even buy hard drives that small any more?
Add some search index, throw a crappy web interface on it, and call it a day. Never delete an email again!
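A quick sanity check of that estimate, as a Python sketch. The message count and compressed size are the guesses from the post above, not measured figures, and the function name is made up for illustration.

```python
def archive_size_gb(messages, compressed_bytes_each):
    """Estimated archive size in decimal gigabytes (1 GB = 1e9 bytes)."""
    return messages * compressed_bytes_each / 1e9


# 10 million messages at roughly 3,000 bytes each after compression
print(archive_size_gb(10_000_000, 3_000))  # 30.0
```

Index overhead is not included, so the real footprint would be somewhat larger.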
with hard drive prices so low, i don't see what the big deal is. i'm sure if they dropped $100 into each computer for an 80gig+ drive there would be plenty of space for -gasp- email. it took me a year and a half just to fill my 60 gig drive with MP3's and pron
There are always going to be things like replies to an original question and subsequent follow up questions going back and forth, so normally hanging onto the latest/final reply would be sufficient (providing it had the previous history - clearly showed the conclusion).
Now if they were to use this as an excuse to accidentally lose records that would be a different matter. This however is where auditors should be playing a role to ensure that they are keeping the right records and discarding the rubbish.
And I'm sure they'd love to offer it up to the general public, as well. The question comes -- should all of it be public? I'm guessing that there are bits of it which shouldn't be, and it'd be more costly in the long run to try to analyze it and determine what would have to remain confidential than to just store it all in the first place.
I'd prefer that people who are familiar with the actual data being stored make the determination if it should be publicly available.
Build it, and they will come^Hplain.
"Baltimore officials, who approved the new e-mail policy at a Board of Estimates meeting last month, say they have no choice but to delete old messages, which are slowing city computers to a crawl. They say the system is so overburdened that creating a daily backup has become impossible; there is so much data that it takes more than 24 hours to copy it."
What?!? What's wrong with an incremental backup? Surely all those millions of messages aren't *changing* every day?!?
Think of all the children that will suffer from this!!!
As public employees working on publicly owned systems, this sounds like a job for a Google appliance. Of course, the existing email would likely have to be processed in some fashion--possibly manually.
Henceforth, though, any email correspondence, unless explicitly marked as "personal" or "sensitive" should be placed in an archive searchable by the public. The "sensitive" messages should be placed in another archive not open to searching by the public. How "personal" messages are handled, and what constitutes "sensitive" should be based on policy established by legislation (of course, open to public debate).
There are two technical culprits here:
1. On-line storage. There's no reason to keep all of everyone's mail on-line on the server (a la IMAP or proprietary MS Exchange) instead of offline on their PC's (a la POP, most often seen with Eudora for non-techies). With offline storage, the servers don't clog, and you can keep as much mail as you like.
The biggest rap against off-line storage is that you can't control what people do with their mail or how they store it. My old job had a neat solution for this: Eudora downloaded your mail, but stored it on a file server. Each employee had 100 GB or something very large. It worked great; the SMTP/POP servers were never full, and everyone could keep their email.
2. Ridiculous stupid bullshit HTML rich-text mail crap. Can you tell I have a bias here? Aside from being annoying, HTML mail can take up to ten times the size of plain old text. Some of the HTML generated by common email programs is just terrible; filled with repeating tags for every line, and just wasting an incredible amount of space for absolutely zero benefit. (Outlook is bad, but there are others that are just as bad.)
There's no excuse for not fixing these problems. Someday someone's going to tell a court they had to delete mail for these reasons, and someone else is going to explain exactly why they're wrong. Until then, people who want to delete mail for legal reasons will hide behind false technical reasons.
This has to be the stupidest approach to the problem. Their networks are too slow, so instead, they're going to have each employee go through their old email and save individually important messages to their local hard disk? Not only are they going to tie up employees with this manual effort, they're also going to lose key documents and a key service - the ability to centrally search and reply to requests for information. In the future, each department will have to search their local hard drives for this information.
They've taken a simple problem of old or improperly specced equipment and turned it into a manual labor solution instead. That's an insane waste of time and salary. They should just upgrade their network and storage. If I can build a 4 terabyte RAIDed PC for a few thousand dollars, they can centralize their mailserver and back it up for say a hundred thousand, even with extra redundancy and inefficiencies and admin costs.
By contrast, forcing every current employee to perform a task that would eat up weeks of time per employee per year, in a city of Baltimore's size, will cost tens of millions of dollars.
Dumb, dumb, dumb.
--Pat / zippy@cs.brandeis.edu
I don't really trust any entity but myself to make sure I have important information archived/encrypted/etc. For reasons like this, and this recent Slashdot story
I know nothing
Yet another example of bureaucracy getting in the way of everyday things. It's basically a lose-lose situation - they can't keep on accumulating email, but they can't delete it either for fear of losing anything important. So the solution? Just add a little disclaimer: "Any email stored in this system is liable to be deleted at any time. By using this system you agree unconditionally to this." Voila. No more problem.
HAH! I just wasted a second of your life making you read this, but I wasted a minute of mine thinking it up. DAMN.
Interociter
-=What do I want? I'm an American. I want more.
Unimportant emails should designate that they are unimportant, by having the last line say "UU" or some other quick, simple system (what's the chance of doing that by mistake?). This would immediately cut out a lot of email so it would not have to be kept.
There should also be similar flags for emails that need to be kept. This way, the only mail that will have to be processed is the email sent by people who don't know the rules. When the people in charge of email recording start bothering them, everyone will learn pretty quickly.
Then software can simply look at the last line/letter of each email, and send it where its supposed to go.
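The last-line check could be as simple as the Python sketch below. "UU" is the marker proposed above; the "KK" keep flag, the function name and the three bucket names are invented here purely for illustration.

```python
def route_by_last_line(body, discard_flag="UU", keep_flag="KK"):
    """Decide where a message goes based on its last non-blank line.

    "KK" is a hypothetical keep marker; the post only names "UU".
    """
    lines = [ln.strip() for ln in body.splitlines() if ln.strip()]
    last = lines[-1] if lines else ""
    if last == discard_flag:
        return "discard"
    if last == keep_flag:
        return "archive"
    # Unflagged mail: the sender didn't follow the rules, so a human looks.
    return "review"
```

Requiring the flag to be the entire last line keeps the odds of an accidental match low, which seems to be the point of the suggestion.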
Baltimore should offer some municipal service, say free waste disposal for a year in Baltimore, per gmail swap. Problem solved.
Back up all e-mails from the last 4½ years into permanent storage, and then from there, get organized. Put spam filters on, force people to sort any important mail or else it gets deleted after, say, two weeks. People always seem to want to "start from scratch" without looking at the situation rationally. Five years of documents, gone overnight. How can anyone not be at least outraged by that?
I'm posting anonymously because this may risk my relationship with my employer.
We see old e-mails as a resource to be harnessed and turned into profit. Thanks to old e-mails we can ensure that no employee leaves with a spotless record since everyone always e-mails something incriminating sooner or later from the company e-mail address.
We also find that the e-mails are great for data repositories; we fill all of our databases with text and when our clients come in, we tell them that those data warehouses contain terabytes of information.
Any email that is worth keeping should be moved to a personal folder. I don't see how this is any different than putting a 10 MB limit on an account. Definitely not front page material either.
http://jayceecorder.blogspot.com
I've lived in Baltimore for many years and it is obvious that the government will use any excuse to cover up the extreme corruption. When I was there, a superintendent gave a building contract to his uncle who wasted $100,000 and then $100,000 was spent fixing the errors.
As an aside:
If you are a gov't employee, it is your responsibility to use email for official business only. All communication should also use proper English (these are not posts to Slashdot) and all the emails and memos should be self-contained (and not include 100k messages of pretty formatted reminders of lunches). I don't work in the gov't, but I still follow these rules. This cuts down on most junk.
Clearly those people are abusing the email system; otherwise it would not be such a problem. What they need to do is get the BOFH to keep people's personal emails out of the system... :)
Millions of old emails?
They say it "could be as high" as ten million emails. Well, the mean size of an email is probably around 10k, so that's 100GB of old mail.
100G of storage costs about a day's wages for a city bureaucrat.
The main problem they mention is that "it takes too long to make daily backups." That doesn't seem to be the mail system's problem--why are they making *daily* backups of static data?
If you want to make daily backups of your mp3 collection, you don't copy the whole mess every day. You look for new files and copy only those.
I'm not saying they necessarily need to keep all that old mail. But there's no technical reason why they can't.
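The "look for new files and copy only those" approach is short enough to sketch in Python. This assumes mail is stored as one file per message (not the single big store an Exchange setup would have), and the function and parameter names are hypothetical.

```python
import os
import shutil


def incremental_backup(src_dir, dst_dir, last_backup_time):
    """Copy only files modified after last_backup_time (epoch seconds).

    Returns the sorted relative paths that were copied.
    """
    copied = []
    for root, _dirs, files in os.walk(src_dir):
        for name in files:
            src = os.path.join(root, name)
            if os.path.getmtime(src) > last_backup_time:
                rel = os.path.relpath(src, src_dir)
                dst = os.path.join(dst_dir, rel)
                os.makedirs(os.path.dirname(dst), exist_ok=True)
                shutil.copy2(src, dst)  # copy2 keeps the timestamps
                copied.append(rel)
    return sorted(copied)
```

With static old mail, each nightly run only touches that day's messages, so the copy time depends on daily volume rather than the size of the whole archive.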
The spam problem is unlikely to go away until people start treating it like the attack on the Internet that it is.
I've noticed an annoying trend lately that e-mail sent to businesses is frequently getting just ignored. Certainly it seems much more frequent this year than in the past. I've wondered if this is simply because so many e-mail boxes are getting filled up as fast as the spammers can send.
I'd suspect that the city of Baltimore wouldn't be having any problems if spam weren't such a problem. If the number of messages they had to deal with dropped by 5 to 20 times (depending on which estimates of current spam levels you believe), they could probably just leave the mail where it is.
This is all something I've been struggling with, being a small business owner doing business on the net. My company of 5 people gets between 4,000 and 20,000 borderline spams per day. By borderline, I mean that we throw away obvious viruses and things which score above a certain score in SpamAssassin (I think it's 9). So, that doesn't count the super spammy messages.
If it weren't for our fairly strict and complicated spam blocker setup, and a very powerful machine, we couldn't get the few hundred messages per day that are of interest to us. Spam is killing e-mail. I'm not sure why more people aren't treating it as an attack, but it's really hard to get anyone's interest to take some action. Canceling accounts doesn't even begin to solve the problem.
In the meantime, the City of Baltimore is suffering...
Sean
If email is deleted without record, what's to stop the powers that be claiming anything they want, and simply rewriting history? Did that email say "mission accomplished"? Or did it say "major combat over"? There's no record. But trust us, we said it: we said "major combat over". If you don't believe us, you're a terrorist.
to dump it off to tape and then just store the tapes instead of just deleting it. Though they are probably running an Exchange server so offloading data stores wouldn't be the easiest thing to do. If they were using something with a simple mbox store, they could easily just parse it through a date filter and dump the older than 90 day stuff to tape. At least then it could be retrieved at a later date.
Oh, wait, let me guess, they aren't using tape backups...
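For the simple mbox case, the date filter really is only a few lines with Python's standard mailbox module. This is a sketch under assumptions: the paths and function name are hypothetical, messages without a parseable Date header are left where they are, and a production run would want to lock the mailbox while it works.

```python
import mailbox
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def split_old_messages(mbox_path, archive_path, max_age_days=90, now=None):
    """Move messages older than max_age_days into a separate mbox file.

    Returns the number of messages moved. The archive file can then be
    written to tape or other offline media.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    live = mailbox.mbox(mbox_path)
    archive = mailbox.mbox(archive_path)
    moved = 0
    try:
        for key in list(live.keys()):
            msg = live[key]
            try:
                sent = parsedate_to_datetime(msg["Date"])
            except (TypeError, ValueError):
                continue  # missing or unparseable Date: leave it in place
            if sent.tzinfo is None:
                sent = sent.replace(tzinfo=timezone.utc)
            if sent < cutoff:
                archive.add(msg)
                live.remove(key)
                moved += 1
    finally:
        archive.close()  # close() flushes pending changes to disk
        live.close()
    return moved
```

Because the old messages are moved rather than deleted, they can still be retrieved later, which is the whole argument of the post above.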
Don't Ask Questions. I don't know the answers and even if I did I wouldn't tell you.
- Top-posting a reply, while quoting the entire original message, which recursively contains everything back to the start of the thread (I have actually caught crap sometimes for NOT re-quoting the entire message, because people don't know how to use Outlook to follow a thread any other way.)
- Gratuitous, inefficient use of HTML (e.g. FONT tags everywhere instead of stylesheet-based markup) and graphics ('stationery')
These bloat emails by entire orders of magnitude over plain text with minimal quoting, which is sufficient in virtually all cases, and could be retained forever with no problem. Even with all the fluff, at current disk prices, I'd say that archiving old messages (and compressing them in the process, even if all of them were converted from HTML to plaintext) on a server with backup tapes that could be pulled out in case an investigation were conducted, would be pretty darned cheap.
[100% ISO 646 Compliant]
SVM, ERGO MONSTRO.
Archive the whole lot of it, and/or compress it and store it. Don't even try to sift through it all. If and when it is needed, then get it out and pay somebody to sort through it.
Then it's not clogging anything anymore, and also it's there if you ever need it.
of going paperless? Surely the reason for using email instead of memos and letters is for cutting costs, environmental protection, etc. With data storage so cheap, why not archive all the old mail instead of trashing it?
OMFG, we nearly had a lynch mob attack us when we began deleting mail older than *two years* -- it eventually took the intervention of the CFO and a faked mail system "crash" to make 2-year max retention work, and even then there are people still pissed about it, or who claim that "the client" requires them to retain all correspondence (nope, sorry, we checked the contract).
90 days seems both unrealistic to implement and way too much reliance on .PST files, which often max out at 2 gig and can get corrupted way too easily, not to mention being fdisked into eternity by clueless helpdesk people.
In Texas, email to any state or local public official (either elected or appointed), and certain categories of state and local government employees constitutes both a "public meeting" and "public record". The state's record retention laws say that the email must be kept a minimum of three years, and depending upon content up to 7 or 10 years, or perhaps even forever. If an email is deleted prematurely, then state law provides various levels of punishments for different degrees of tampering with, or destruction of public records, which can be as severe as state jail felony hard time if you've destroyed any email that could be construed as evidence in any criminal court case.
There's also a bigger problem with client-side archiving: workstations go down. Be it from OS/software failures to hardware failure, the client-side solution is a nightmare waiting to happen when it comes to the protection of important data.
A better idea would be to write a script to go through each user's mailboxes every month, export any old emails to text, store the files on a server that uses a journaling filesystem, index the emails, and compress them.
One or two XServe G5s could do the trick quite well.
Up, Up, Down, Down, Left, Right, Left, Right, B, A, START
I question whether the article author understood 'take offline' and 'delete' as the same thing, though they are so very different.
Data is valuable, and Sysadmins know it. (Values such as when combating a lawsuit as the poster suggests or for trend analysis, contact information, or other historical purposes.)
That said, hard drive space is inexpensive and archiving to optical medium is even LESS expensive. When 47 GB of DVD media can be had at Target for less than $10, it makes NO sense to destroy this data.
I only came here to do two things; kick some ass, and drink some beer...looks like we're almost out of beer.
So, what's new? The community I live in is famous for losing hard copies of just about anything you can imagine. I'm not sure whether it has really been lost or whether it was decided that it was better to be without certain possibly troublesome papers. So now they're intentionally losing stuff in order to avoid being drowned in cruft. It just goes to remind that whenever you deal with authority, you should keep hard copies of your correspondence :)
----- One learns to itch where one can scratch.
We don't want someone to be able to request something from backups that the user thinks is gone.
This way it's up to the user to decide if they want their data archived. And the onus is on the user to comply with however long the data is supposed to be kept before being destroyed.
This highlights a fundamental problem with email -- many people pass documents as attachments, or in the body of the email, instead of using email as a sort of metadata describing their works in progress. Documents shouldn't be passed around in email; they should be stored on a network share, where proper controls for mutual exclusion and such can be employed.
'He who has to break a thing to find out what it is, has left the path of wisdom.' -- Gandalf to Saruman
I am not sure if they can use email as official communication? There would be problems with repudiation ("we never received it"), privacy ("someone intercepted it who was not supposed to") and authentication ("it wasn't me who sent it, it was my dog"). Can they use an email in court then? What would have to be done is to have all the messages signed and encrypted with a public key, and perhaps have some way for the sender to get a receipt back when the receiver reads the message.
According to a Georgetown University survey that I found regarding retention of records in Maryland:
"8. Has any public records legislation/administrative regulation been proposed calling for "permanent public access" to electronic public records? _x__ Yes ___ No a. If "Yes," cite to and briefly discuss the legislation/proposed regulation; what was the outcome? Arguably, Maryland has such a provision in MD. REGS. CODE tit 14.18.04. Certain electronic records may be considered "permanent electronic records" in they have "sufficient historical, administrative, legal, fiscal, or other archival value to warrant preservation by the Archives beyond the time that the record is needed by the agency that created it." MD. REGS. CODE tit 14.18.04.03(B)(15). Nevertheless, many electronic records will not rise to the level of importance that will ensure permanence."
The hard part is determining what is important to save and what is not. In general, 7 years is the standard retention time. In our litigious world, keeping anything to prove your case until the statute of limitations runs out is a wise move. Losing emails you don't want your g/f to see is technically called an "oops, I accidentally hit delete"
Pete Carr Owner Chatmag.com
Though I don't work in the auditor's office in my state, here is what they implemented. Any document (digital or not) over 30 days must be made public. Solution: any e-mail over 30 days is deleted. It allows them to not worry about keeping all e-mail till the end-of-time and not worry about making e-mail public. Great solution in that scenario.
Working at a law firm, we have to keep everything for 7 years. We have a system in place that takes all mail over 90 days old, pulls it out of Exchange and moves it to the SAN. As a plus, it puts a link back into the information store to make it look like the message is still there. If a user wants an old message he can still get it himself w/o an IT person having to dig up a tape, restore the file and then e-mail it to him (thus creating MORE mail). The messages are still searchable and it makes retrieval when needed a snap.
Mind you, we are only a 700 user shop. But nothing gets deleted. If it gets by the spam filter it gets saved.
People find it strange that I don't know how to juggle or tap dance.
As far as I've been able to figure out, this arose from a lawsuit against the county where an e-mail retrieved from two years previous proved a county commissioner to be taking bribes in a zoning issue.
Rather than fix the corruption, just ensure that it's covered up more efficiently. Gotta love local governments.
No boom today. Boom tomorrow. There's always a boom tomorrow. - Cmdr. Susan Ivanova
Once an actual human person has read and acted on the mail, they should be able to mark it "official business" and/or move the email into an "official business" folder which does get kept as required.
Better procedures and training go a long way here. These same folks have no problems with snail mail.
I'm surprised that there aren't any state laws that would override that local limit.
The problem is that their average might not be quite that. Usually the file sizes in large systems (operating systems and web) follow a long tailed distribution, that is the chance of seeing a very large file all of a sudden is not too low. So most messages during a certain day might just be about 5k each but then someone sends a 5MB PDF brochure or a zipped folder of "really cool images of sunsets or mountains" that are worth a thousand smaller 5k messages.
Actually, it doesn't have to be encrypted--any hoop that you can make people jump through to mail you is fine, as long as it isn't something that spammers will be able to automate. For example, you could also use a randomly generated email address that changes frequently, and provide a website with a "mailto" link and a challenge image. (However, the challenge image might pose problems for blind people, so that might not fly).
This should cut out almost all the spam, cutting mail down to a manageable volume for archiving, so you can get back to worrying about the other problem with satisfying record retention laws: finding a way to keep the data. Some states require records be maintained for a long time--long enough that you have to worry about media life, and the availability of readers even if the media lasts long enough.
Occasionally, we need to send documents to each other, or to our clients. These are not just simple text documents - they are design specs, product proposals, and other docs that contain images, graphs, charts, and other multimedia content.
Sounds like you want to learn about pdf.
Why shouldn't we just email them the document, if we know that everyone in the circle has Word?
Because many Word documents contain a history of the last n changes that were made to them. So your client might get to see some "old" figures that they're not supposed to see. Or the name of that other client that you've sent that .doc to the other day...
Just dump the old email to DVD-R and archive it somewhere. If someone wants to subpoena it, burn off copies and wish 'em luck. Even if the city is getting a million pieces of spam a day, at 5kb each after data compression, that's just one DVD-R per day at a buck or so each, peanuts compared to what the city already must spend xeroxing memos for records retention purposes.
A better option would be to archive old messages rather than remove them entirely. From the article it sounds like they are keeping ALL messages active all the time. For example:
"They say the system is so overburdened that creating a daily backup has become impossible; there is so much data that it takes more than 24 hours to copy it."
So, it seems like the solution would be to periodically lop off old messages to offline storage (tape, spare drives, whatever). In the event of a lawsuit the old messages could be reasonably recovered and the cost for such a system would be extremely minimal.
Just rzip (better than bzip2 with large files) the email archive and burn it to DVD-Rs. So in case of real legal necessity, it's possible to access it, and the whole setup (DVD-burner + media) probably costs around 100$.
Treehugger? Treehugger... Treehugger!
Sometimes an editable format is needed because you want the recipient to edit the document. Retaining the change history of a document is also important if the sender expects to receive changes.
Jackass.
Unlike a legal office where communications are governed by extensive regulation, governments are really only required to keep records of official documents and decisions. The myriad of e-mails leading up to a decision are not generally protected under such an act, nor are snail mail or phone conversations. In fact, the whole idea of there being a digital trail to follow for governmental decision making is really very new. Does it make sense to change that practice? Do we really think our government officials should be so closely watched that EVERY e-mail/phone conversation/smoke signal should be recorded and exposed to public scrutiny? Talk about making an unattractive job even less enticing.
In response to the poster's question about all those subpoenas: welcome to the world of civil litigation, where the first one to destroy the evidence wins!
Only 120 characters... who can summarize their entire world understanding in 120 characters?!
Isn't this exactly what products such as EMC's Centera were designed for? No, I don't work for EMC, but I have worked with the Centera... it does the job well.
I'll handle these in reverse order.
Word attachments are acceptable when they are just a means of moving files around, and not the entire content of the email. What is not acceptable is expecting me to load a large word processor just so you can use the company letterhead. In my experience the latter type is far more common. Besides the security implications (macro viruses, etc), I do not have a gui on the computer I read my email. Nor should I need one.
As for HTML email, I'm simply not going to render strange IMG tags. They could lead to goatse, or back to a spammer's site, and now they know my email is active. HTML email generally looks like it was designed by an 8 year old with downs syndrome anyway. Plain text is just more readable for nearly every email. Check out HTML email is STILL evil!!! for more.
Give me Classic Slashdot or give me death!
how many actual business inquiries do you get out of all that email? If it's not too large a number, wouldn't it be easier to just switch back to POTS for your business, and just trash the concept of email as being unworkable at this time without a ton of headaches? Just a POTS and an answering machine might be all you need.
Another alternative might be to use webforms instead of email, and indicate any replies back to the prospective customer will be done on your nickel, on the phone, and they should give you a time and date to return the call.
They say the system is so overburdened that creating a daily backup has become impossible; there is so much data that it takes more than 24 hours to copy it
I find that rather hard to believe. They only need to back up the new emails, then they can delete them at any time without actually losing them. I doubt they see many terabytes of new email every day. Nine times out of ten, any IT tech who says something is "impossible" is just lazy and/or incompetent.
We have to spend a lot of time telling people to **NOT** save to local drives. If it is important or confidential, or may be in the future, this should not be saved locally unless you want to lose it or explain to an enquiry why it was found on sale in a car boot sale after a break in. This is what a network is for.
The answer to the problem in the article is quotas. *nix has them, Novell has them and even Windows has them. Our email quota works as follows:
Limit 1 - email user once per day marked high importance that they are getting close.
Limit 2 - disable sending and continue with (2k) warning message.
Limit 3 - disable receiving apart from one final message saying that it would all start working again when the user clears some space
When they can't send/receive, they get a dialogue box reminding them when they try, and when they can't receive, the sender gets a message.
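The three limits above boil down to a small state check, sketched here in Python. The threshold values in the example are invented, since the post doesn't give the actual quota sizes, and the function and state names are hypothetical.

```python
def quota_state(used_mb, warn_mb, send_block_mb, receive_block_mb):
    """Map mailbox usage onto the three-limit scheme described above."""
    if used_mb >= receive_block_mb:
        return "no_send_no_receive"  # Limit 3: one final notice, then bounce
    if used_mb >= send_block_mb:
        return "no_send"             # Limit 2: sending disabled, warnings continue
    if used_mb >= warn_mb:
        return "warn"                # Limit 1: daily high-importance warning
    return "ok"


# Hypothetical limits of 80 / 100 / 120 MB
print(quota_state(95, 80, 100, 120))  # warn
```

Checking the strictest limit first keeps the function a plain top-down cascade.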
This does make for support calls like...
"Why does my computer tell me that the email is full up and I can't send any more?"
"Because your email is full up. You have a message explaining this to you."
"X tried to send me an email and it bounced saying that my mailbox was full up. Why?"
"Because your mailbox is full up."
I'll see your Constitution and raise you a Queen.
hm... perhaps they should use a product like Email Extender?
http://www.legato.com/products/emailxtender/
ILM is the next big thing. It's the logical extension to the ever increasing SAN/NAS Server/Workstation exponentially-increasing-data problem (go google for pretenders to the law).
You can't oversee growing data storage without a parallel increase in administration costs. Instead, the idea is to build automatic archiving into your storage architecture.
In practice this means you build tiers of storage/archive methods. Tier 1 is a high tkt Shark SAN etc, Tier 2 is lower priced SATA RAID and Tier 3 is a DAS Tape Library. Build retention guidelines into the storage management platform (Tivoli etc). Older items are automatically moved to the Tier corresponding to that retention/access policy. Really old items "live" on Tape. Frequently accessed data lives on the high speed boxes near to the users/application. You snapshot updates to a DR replica offsite or burn periodic Tape sets etc. It's a good idea to team this with storage virtualization (virtual LUNs/metadata directory servers) and you can add/rotate/modify the storage tiers when necessary without any downtime.
From a user perspective, you click on the link and if applicable, get notified the item is being retrieved from media x (it's mostly transparent). Worst case - access times are in the minutes.
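The age-based placement rule at the heart of this can be sketched in a few lines of Python. The tier names and the 30-day and 365-day cutoffs are made-up illustration values, not anything Tivoli or any other product prescribes; real policies would also weigh access frequency and legal retention class.

```python
from datetime import date

# Hypothetical policy: (max age in days, tier name), checked in order.
TIERS = [(30, "tier1_san"), (365, "tier2_sata_raid")]


def pick_tier(last_accessed, today, tiers=TIERS):
    """Return the storage tier for an item based on days since last access."""
    age_days = (today - last_accessed).days
    for max_age, name in tiers:
        if age_days <= max_age:
            return name
    return "tier3_tape"  # really old items "live" on tape


print(pick_tier(date(2004, 5, 20), date(2004, 6, 1)))  # tier1_san
```

A scheduled job running this over every item is what makes the migration automatic instead of depending on end users.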
Of course, all this comes with a high price. Enterprise Storage systems are not cheap. Recent legislated policy (Sarbanes Oxley etc) enforces the retention of some media (e.g. email). You cannot rely on end users to enforce data retention. This lets you mandate tiers of protection and is highly configurable to support per application monitoring.
Nothing is foolproof. It's still being finessed but if you can afford it - it's truly a thing of beauty.
...as their email system, where it is trivial to archive old email from your main mailbox off to an archive database file where it's easily moveable to some offline, dirt-cheap media such as CDR, etc.
There comes a point where that, too, gets very expensive. At my company (large US healthcare provider, with governmental and private contracts, both HMO and PPO), after saying 3, 5, and 7 years, our lawyers have told us we have to archive, potentially forever, all email that the end user doesn't specifically delete. They may do an end-run around the deletion and archive those, too, but I don't know. Anyway, our email system (Lotus Notes, which is an extreme HOG) eats somewhere between 100GB and 1TB per week. I was told it was well over 1TB, but I don't believe them. This is of course due to older Notes versions' inability to store attachments in public directories, so a copy is simply sent to each and every recipient (plus the stupidity of no size limits on internal email). There is a limit to how many drives you can add to a SAN, and then you have to get a whole extra chassis, which is where the expensive part comes in. Buying new SAN units every 6 months or so, as well as the hard drives to put in them (plus the maintenance contracts, 24/7 support, etc.), could easily add up to $1 million/year or more. Which is definitely more costly than 10 average low-to-mid-level administrators' salaries.
What he said.
The 10 secretaries in question were only using 1 GB each per year. 10GB per year in total. If your company is as large as you imply, the amount of work hours involved in sorting through old emails will be larger than that. Each person (or their PA) would need to do their own. That's a lot of hours.
At one company that I worked for, they got the brilliant idea to delete all email older than 30 days. They also didn't want employees to make backups of their personal mailboxes. They intentionally wanted all traces of old email to disappear. While I'm sure that it made the lawyers happy, it caused a lot of grief for the people actually doing work for the customer. Many design decisions, bug reports and other important things were only documented in email messages. This is supposed to be the age of the paperless office, right? When you are involved in a multi-year project, you often need to refer to old messages. It also had the effect of making old policy memos disappear, whose existence had proved to be very inconvenient to management on several notable occasions.
Mea navis aericumbens anguillis abundat
The real way to do storage, though, is to let the users keep their mailboxes on their own PCs. My company's IT department pushed us in that direction many years ago, partly because it's the only way to really support laptop users, but partly because it gets rid of the central storage bottleneck and makes it the user's problem to not run out of disk.
A more convenient mail system would make it possible to archive this stuff to DVDs, or CDs for users with smaller mailboxes. (CDs are more useful, because most PCs have CD readers, and most government-office PCs are unlikely to have DVD readers, and probably most don't have CD writers yet either - you'd have to do this in a centralized fashion.) So have a centralized group burn the stuff to CD, unless the users have their own CD burners.
Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
Of course, if the city /wants/ those messages to go away because of the threat of subpoenas, that would be a problem.
*****
Dear Mary,
I yearn for you tragically,
A.T. Tappman, Chaplain, U.S. Army.
Also, there's the issue of centralized vs. distributed archiving. If you're centralizing, DVDs are obviously the better choice, because you can store 6 times as much data on each, and if you're doing one mailbox at a time, you're less likely to need multiple disks. For distributed use, though, CDs may win, because government bureaucrats are much more likely to have CD readers than DVD readers; some of them will also have CD writers. Probably the best choice is to have one archive copy and one copy for the user to keep, and bar-code-label the archive.
Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
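To put rough numbers on the CD vs. DVD trade-off above, here is a small sketch. The capacities (700 MB per CD, 4700 MB per single-layer DVD) are nominal figures assumed for illustration and are not from the comment, and `discs_needed` is a made-up helper.

```python
import math

# Assumed nominal capacities in MB; not stated in the comment.
CD_MB = 700
DVD_MB = 4700


def discs_needed(mailbox_mb, capacity_mb):
    """Number of discs required to hold one mailbox of the given size."""
    return math.ceil(mailbox_mb / capacity_mb)
```

Under those assumptions a 1400 MB mailbox takes two CDs or one DVD, and a 20000 MB mailbox takes 29 CDs or 5 DVDs, which is roughly the "6 times" ratio mentioned above.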
I just took my Exchange mailbox and WinZipped it. 1.4 GB raw became 760 MB compressed - about 50%. Not only is a typical Exchange user unlikely to have most of their mail messages fit into 10KB of text, they're stored in clunky formats. Probably most of the bytes in my mailbox are PowerPoint, and most of the rest are Word. 3/4 of my spam is probably 10 KB of text :-), but the rest has embedded pictures, and in any case it all gets deleted.
Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
When I heard my city were outsourcing their garbage collection services, I imagined office blocks of staff in India sifting through online hex editors looking for spare memory blocks to delete.
Vintage computer adverts: http://www.vintageadbrowser.com/computers-and-software-ads
Right. Our company originally tried to institute size limits when we went to Notes (only 3 years ago), but then the lawyers said we need to keep everything anyway (HIPAA requirements). So even with the exorbitant expense of the system, it is probably still cheaper to keep expanding every couple of months rather than pay people to sit there and sort through their own email. Anything from an external party must be kept, and anything remotely regarding a customer must be kept as well. It's a huge pain, and they took the easy way out by archiving every single email. But neither option is very cost effective. There are four people that I know of in my department alone who have mailboxes (extensively categorized with dozens or up to hundreds of folders) of up to 20GB each. It's crazy. But even without the ever-looming threat of a lawsuit, they claim that they have been able to disprove what other people were badmouthing them about by producing an email from that person stating the exact opposite a year or two previously. I've witnessed it once, and it is pretty funny watching somebody turn beet red in a room with 25 supervisors and above.
wtf?
"our lawyers have told us we have to archive all email potentially forever that the end user doesn't specifically delete."
I thought email was archived at the server level, not the end user level-- I thought *everything* coming in and going out gets archived. There is no logic that I can see, in letting the end users decide what gets logged.
There is an easy enough solution to this if you have management's support. (Assuming I understand the problem, which is apparently that the POP server is overburdened.)
The first step is to solve the steady-state problem. This is easy enough: you make it very well known that they are not to leave messages older than 90 days in their mailbox. But because the messages may contain official stuff and can't be deleted, you don't delete old messages. Instead, you test every mailbox periodically to see if it contains old stuff, and if it does, you block delivery of new messages to the mailbox. You can leave them with POP access to it so they can clean it out. Of course, you make this policy well-known. And you put an automated message into their mailbox that notifies them they've been blocked too.
By doing this, you've set up a give and take situation: as long as they do their part to keep their mailbox generally clean, you do your part to deliver messages. Presumably managers will encourage their employees to keep up on the maintenance because they don't want employees to be unable to be reached by e-mail.
Second part is to solve the problem of too much data already on the server. To do this, you announce the policy above and put it into place. Send people advance notice (two weeks, one week, two days, one day, etc.) that their mailbox is going to be locked if they don't clean it. For those who don't clean it, go ahead and lock it. Leave it that way for a short while (until you get some complaints) and then announce a one-week extension.
Then, for those who *still* don't do anything about it, take all the messages that are older than 60 days, remove them from the user's mailbox, then put them aside. Burn a CD of the mailbox and send it (interoffice mail, or whatever) to the user's manager. Then make your own archive of all such messages, and delete them from the server.
Now the recalcitrant people will have to go see their manager to get their old messages, and the managers will know why and will know that they've been given several warnings and an extension and still didn't bother to do anything about it. Maybe the manager won't care, but I can't imagine they'll have a positive feeling about their employee having found a way to waste their time.
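The "test every mailbox periodically and block delivery" step from this comment could look something like the sketch below. It is a rough illustration only: mailboxes are modelled as a plain dict of user name to message dates (a hypothetical layout), the function names are invented, and hooking the result into a real mail server's delivery path is left out entirely.

```python
from datetime import datetime, timedelta

MAX_AGE = timedelta(days=90)  # the 90-day limit proposed in the comment


def has_stale_mail(message_dates, now):
    """True if any message in the mailbox is older than MAX_AGE."""
    return any(now - d > MAX_AGE for d in message_dates)


def delivery_allowed(message_dates, now):
    """New mail is delivered only while the mailbox holds no stale messages."""
    return not has_stale_mail(message_dates, now)


def blocked_mailboxes(mailboxes, now):
    """mailboxes: dict of user -> list of message datetimes (hypothetical layout).

    Returns the sorted list of users whose delivery should be blocked.
    """
    return sorted(user for user, dates in mailboxes.items()
                  if has_stale_mail(dates, now))
```

The list returned by `blocked_mailboxes` is also the list of people who should get the automated "you've been blocked" notice.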
is what my lawyer wife requires. All incoming and outgoing mail is archived - though I draw the line at saving trash and only save incoming clean mail after the spam and virus filters. She actually uses the mail archives from time to time. So, logrotate.conf is set to 120 months for e-mail.
Oh well, what the hell...
Probably too late in the thread for this to catch much attention...
However, I work for a local Government office, close to Charlotte, NC. It is our stated policy to remove all e-mail older than two weeks except for e-mail that is crucial to job performance. This is less to save space than it is to keep the news media from finding dirt. We really don't care that our e-mail is public record. We really have nothing to hide. However, the local newspaper (in Charlotte) is constantly asking for _ALL_ of the County Manager's e-mail. They aren't looking for anything specific, they are just on a fishing trip, trying to see what trouble they can stir up. They rely on the Freedom of Information Act to, hopefully, generate some news, instead of doing some real investigative work. *sigh*
Anonymity enabled for self-protection. Wouldn't want the powers that be to see my e-mail...
Being a government organization, I would expect them to first print all the e-mail and make 3 photocopies for filing, before deleting it...
Oh well, what the hell...
In a city that can scarcely get the corpses out of the inner harbor within 90 days, this is a huge advance in bureaucratic efficiency!
PotUS only makes $200K/yr?
"I would invest in more hard drive space to hedge against lawsuits."
Was that the sound of Raven42rac agreeing to higher taxes? You rock dude.
If you really want collaborative document editing with version control then emailing MS-Word documents around is NOT the way to do it. You end up wasting a lot of space and bandwidth and when it is 5 minutes to the critical meeting still everyone argues about who ended up with the final draft.
Email is for text messages that take the place of verbal discussion. A blanket ban on all email larger than 1M solves the problem nicely.
I work for one of the Big Three (automotive) that has a policy of deleting email messages after 60 days. (They don't automatically do it, but it is supposed to be self policing). All this does is let the F**king liars win. Numerous times in "corporate" life I've had people tell me that "I didn't get that", "I wasn't informed", "I didn't get that message", usually on messages that are greater than 60 days old. When they and their management get a second copy they usually change their tune. I understand the ability to limit lawsuit liabilities but they sap the energy of their best workers when they let the liars win.
Use your head, can't you, use your head,
You're on earth, there's no cure for that - S. Beckett
"As an aside:
If you are a gov't employee, it is your responsibility to use email for official business only. All communication should also use proper English (these are not posts to Slashdot) and all the emails and memos should be self-contained (and not include 100k messages of pretty formatted reminders of lunches). I don't work in the gov't, but I still follow these rules. This cuts down on most junk."
Sounds to me like people need to learn what is what. Relatively transient things like scheduling need to go into a groupware solution.
There's voice-mail, and SMS as well as cell-phones for that level.
Maybe a DMS (Document Management System) for contracts, and other important documents that have a long lifespan.
In short, use the right tool for the right problem, instead of an all-purpose solution to every problem.
"PGP is worse. It isn't supported by *any* mailer widely uses mailer (installing an extra 'plugin' does not count - most of the people I talk to have absolutely no idea what a plugin is let alone how to install one)."
Do what everyone else apparently does. "Click on this to install PGP plugin" in body of E-Mail. For once use OE's "problems" to your advantage.
There's a reason why we call ourselves Baltimorons.
This explains the size of their emails:
make_email.php: <?php echo $message; include (basename($PHP_SELF) ); ?>
I'm still trying to figure out what people mean by 'social skills' here.
if it's not there, they can't subpoena it.
Ever since emails at MS got exposed in court, along with Monica Lewinsky's emails to her friend about the insensitive clod not getting her flowers, everyone has decided to have an Official Policy For Getting Rid of Old Potentially Incriminating Email.
It's a double plus advantage: clear out space on the servers, increase the speed of searches through old email, and decrease legal liability.
But it doesn't increase my trust in those companies or government agencies that have such policies.
Verbal communications, hints and innuendo have provided vanishing trails of evidence for years while paper has reinforced accountability. With use of paper ebbing, accountability will decrease and, along with it, trust in other people and institutions. As if we needed less trust in powerful institutions, which operate under enough invisibility and leave-no-trace principles now that much greater abuses of trust are possible than before.
"Provided by the management for your protection."
Disallow external email for those that don't really need it - that should fix it
She doesn't sound like a bitch. She sounds like someone who wanted to share the experiences of the company picnic with everyone. That doesn't sound like a bitch to me. She may not have known about the problems her post would cause. If she had done something like this in the past, and did it again, that might make her "stupid", but not a bitch. A bitch is someone who complains when another employee has a family picture on her desk, because personal decorations are against company policy. A bitch is someone who expects people to drop everything to help him/her, but won't lift a finger to help others. A bitch is, generally, a person who is unpleasant to be around, a person whom almost no one likes. A bitch is not someone who would pass around pictures of the company picnic. A person who calls a woman a "stupid bitch" because she made a simple mistake sounds like a sexist asshole to me, and not someone that I'd like to know.
Here's a simple solution: get everyone using central IMAP servers for their email, and have a little scripty-poo that tars the mailboxes every month and burns them to CD. Then put the CDs in paper jackets into shoeboxes & stuff them in your city archives.
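A minimal version of that scripty-poo might look like this. It assumes the server keeps one directory per mailbox under a single root, which is a guess about the layout and not something the comment specifies; `archive_mailboxes` and its arguments are invented names. Burning the result to CD is not covered.

```python
import tarfile
from datetime import date
from pathlib import Path


def archive_mailboxes(mail_root, out_dir, today=None):
    """Tar and gzip every mailbox directory under mail_root into out_dir.

    Assumes one directory per mailbox (hypothetical layout).
    Returns the path of the archive, which is named after the year and month.
    """
    today = today or date.today()
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    archive = out_dir / f"mail-{today:%Y-%m}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        for box in sorted(Path(mail_root).iterdir()):
            # Store each mailbox under its own name, not its full server path.
            tar.add(box, arcname=box.name)
    return archive
```

Run it from cron on the first of each month and you get one `mail-YYYY-MM.tar.gz` per month to hand to whoever owns the CD burner.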
There are a thousand forms of subversion, but few can equal the convenience and immediacy of a cream pie -Noel Godin
However, if you were to have a critical message come in that could not be missed, this might be used by the Enron types to deny entry. Also, you might want to do a delivery straight to the user - saves on delivery time and puts the BOFH in his place. Blocking delivery might seem a good idea at first, but you're going to have tons of regulations to dodge before you reach the manager.
"Forget the engineers." -Carly Fiorina, briber of MIT Technology Review.
I never said that the 'we' I was referring to meant the end user. I may have a tendency to switch back and forth between possessives, but in this sentence, 'we' meant ITG services as a whole division. There are also many ways to set up an email backup solution, and I don't know for sure which one we use. You could set it up so it backs up inboxes at a certain time each day - so anything that came in that day and was deleted before then is not saved. You could trust people's judgement and remove from your archives any emails that the end user specifically deletes from their trash folder, or you could bite the bullet and archive everything every time.
Not gonna make any noticeable difference at all. Nobody ever sends large attachments to an external email address, because chances are that that external server will not accept it. Also, internal email for a large company is orders of magnitude greater than external email. You've got all your meeting notices, meeting minutes, project notes, quick questions, blah blah blah. I might get 100 emails a day from internal employees, and 5 or so from external.
"by being able to produce an email from that person stating the exact opposite a year or two previously"
It isn't just the amount of email that you store but the fact that you are able to search through it all to find a piece of text from a specific individual from 1-2 years ago! How the heck do you do it? It must take a good chunk of time.
It is by the juice of the coffee bean that thoughts acquire speed, the teeth acquire stains. The stains become a warning
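On the "how the heck do you do it" question: the naive answer is a linear scan over the archive, filtering by sender and then by text. The sketch below shows only that idea, using Python's standard `email` parser on raw message strings; `find_messages` is an invented helper, it handles only plain single-part messages, and it says nothing about how Notes or any real archive product does its searching (those presumably use an index).

```python
from email import message_from_string


def find_messages(raw_messages, sender, phrase):
    """Return subjects of messages from `sender` whose body contains `phrase`.

    Case-insensitive linear scan; only plain single-part bodies are searched.
    """
    hits = []
    for raw in raw_messages:
        msg = message_from_string(raw)
        if sender.lower() not in (msg.get("From") or "").lower():
            continue
        body = msg.get_payload()
        # Multipart messages return a list here and are skipped in this sketch.
        if isinstance(body, str) and phrase.lower() in body.lower():
            hits.append(msg.get("Subject") or "")
    return hits
```

With mailboxes in the tens of gigabytes a scan like this would indeed take a good chunk of time, which is the argument for indexing the archive up front.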
Actually it is done by gnomes living in your memory controller.
1. Clear unused memory blocks
2. ?
3. Profit!
from the city-that-reads-remember dept. Very appropriate dept. Baltimore's motto, after all, is "The City That Reads".
Opera Watch - An Opera browser blog.