Email Offline At the Home of Sendmail
BobJacobsen writes "The UC Berkeley email system has been either offline, or only providing limited access, for more than a week. How can the place where sendmail originated fall so far? The campus CIO gave an internal seminar (video, slides) where he discussed the incident, the response, and some of the history. Briefly, the growth of email clients was going to overwhelm the system eventually, but the crisis was advanced when a disk failure required a restart after some time offline. Not discussed is the long series of failures to identify and implement the replacement system (1, 2, 3, 4). Like the New York City Dept. of Education problem discussed yesterday, this is a failure of planning and management being discussed as a problem with (inflexible) technology. How can IT people solve things like this?"
I hate being first.
It's the backend. When you have too many connections on too few servers, with not enough storage, you usually see this kind of issue.
It's an economic one. It needs an economic solution.
e.g.
Have people buy a $10 ticket to get an account on the email server.
Deleted
I am depressed.
"Eve of Destruction", it's not just for old hippies anymore...
Oh no, a service had downtime. Surely this is the end of the world and only the greatest sinners of the IT world ever have to bring something down for maintenance.
To offset political mods, replace Flamebait with Insightful.
By hiring more cost accountants and requiring special and complicated business case studies, with a thorough financial analysis of how even the most mundane upgrade will raise the company's stock price. Just ask any visionary MBA. Always buy cheap consumer-grade stuff and view talent as an unnecessary expense. Do that and you will never have problems. What could go wrong?
http://saveie6.com/
When I started college in 1991 I was amazed by the telnet access I had to the email account given to me by the University. I hadn't had an email address prior to that. Now I have email addresses through hotmail, gmail and yahoo that I use for different things, and facebook also gives me an email address. So, I doubt students really need email addresses provided by the university anymore. As for the NYC Dept of Ed example, I think it just shows that trying to build IT competence into a government agency is basically a waste of money because of the institutional culture of government. In short, all of these kinds of organizations could just offer email through gmail/google business or any number of other providers that will scale up almost infinitely.
if your life is such a big joke then why should I care?
I know /. is a little slow usually, but it's a little silly to see this article pop up now, as full service has essentially been restored (mail client access is just now coming back, while webmail has been working for the past few days).
QED
A Free Hitchhiker's Guide Novella
http://thepiratebay.org/torrent/6848623/Perfect_Me_By_Jason_Z._Christie
Maybe it has something to do with the fact that the state of California has cannibalized the funding for my beloved alma mater.
Beware the Jubjub bird, and shun the frumious Bandersnatch.
Briefly, the growth of email clients was going to overwhelm the system eventually, but the crisis was advanced when a disk failure required a restart after some time offline.
Capacity planning is supposed to account for reduced capacity due to component failures, system outages, and temporary demand spikes due to restart events.
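To put rough numbers on that: a minimal sketch of the check, assuming a pool of identical servers and a simple multiplier for the restart surge. The function name, the parameters, and every figure below are hypothetical illustrations, not anything from the Berkeley setup.

```python
def survives_failure(servers: int, per_server_capacity: float,
                     peak_load: float, restart_surge: float = 1.0,
                     failed: int = 1) -> bool:
    """Return True if the remaining servers can carry the peak load.

    restart_surge is a multiplier (an assumption) for the extra demand
    when every client reconnects at once after an outage.
    """
    remaining = (servers - failed) * per_server_capacity
    return remaining >= peak_load * restart_surge


# Hypothetical numbers: 4 servers, 1000 connections each, 2800 at peak.
print(survives_failure(4, 1000, 2800))                     # True: 3000 >= 2800
print(survives_failure(4, 1000, 2800, restart_surge=1.5))  # False: 3000 < 4200
```

The second call is the interesting one: a system that tolerates a single failure on a normal day can still fall over when the failure and the reconnect storm arrive together.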
It's called sendmail.
Not sendmailnomatterwhat
http://slashdot.org/comments.pl?sid=2556922&cid=38249652
IT should have unions so they are not the fall guy for management's mess-ups, lack of funds, or lack of planning.
There was some Silicon Valley ISP, whose name unfortunately escapes me just now, that had the "problem" that its service had grown so popular that the time required to search for a mailbox in /var/spool/mail was greater than the time between incoming mails. The result was that their system worked great right up until a certain critical threshold, then all of a sudden most of their users' mail started to bounce.
Their solution was to place user mail spools in their home directories rather than all in one directory, that being /var/spool/mail. Because the home directories weren't all in the same parent directory - that is, not all in /home - rather than a linear search, finding the right spool became a much quicker tree search.
If you have a large number of users, even if you have only one filesystem for home directories, you can speed access to individual user files by placing, say, my "mike" home directory in /home/m/mi/mike, rather than just /home/mike.
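The /home/m/mi/mike layout above can be sketched in a few lines. This is only an illustration, assuming two prefix levels and a /home root; the function name and its parameters are made up for the example and are not any real tool's API.

```python
from pathlib import PurePosixPath


def hashed_home(username: str, root: str = "/home", levels: int = 2) -> PurePosixPath:
    """Return a nested home directory path such as /home/m/mi/mike."""
    if not username:
        raise ValueError("username must not be empty")
    path = PurePosixPath(root)
    # Add one directory level per prefix length: "m", then "mi", ...
    for n in range(1, levels + 1):
        path /= username[:n]
    return path / username


print(hashed_home("mike"))  # /home/m/mi/mike
```

Each directory then holds only the entries that share a prefix, so no single lookup has to scan the whole user list.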
IT people need to move into management at a more useful rate. Instead most of the people who ultimately make the financial decisions for IT centers around the world have little grounding in IT and hence limited understanding of what is actually important beyond the bottom line.
Of course, this requires IT people who are willing to put their foot down. We don't seem to have many of those...
Damn_registrars has no butt-hole. Damn_registrars has no use for a butt-hole.
IT goes to management and says, "Based on current usage, loadings, etc., the system will fail in 6 months; to prevent it we need to do this..." Management says, "Really? That's not what the salesman told me, and it's his equipment, so he should know."
Undetectable Steganography? Yep, there's an app fo
Now I have email addresses through hotmail, gmail and yahoo that I use for different things, and facebook also gives me an email address. So, I doubt students really need email addresses provided by the university anymore.
You are quite wrong. Email addresses - especially .edu addresses - are still quite valuable. A lot of academic resources that take registration via email won't allow registration to go to a throwaway account (a la hotmail, gmail, yahoo, etc). Many organizations that are interested in real information on users insist that users use an actual unique account and not a freebie. And when you're in college and making very little money, a lot of those things can be important.
I think it just shows that trying to build IT competence into a government agency is basically a waste of money because of the institutional culture of government
You're not very accurate on that, either. Government organizations need to be able to keep track of their email - especially internal communications - which they would not be able to do if they outsourced email and other telecom.
In short, all of these kinds of organizations could just offer email through gmail/google business or any number of other providers that will scale up almost infinitely.
With the various privacy breaches that have occurred, that would be a terrible idea. And on top of that, IT is a lot more than just email. Do you want the government to turn to Comcast for networking support while they're at it? What if the IRS web servers go down on tax day? Do you want them to have to lean on an outside company to get them back up?
Damn_registrars has no butt-hole. Damn_registrars has no use for a butt-hole.
At the school where I teach, whenever there's a discussion of how much it costs us to run our own email, someone suggests outsourcing (e.g., to gmail), and then someone else says, "No, we can't do that because of privacy laws." Am I right in guessing that privacy laws don't in fact prevent outsourcing to google? I suspect the argument is basically a way for IT folks to have job security. There are certainly laws that say, e.g., that we can't give students' grades to third parties. But it's hard to believe that letting google keyword-index emails and serve ads based on the keywords would violate these laws. (Whether google creeps you out is a different issue -- a moral/political one, not a legal one. It may also be an issue, but it's not an issue that can automatically end the discussion the way the legal issue can.) Does anyone know of any colleges or universities that do outsource to google or someone else?
Find free books.
Seriously, is Berkeley like the only college campus that hasn't outsourced their e-mail to Google yet?
Only 70000 accounts? That's not a big system at all. I was running systems with over a million email accounts ten years ago, and by today's standards even those would be considered small.
worldmobilenet.com -- World Prepaid Wireless Internet plans
In the video, they don't even mention sendmail at all. Are they using it?
Also, they mention that the cost of the system is something like $1.30 per account per month. I don't know much about IT budgeting, but that seems like a really low number for something as critical as messaging and calendaring. I have to imagine that they spend more money per user just cutting the grass around the campus.
There aint no pancake so thin it doesn't have two sides.
it's like saying IT can do heart surgery or IT can provide psychological counseling to a trauma survivor. IT is IT, it is not management and it is not leadership. IT is IT.
of course, shit rolls downhill, and leaders nowadays are incompetent buffoons who gain their positions largely through bribery, kickbacks, extortion, and other 'features' endemic to societies where the rule-of-law breaks down thanks to a greedy, corrupt elite.
again, IT cannot fix that.
I've only heard from people on one side of this but the story that I hear is that in the past, many departments had their own IT, mail servers, web, etc. When the campus built its centralized computing services facility, there was great pressure on departments to move to the central system. There was some griping about the costs for central services often exceeding the internal costs the departments formerly had but there was, I'm told, much need to justify the expense of and to pay for the new center. I've heard that some departments have been able to resurrect their internal systems to get through the outage.
Perhaps someone with more inside knowledge than I have can fill in and/or correct information from both sides of the story.
That slideshow is pure management-spin right from the opening "look how complicated and difficult this is..." I love how the "solution" to a system that is soon to outstrip its capacity is to stop expanding (and, it appears, properly maintaining) said system and hope it doesn't implode before you can toss the potato to an external party (who can then take the blame). Guess I was never learned at that school of capacity "planning".
~~~~~~~
"You are not remembered for doing what is expected of you." - Atul Chitnis
The press pretty much reads like this to me:
1) We didn't size the system large enough to handle the possible outages.
2) The outage we didn't size for happened, basically taking everything down.
3) My team is now working on a band-aid solution, which basically involves hobbling the application.
4) Since we're incompetent, we're going to outsource this next year.
I mean, if I was the CIO's boss I would have fired him on the spot. Maybe outsourcing is a better answer than putting in place a proper system and looking at that analysis could be interesting. I see no indication any of that was done here, basically the CIO gave the Barbie response, "Mail is hard, let's go shopping." If he doesn't understand how to do it in house, he won't understand how to arrive at a good outsourcing agreement.
Which means this pretty much sums up everything that is wrong with large org IT today.
All comments so far are like "it's not sendmail's fault".
Why is everybody so defensive about it?
... not treating a non-technical problem as a technical problem? Identify the problem, write the memo, keep the flimsies, and drop it on the relevant manager's trouble ticket queue. Or however the flow goes in your locality. The rest I leave as an exercise for the reader.
Google have 24x7 phone support now. It is really a futile exercise to maintain local email systems even for a few thousand users, it will be outsourced sooner or later.
They left out the slide where management get great big bonuses for being such swell thinkers.
The world's burning. Moped Jesus spotted on I50. Details at 11.
Due to FEDERAL LAW communications from school staff to students (and the reverse) must go to University Email accounts.
Plus, if somebody "does not get" a given email, then it's the school's fault.
Any person using FTFY or editing my postings agrees to a US$50.00 charge
Clearly email is an afterthought thrown in for free.
If you want a service to work, you have to fund it. You can try to fight for budgets against the football team or you can simply charge and the money automatically goes where it's needed.
Think of money as little packets of information. You buy something there is a need for it, you don't buy it, there is no need. Resource allocation without dozens of layers of management.
Maybe nobody cares about email and they can just shut it down. Charge for it and find out.
Deleted
Look up Microsoft Live@edu and Google Apps for Education.
Wow, Squirrelmail. So at least they managed to migrate from pine at some point.
Yeah, they're planning the upgrade from squirrel to carrier pigeon as we speak!
These posts express my own personal views, not those of my employer
I hate it when people try to act as if IT isn't subject to budget constraints and having to prioritize spending like any other department of a large organization. Sure the money comes out of the "client" departments, but it's an issue that IT does have to plan for and deal with.
The summary asks "How can IT people solve things like this?"
Forward the emails and responses to the demands for planned capacity growth to the public.
Oh, you didn't keep the email from your manager refusing to pay for a needed capacity upgrade? I guess you haven't been in IT long enough to learn to cover your own butt.
I do not fail; I succeed at finding out what does not work.
Sigh...
Look at the first bullet point of the timeline. Productivity suite approved, upgrade to Calmail cancelled. Then a week ago, they decided on an interim upgrade because not upgrading in the first place caused problems. So, rather than a planned upgrade, the IT folks were thrown into panic mode because their (probable) proposed timeline for safely doing an upgrade, including burning in and testing of new hardware, was cut to a fraction of what it should've been.
You can argue about the budgets, or the IT folks, but this is a failure of management. If (in Spring 2011) they cancelled the upgrade, and then had to have an emergency upgrade, what you have is management that fundamentally does not understand the system. This would (probably) not be the IT folks managing the system, but rather the budget and personnel management that doesn't quite grok how upgrades should be done in a safe and controlled manner. They misjudged the initial cancellation, and then (likely) pushed through a poorly planned emergency upgrade.
If the slides are correct, there is very little having to do with a failure from a technical aspect, and everything to do with a breakdown of management.
As IT personnel, we could demand that no shiny new toys be allowed to talk to the nerd stuff in the back until there is 2x the amount of nerd stuff in the "back" as would be needed to effectively handle all the new shiny things.
Of course this would equate to a bunch of loathsome, Grinch-like bureaucrats needlessly handicapping the business/school/non-profit to serve their own petty need for authority.
All we want to do is hook all these toys to the e-mail cloud. Why do we even need to talk to you people?
First and foremost, it has to account for budget, and the rationalization thereof. It's scary how often suits (and more and more "engineers") say things like "Come on really; how often does that kind of thing actually happen?" This is usually uttered after staring at a couple dozen slides of metrics that detail exactly how often it happens...
Shift happens. Fire it up.
Has anybody thought of using Nginx e-mail proxy to solve the issue?
http://nginx.org/en/#mail_proxy_server_features
University IT departments providing email and calendaring services is like the facilities employees being required to build all the chairs for classrooms.
"If I can't dance, I don't want to be part of your revolution" - Emma Goldman
I thought nobody (especially college kids) used email anymore. Facebook is where all the cool kids hang these days, right? California is doing a bang up job alright. When they're done with this project maybe they can consult themselves out to the Feds. I hear they've got a mail problem of their own with this Post Office thing. For those of you unfamiliar with Post Offices, the wikipedias have a decent write up: http://en.wikipedia.org/wiki/Post_office. Anyway, I can't wait to see the powerpoint slides for the Post Office Turn Around Medical Marijuana Home Delivery Program.
There's a more mundane problem. Unless you are an incredibly huge customer the large service providers are just not going to care if there is an outage. One example I ran into last year is a University of 45,000+ students that lost their student email hosting (hotmail) for a week due to a DNS typo for a machine in a hotmail MS Exchange server farm. To get a job offer to a student I had to put an entry in /etc/hosts of my mail server - meanwhile no other mail was getting to any of the students for a week.
That's the price of outsourcing. Your important services are farmed out to people that just do not care enough to fix a typo for a week.
You don't even have to go as far as malice when apathy is enough to provide unacceptable problems.
I think the mail gods are angry at the academic community. Our mail server crashed as well last week. Took them 4 days to get a backup server online, and another week to get the emails to the new server.
Watching the video, two things were apparent: Sendmail wasn't being used--Exim was, and the fault was not the MTA, but rather the use of a single SAN backend for everything.
I've been in the Messaging Infrastructure business for many years. The UC problem is poor design. They left themselves open to a single point of failure by not splitting the mailbox load across multiple SANs. Their load isn't really all that great--I've designed for much larger email volumes. What they need is an LDAP-based routing (or similar) mechanism to send different recipients' emails to different SAN backend stores--say, alphabetically by last name, or by entry (employee/student/alumnus/account) id. When a disk failure occurs, it then would only affect a small percentage of the population, and for a much shorter time. By enforcing RFC compliance on the front end, they would also reduce the load on the back end, and could easily handle their traffic load with far fewer servers--thereby costing far less than what they currently have.
They certainly can pay someone else to do proper design, of course, but they should understand that technology and budget did not cause their problem, their poor design did.
-David Gillam
www.davegillam.com
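For anyone wondering what splitting recipients across backend stores might look like: a rough sketch, assuming a stable hash of the account id stands in for the LDAP lookup described above. The store names and function names are hypothetical, and a real deployment would keep the mapping in the directory rather than compute it on the fly.

```python
import hashlib

# Hypothetical backend store names, purely for illustration.
STORES = ["store-a", "store-b", "store-c", "store-d"]


def store_for(account_id: str, stores=STORES) -> str:
    """Map an account id to one backend store using a stable hash."""
    digest = hashlib.sha256(account_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(stores)
    return stores[index]


def affected_fraction(accounts, failed_store: str, stores=STORES) -> float:
    """Fraction of accounts whose mailbox lives on the failed store."""
    hit = sum(1 for a in accounts if store_for(a, stores) == failed_store)
    return hit / len(accounts)


# 70000 made-up account ids, matching the account count quoted in the thread.
accounts = [f"user{n}" for n in range(70000)]
print(store_for("user42"))
print(affected_fraction(accounts, "store-a"))
```

With four stores, a failure on one of them should touch roughly a quarter of the mailboxes instead of all of them. Note that a plain modulo mapping reshuffles most accounts when a store is added, which is one reason to store the assignment per account (for example as a directory attribute) rather than derive it.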
I don't know a whole lot about this but I'm on the mailing list for a department that was in the process of migrating to Calmail. (My email goes through a different system so it hasn't affected me.) After a slew of messages this past week about Calmail problems, they've decided to cancel the migration for now. Apparently Calmail is going to the cloud in the future, so they're hoping the existing servers last until then.
a cascade failure.
1. data storage failure.
2. database crash, presumably due to the fault in data storage
3. heavy backlog of deferred mail begins hammering a generally neglected piece of Berkeley IT.
Email, calendaring, and instant messaging aren't mythical, and they need constant, competent care just like any part of the IT infrastructure. Having worked on complex email systems for the better half of my career, I'd say some of the fault lies with the Berkeley team's "set it and forget it" mentality, which Shel Waggener scolds the audience about at the start of the video.
Is that backend database known to the DBA and in a healthy state? Are the front-end components configured for email as it ran in 1993, or have they been upgraded over time with new features to address email as it operates in the 21st century?
Good people go to bed earlier.
Here it is in a nutshell:
UC Management had decided to delay the purchase of a replacement system a year ago. UCB has been working on OE (Operational Excellence - euphemism for downsizing) for a few years now. Because of the decision, one of the main techs who helped create the stable, top notch mail system left for Twitter a few months ago. Someone joked on the Micronet mailing list that mail was going to crash and burn.
The university decided at the beginning of the year not to replace the aging, out of warranty hardware while they evaluated outsourcing to gmail or Microsoft.
Mail services were already experiencing problems during the few months prior to the crash. This was due, according to the CIO, to the unprecedented number of users constantly accessing mail from cell phones and iPads starting this semester. Then a critical server crashed the day after Thanksgiving. The techs brought it back up over the weekend, but they were still recovering data.
Then on Monday, everyone came back to work, and when the staff all started to log in, the system degraded so much they had to take it down again. They disabled IMAP and POP services to keep all the cell phones and iPads off the servers, which also destroyed the productivity of the staff that relied on Thunderbird and the Mac mail client to access IMAP. The web-based SquirrelMail and Roundcube clients were unfamiliar and lacked the functionality of real mail clients. Even the techs were figuring it out with each other on the Micronet mail list.
Basically, it was a management decision. It was a gamble that failed.
Capcha = raided