Slashdot Mirror


Email Offline At the Home of Sendmail

BobJacobsen writes "The UC Berkeley email system has been either offline, or only providing limited access, for more than a week. How can the place where sendmail originated fall so far? The campus CIO gave an internal seminar (video, slides) where he discussed the incident, the response, and some of the history. Briefly, the growth of email clients was going to overwhelm the system eventually, but the crisis was advanced when a disk failure required a restart after some time offline. Not discussed is the long series of failures to identify and implement the replacement system (1, 2, 3, 4). Like the New York City Dept. of Education problem discussed yesterday, this is a failure of planning and management being discussed as a problem with (inflexible) technology. How can IT people solve things like this?"

8 of 179 comments

  1. Re:It isn't an I.T. problem by StikyPad · · Score: 5, Insightful

    Pretty sure that's what tuition is.

  2. Re:Telnet by slimjim8094 · · Score: 5, Insightful

    Students need school email addresses because that way all students have an email address.

    At my school, students are expected to check their university email at least once every 24 hours. Many people forward it to a personal account, and obviously most people check it more frequently than that, but if the university issues an account to everyone, then there can be no debate about how they didn't get the email. The school takes responsibility for the email system (and any failures), and then professors can be assured that if they send an email out to the class, it will be (or should have been) read, leaving the onus on the student to actually do it. It's similar to why we provide computer labs - that way, each student unequivocally has a way to do electronic assignments, even if nearly everyone has their own machine.

    --
    I have developed a truly marvelous proof of this comment, which this signature is too narrow to contain.
  3. IT is not the Problem by arthurpaliden · · Score: 4, Insightful

    IT goes to management and says, "Based on current usage and load, the system will fail in 6 months. To prevent that, we need to do this..." Management says, "Really? That's not what the salesman told me, and it's his equipment, so he should know."

  4. No. by damn_registrars · · Score: 5, Insightful

    Now I have email addresses through hotmail, gmail and yahoo that I use for different things, and facebook also gives me an email address. So, I doubt students really need email addresses provided by the university anymore.

    You are quite wrong. Email addresses - especially .edu addresses - are still quite valuable. A lot of academic resources that take registration via email won't allow registration to go to a throwaway account (a la hotmail, gmail, yahoo, etc). Many organizations that are interested in real information on users insist that users use an actual unique account and not a freebie. And when you're in college and making very little money, a lot of those things can be important.

    I think it just shows that trying to build IT competence into a government agency is basically a waste of money because the institutional culture of government

    You're not very accurate on that, either. Government organizations need to be able to keep track of their email - especially internal communications - which they would not be able to do if they outsourced email and other telecom.

    In short, all of these kinds of organizations could just offer email through gmail/google business or any number of other providers that will scale up almost infinitely.

    With the various privacy breaches that have occurred, that would be a terrible idea. And on top of that, IT is a lot more than just email. Do you want the government to turn to comcast for networking support while they're at it? What if the IRS web servers go down on tax day? Do you want them to have to lean on an outside company to get them back up?

    --
    Damn_registrars has no butt-hole. Damn_registrars has no use for a butt-hole.
  5. Re:Nothing to do with Sendmail by vlm · · Score: 4, Funny

    It's the backend. When you have too many connections on too few servers, with not enough storage, you usually see this kinda issue.

    Knowing the speed and flexibility of university upgrade policies, and knowing sendmail was born around 4.1BSD, and knowing the -BSDs were VAX only until 4.2 or 4.3 or so in the 80s, I'm guessing they're still using the original VAX it was developed on?

    --
    "Science flies us to the moon. Religion flies us into buildings." - Victor Stenger
  6. Re:It isn't an I.T. problem by Anonymous Coward · · Score: 4, Funny

    no I'm pretty sure tuition is more than $10

  7. Re:So the ultimate solution will be outsourcing by lucifuge31337 · · Score: 4, Insightful

    One can have all the clue in the world, yet be powerless to prevent failures if not funded to purchase the appropriate equipment.

    --
    Do not fold, spindle or mutilate.
  8. Re:Hate Being First .... by CAIMLAS · · Score: 4, Insightful

    Believe it or not, maintaining a mail host for a larger, geographically diverse

    If it were easy, there'd be no push to outsource it to "the Cloud" (or anywhere else), and countless organizations wouldn't be moving from the "burden" of administering something like Exchange (i.e., something that requires a trivial amount of knowledge compared to any other MTA) to Office 365 or Google.

    It's not just as simple as setting the mx to point to a 'working host', especially not in academia (though many try). Do you have to deal with this kind of thing?

    As someone who has to deal with this stuff on a daily basis - I had dealings regarding CalMail last week on a similar mail-related problem of theirs - and with academic mail systems in general, let me clue you in:

    * This is not your business mail system, where everyone has a uniformly specified mailbox.
    * It is not dictated from the top down how mail is run. In a corporation, there is standardization. CalMail is the exception in academia, as far as I can tell, in that it's run somewhat like the business model. However, there is still somewhat of the "Greek" (vs. "Roman") model of management involved, and this does tend to lead to problems. (This is much more true with other academic mail systems, from what I can tell.)
    * Unlike in the workplace, there is very little systems experience where it is needed (i.e., in the actual administration). Even with dedicated IT, very few people are actually good with the mail system because mail management is so broad and complicated.
    * Running a mail server effectively is now quite difficult. Not only do you have to "just make it work" - i.e., deal with all the misbehaving mail systems out there from other academic institutions and verify that the VIP email makes it through (regardless of how much spam that means letting through - but never let any spam through!) - but it's got to run like a top.
    * Often, you're dealing with decades of systemic dependencies. Mail was the first connected application, after all, and nobody's had it as long as Berkeley. Based on my own experience with networks which grew around their mail system, small dependencies can complicate any sort of change or update. Suddenly, there's something everywhere that needs a specific piece of mail system functionality which can't simply be copied over during a move.
    * An organizational system like this is big; it's not garden-variety email. Hell, I guarantee you they don't have as many IT people maintaining accounts as they have admissions people, probably not even a tenth. Yet the IT people have to actually make sure those records get to the right places, all while assuring the admissions people that the information transits securely.
    * There is undoubtedly a faculty member with his pet requirements for email. He probably has things which will not migrate properly.
    * There will undoubtedly be the people using their mail account for file storage.
    * Believe it or not, it's actually fairly difficult to migrate mail from, say, Cyrus IMAP to anything else. It takes time (and anything at all with Cyrus, which I'd not be surprised if they were using, takes a lot of time). Sieve scripts, procmail, IMAP states, et al. It's a pain in the ass, and takes a loooong time to do seamlessly. Doing it under duress of hardware failure is something else entirely.

    From my reading of the events (and seeing some other things not mentioned in OP or linked article) there were a number of things which caused this prolonged outage. First and foremost, the system was not designed to be resilient so much as it was designed to scale up (or proper failure condition testing was not performed beforehand). Second, they either don't have the necessary (knowledgeable) human resources, or enough time allocated to those resources, to effectively manage this system. (You would not believe how difficult it is to find a "mail administrator". Everyone's done it, but nobody seems to like it or is all that good at it. If they are, they want a LOT in compensation.) Third, they may

    --
    ~/ssh slashdot.org ssh: connect to host slashdot.org port 22: too many beers