Slashdot Mirror


'Server, Heal Thyself,' Says IBM

quakeaddict writes: "I guess it was inevitable. According to this story IBM is spending 25% of their considerable R&D budget to build self healing servers. One memorable quote: "Most important, Wladawsky-Berger said, the machines will be so simple that they will be no more difficult to operate than a kitchen appliance. That should reduce the need for highly skilled workers who are in increasingly short supply." I hope I can make enough for early retirement!" Of course, "IBM plans to develop failproof servers" is a bit like "Ford Plans to develop fuel-sipping flying cars," but the more intelligence built into machines, perhaps the better overall.

21 of 122 comments

  1. This is one of those 1950-type dreams by Anonymous Coward · · Score: 5

    "In the year 2000, we can go to the Moon for a holiday"

    I'm not saying they couldn't make self-healing servers, but you can't dismiss skillful people, ever. You always need skillful people somewhere. If the servers are so easy to use that you don't need skillful workers, then you don't need to pay for IBM support either.

  2. We'll all be out of jobs! by toofast · · Score: 5


    There will be major IT staff layoffs! We won't need paper anymore! By the year 2000 we'll only work 2 days a week! We'll do our groceries from home!

    Oh, wait...

  3. This should work out for IBM.. by antiher0 · · Score: 5

    to quote a good friend of mine:

    "When functionality and reliability are sacrificed to the gods of idiocy, the gods of economy smile."
    --Todd C. Williams

    In all actuality, this is what will come to pass. It will be "easier" for PHBs, but at the cost of function and stability. However, it will probably pay off well for Big Blue.

  4. highly technical by bungalow · · Score: 5

    Don't bother hitting the article. Read below for all the details readers are getting:

    IBM will devote 25 percent of its research and development budget for synergy synergy synergy synergy synergy synergy synergy synergy synergy synergy synergy synergy synergy synergy synergy synergy synergy synergy synergy synergy synergy synergy synergy synergy synergy synergy synergy synergy IBM.

    Irving Wladawsky-Berger thinks this is a really important problem. Not synergy. Irving thinks we need more synergy.

    "In that effort, we are COMMUNICATING that, going forward, we will PROACTIVELY Leverage our SYNERGIES, Keeping the End in Mind, whilst Sharpening our Saws and putting First Things First. Even though the last word I just said was 'First'," said Mr. Wladawsky-Berger.

    1. Re:highly technical by milo_Gwalthny · · Score: 5
      Exactly. Ever notice that when you add up all the percentages of IBM R&D spending the total is about 437%?

      Kind of like how IBM's revenue was (what was the exact number?) like 25% e-business related just a year ago... So, is it still? Or was that just hype? Or like Gerstner's comment on how IBM's Wal-Mart.com site was so great that it was going to put all the other e-retailers out of business (no, I don't think that's why they're going out of business).

      Who does their PR for them? Zippy?

      --
      Milo
  5. the project's named eLiza?? by eries · · Score: 5

    I can just see it now:

    User: "Server, heal thyself!"

    eLiza: "Tell me more about your mother"

  6. Great! (if it works) by duplicate-nickname · · Score: 5
    I know most Slashdot readers despise "smart" software, and would rather manually configure/tweak/diagnose everything themselves. From my years of managing networks and servers, I have come to the conclusion that the more intelligent the hardware and software, the better. I have better things to do with my time than make sure applications are running properly and the hardware is performing as it should. That may sound like the rant of a lazy-ass admin who would rather read /. all day than work, but here's my reasoning:
    1. I don't work 24/7; if the software can fix itself, that means a faster response time when I'm not around.
    2. Why should I spend my time fixing hardware problems when it's under a support contract? Have the server call IBM/Dell/Compaq/etc, and let them fix it. Why buy 4hr contracts if our personnel end up fixing the problems?
    3. In the long run, it is best for my company that I spend my time planning for the future and testing new technologies rather than fixing things that have already been implemented.

    I bet most administrators out there would agree with me on this. The only problem I see with IBM's plan: what if their self-healing platform is so large and complex that it takes even more of your time to manage it than you would normally spend monitoring/fixing your servers? Worse yet, there is the possibility that their software introduces more bugs into the systems (anyone remember those great Windows uninstaller apps like Cleansweep?). It will be interesting to see what IBM can come up with...

  7. Re:The more "failproof" technology out there... by gargle · · Score: 5

    I see things like this as an INCREASE in job security, not the other way around.

    How many toaster operators do you know?

  8. Server != Network by dingbat_hp · · Score: 5

    What is really of interest here is keeping the "server farm" working. This is the point where little Unix boxes and Big Blue iron start to look completely different.

    In a BSD / Linux shop, you don't worry too much if one box gets hosed. You unplug it, load balance onto a bunch of other identical boxes, and plug in a new one fresh from CheapClones 'R Us. Later on, you either wipe and rebuild the hosed OS, or you throw away the smoking hardware and order some more, depending on whether you suffered H4XX0Rs or lightning. The big issue is keeping the network of lots of boxes secure and functioning.
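    The "unplug it and load balance onto identical boxes" approach above can be sketched in a few lines of Python. This is a minimal illustration, not any real load balancer: the `Box` and `RoundRobinBalancer` names are hypothetical, and "health" is just a flag that an admin (or a monitoring script) flips when a box gets hosed.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass
class Box:
    """One interchangeable server in the farm (hypothetical model)."""
    name: str
    healthy: bool = True


class RoundRobinBalancer:
    """Spread requests over identical boxes, skipping any marked unhealthy."""

    def __init__(self, boxes):
        self.boxes = list(boxes)
        self._ring = cycle(range(len(self.boxes)))

    def mark_down(self, name):
        # "Unplug" a hosed box: it stays listed but receives no traffic.
        for box in self.boxes:
            if box.name == name:
                box.healthy = False

    def add_box(self, name):
        # Plug in a fresh clone and rebuild the ring so it joins the rotation.
        self.boxes.append(Box(name))
        self._ring = cycle(range(len(self.boxes)))

    def pick(self):
        # Try each position at most once so a dead farm can't loop forever.
        for _ in range(len(self.boxes)):
            box = self.boxes[next(self._ring)]
            if box.healthy:
                return box.name
        raise RuntimeError("no healthy boxes left in the farm")


if __name__ == "__main__":
    lb = RoundRobinBalancer([Box("web1"), Box("web2"), Box("web3")])
    lb.mark_down("web2")   # lightning strike
    lb.add_box("web4")     # fresh from CheapClones 'R Us
    print([lb.pick() for _ in range(3)])
```

    The point of the design is that no single box matters: traffic simply routes around the dead one, and the rebuild can happen later, off the critical path.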

    In Big Blue's world, there's just the one server. There's only ever one server, because no matter how many city blocks it spreads over, the thing still feels like a single box. The power of their mainframe approach and OS is that it can feel like a single box, and it can feel like that to a whole load of people simultaneously. In this case "the network isn't the computer", but the computer is the computer. This changes the rules - networking becomes simpler, because there just isn't so much of it (that looks like it's "between independent boxes" anyway). OTOH, a server intrusion is far, far worse than it would ever be in the Unix world -- which is why the mainframe security guys are even more cautious than the rest of us.

  9. The more "failproof" technology out there... by jcapell · · Score: 5

    ...the fewer technically skilled people there are available to handle the inevitable failures.

    I've already seen a huge increase in demand from clients that buy turn-key solutions and then need technical help with problems that arise.

    I see things like this as an INCREASE in job security, not the other way around.

  10. Just feeding the fire by rneches · · Score: 5
    Well, self-healing servers. That's pretty cool for us geeks, actually. That means that instead of mucking around with annoying config problems and babysitting brain-dead servers, we can actually (gasp!) get some work done. Like, you know, running a company's IS infrastructure.

    There have been a lot of movements in history to prevent certain types of automation and increased reliability from going to market on the fear that it would hurt people's jobs. In reality, these innovations tend to make things better, not worse, for those whose jobs use them. For instance, it used to be that railroad cars had to be linked by hand, resulting in tens of thousands of maimings and deaths. When automatic couplers were invented, there was an outcry that they would destroy jobs. In the end, it turned out that automatic couplers still needed people to work right, but the people didn't have to get between the railcars. Injuries and deaths decreased, and there was more cheese for all.

    Honestly, I think fixing broken computers is the least appealing part of being a computer person. Very, very rarely, a problem comes up that is complicated enough to warrant some actual thinking but not self-defeating enough to be infuriating. I'm sure that IBM's self-healing servers will have quirks of their own, and hopefully fixing them will be more interesting than the usual "hey, look - the NT server crashed again. Oh, look - no more registry. When can we replace this thing with a real server? OK, sure, boss, it is a real server if you say so. Where's the install CD?"

    --
    In spite of the suggestions and all the tests that I have made, I have not pulled a spider out of the hole.
  11. Hard To See It by Caraig · · Score: 5
    Novell already tried something to this effect from the NOS side of the house, using ZENworks. (Zero-Effort Networking.) The concept was that they would make the MIS dept's job easier, and reduce the need for IT staff, by putting more control into the hands of the MIS staff as to what goes on with the computers on the network.

    ZENworks is a bear to deal with, and it is not the most pleasant of things to manage. Some great toys are in it, but all those toys need configuration and management. So, while potentially it could reduce the work of the MIS staff, in actuality it redistributes that work. More of the onus is on the administrator rather than the tier 1 guys or footsloggers who go out to the actual machines.

    I cannot imagine that IBM's self-healing servers are any different. We've seen time and time again that computers really can't find out what's wrong with themselves. Part of this is because the operating systems themselves are incapable of covering all that can happen; hardware these days is remarkably stable, for the most part. In order to have self-healing servers, the OS and the server will have to be very tightly knit, and there will have to be a way for the OS to understand not only what a "General Protection Fault in module INSANITY.EXE at 6F7D8E:7D33F" is, but also (1) what caused it (MS bashing aside, and remembering that any OS can be host to a GPF, GPFs do not occur in a vacuum) and (2) how to remedy it or work around it.

    At the very least, for GPFs, this will require a very sophisticated memory manager which can reallocate memory used by programs, remembering that giving a program access to memory in such-and-such a location caused a GPF last time and so it will need to put that code in a different location.

    What I see happening is that IBM will do a decent job of these self-healing servers. Their complexity will necessitate charging exorbitant fees for any problem you call in, plus massive monthly maintenance fees (the "just in case" cost, which any smart company pays diligently and on time, and which dumb companies withhold, shy away from, and simply don't pay). I have seen this. It causes mondo problems, and guess who gets told to "resolve it"? (One guess: *hack*MIS*cough*.) You will not see a reduction in the number or cost of footsloggers or tier 1 helpdesk people in your company, and you will ALWAYS need a certified, professional, experienced network admin at the helm.

    This will not significantly reduce the need for an IT staff unless, of course, Google, with their 8000 servers, switches all of them over to self-healing servers.

    ---
    Chief Technician, Helpdesk at the End of the World

    --
    "I am an Adept of Tantric VAX."
  12. KISS is what works. by Big+Torque · · Score: 5

    The smarter you make things, the harder it is to make them foolproof. Do you really want a system reconfiguring itself and making network decisions without you knowing about it until after the fact, if at all? The most reliable systems are the ones built by system and network engineers who know what they are doing and make the systems as static as possible. If uptime is everything (and it is where I work), keeping it simple, having it work, and having it not change is what works.

  13. Sweet by Ndog · · Score: 5

    They're trying to make servers behave like kitchen appliances. What a great idea! We'll be able to have the Sears guy fix our servers once this gets implemented.

    Meanwhile, I'll be moving into a new field, since kitchen appliances, game consoles, and other household items will soon become computers.

    Please. If this happens, it will be great, and it will certainly not mean fewer jobs for tech workers or fewer problems for the people using the servers. It will just cause different problems, resulting in different solutions. When has making things more complicated made them less costly to maintain?

    And that is the issue. The article mentions the unavailability of skilled tech workers and cost as the two main reasons for this. There is no shortage of tech workers, though. There is a shortage of low-cost tech workers, and most of these people obviously could not be expected to have high-end skills. That is why the big companies wanted the raise in the H-1B limit.

    If IBM succeeds at this, I don't see their customers saving much, if any, money on it. Customers will just end up paying IBM a larger share of the money they spend on technology than customers have previously. I also don't see any problems for technology workers. More technology means more jobs. Sure, we have to adapt, but you shouldn't be in the technology field if you can't learn new things. Technology will continue to proliferate, as will the need for people to help keep it all working.

    --
    -N
  14. Re:Somebody give me 25% of IBM's R&D budget. by RexxFiend · · Score: 5

    You are too late: it's called a cluster, and it is only slightly more reliable than a normal standalone box. The problem is that you need to hook up both boxes to a storage system (like a SAN), which then becomes your single point of failure again. You can mirror and cluster the SANs, of course, but your costs just keep spiraling upwards.
    The other problem is that unreliable software running on your cluster is likely to toast all the boxes on the cluster, rendering it useless.
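    A back-of-the-envelope availability model shows why the shared SAN caps what the cluster can achieve. The sketch assumes failures are independent and uses made-up availability figures purely for illustration; `cluster_availability` and its parameters are hypothetical names. Redundant parts combine in parallel (the group is down only if every part is down), while shared parts combine in series (any one of them takes everything down), which is also how the "unreliable software toasts all the boxes" case enters the model.

```python
def parallel(*avail):
    """Redundant parts: the group is up if at least one part is up."""
    all_down = 1.0
    for a in avail:
        all_down *= (1.0 - a)
    return 1.0 - all_down


def series(*avail):
    """Shared parts: every one must be up (each is a single point of failure)."""
    up = 1.0
    for a in avail:
        up *= a
    return up


def cluster_availability(node, san, software=1.0, nodes=2, mirrored_sans=1):
    """Hypothetical model: N redundant nodes, a shared (optionally mirrored)
    SAN, and one software stack whose bugs take down every node at once.
    Assumes independent hardware failures."""
    return series(parallel(*[node] * nodes),
                  parallel(*[san] * mirrored_sans),
                  software)


if __name__ == "__main__":
    # Illustrative numbers only, not measurements.
    print(cluster_availability(node=0.99, san=0.999))
    print(cluster_availability(node=0.99, san=0.999, nodes=10))
    print(cluster_availability(node=0.99, san=0.999, software=0.95))
```

    With nodes at 0.99 and a SAN at 0.999, two nodes give 1 - 0.01^2 = 0.9999, and multiplying by the SAN gives about 0.9989. Adding more nodes never pushes the result above the SAN's 0.999, and a flaky software stack drags the whole cluster below its own availability, which is exactly the single-point-of-failure problem described above.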

    --

    A crash reduces
    Your expensive computer
    to a simple stone.
  15. Redundancy and Simplicity. by gus+goose · · Score: 5

    The story suggests that the healing solution is through redundancy. This is a relatively easy method of "self healing". Mainframes have done it for years; the issue is getting the OS to understand what's happening.

    The benefits are that when something breaks, you can be alerted to it, and fix it without downtime. The drawback is that you need to buy at least two complete computers for every functioning one.

    The article is misleading in a way, because it suggests that the computers really are self "healing", yet, the article says "backup systems that kick in whenever the server senses a problem". Who to believe?
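    The "backup systems that kick in whenever the server senses a problem" idea is an active/standby failover, sketched below under stated assumptions: `FailoverPair` is a hypothetical class, and the health check and alert hook are plain callables you would wire to real monitoring and paging. It also shows the drawback mentioned above, since a second complete unit sits idle until the first one fails.

```python
from typing import Callable, Optional


class FailoverPair:
    """Hypothetical active/standby pair: when the active unit fails its
    health check, the standby takes over and the admin is alerted."""

    def __init__(self, primary: str, standby: str,
                 is_healthy: Callable[[str], bool],
                 alert: Callable[[str], None]) -> None:
        self.active: str = primary
        self.standby: Optional[str] = standby
        self.is_healthy = is_healthy  # e.g. a ping or hardware sensor check
        self.alert = alert            # e.g. send a page or an email

    def check(self) -> str:
        """Run one health-check cycle and return the unit now serving."""
        if not self.is_healthy(self.active):
            if self.standby is not None and self.is_healthy(self.standby):
                failed = self.active
                # Promote the standby; there is no spare until a repair.
                self.active, self.standby = self.standby, None
                self.alert(f"{failed} failed; {self.active} took over")
            else:
                self.alert(f"{self.active} failed and no healthy standby is left")
        return self.active

    def replace_failed(self, new_unit: str) -> None:
        """After the broken part is swapped out, the new one becomes the standby."""
        self.standby = new_unit


if __name__ == "__main__":
    status = {"cpu-a": True, "cpu-b": True}
    pair = FailoverPair("cpu-a", "cpu-b", lambda n: status.get(n, False), print)
    status["cpu-a"] = False  # simulate a hardware fault
    print("serving:", pair.check())
```

    This is "healing" only in the sense the article describes: nothing is repaired automatically; service continues on the spare while someone is alerted to swap the broken part.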

    If you want uptime, get redundancy. Compaq is even offering hotswap PCI cards now. Things are really cool. All we need is the operating system to be as advanced as the hardware. It does not help when your NT machine needs to reboot when you change the IP address.

    --
    .. if only.
  16. Darn... by isa-kuruption · · Score: 5

    ... just when I started to enjoy driving 40 minutes to work to diagnose a hardware fault, calling the vendor for a replacement hard drive and then swapping it out... all on a sunday morning after a keg party. WHAT IS THIS WORLD COMING TO?!

    -*-*-*-*-*-*-*-*-*-*-*-
    w00t w00t raise da r00f!

  17. Great, more idiots in IT. by SumDeusExMachina · · Score: 5
    That should reduce the need for highly skilled workers who are in increasingly short supply.

    Great, that's just what we need: more unskilled tech workers trying to maintain server "appliances." Don't we already have enough problems with things such as security without having more idiots behind the wheel, with IBM telling companies that those idiots are perfectly qualified for the job?

    I can't say that I like this one bit. Not to mention the fact that they are trying to put more Slashdotters out of work by replacing them with "dumb appliances." I'm sure everyone's going to love losing their job.

    --

    Is your company running tools written by ma
  18. mmmm... by Skoozler · · Score: 5

    Why just have self-healing servers? I would really like it if, when my box crashes, it reboots and fscks, if need be attracts my backup tapes to the DAT drive, restores the system, reconnects to its fellow servers and prints/pages/phones me a groveling apology.

    Well, I wish anyway. But it would really be useful: although it would result in a loss of technical sector jobs, it would be cool never to have to reboot our machines manually.

    --
    bash: help: Don't be so weak.
  19. My Kitchen... by Ryan_Terry · · Score: 5

    The last time my kitchen stove went out, I had to get a repairman to come visit, and it took him 3 days to figure out what was causing the issue. If it's all the same, I'd like to make sure my servers _don't_ run like my kitchen appliances.
    But thanks for the offer...
    DocWatson

    --
    MessEdUp
    .sig
    #/var/www/v
  20. Self Healing Servers by zoombah · · Score: 5

    What about physical threats? Starting next week, I want my IBM servers built with Terminator 2-style advanced steel. My servers also need to choose and manage their software by themselves. They also need to communicate with my various clients, manage business proposals, and drag my lazy ass off Slashdot when I'm supposed to be working.