Slashdot Mirror


Patch the Linux Kernel Without Reboots

evanbro writes "ZDNet is reporting on ksplice, a system for applying patches to the Linux kernel without rebooting. ksplice requires no kernel modifications, just the source, the config files, and a patch. Author Jeff Arnold discusses the system in a technical overview paper (PDF). Ted Ts'o comments, 'Users in the carrier grade linux space have been clamoring for this for a while. If you are a carrier in telephony and don't want downtime, this stuff is pure gold.'" Update: 04/24 10:04 GMT by KD : Tomasz Chmielewski writes on LKML that the idea seems to be patented by Microsoft.

10 of 286 comments

  1. Needed that bad? by MetalliQaZ · · Score: 5, Insightful

    If you are a carrier in telephony, you should have many load-balanced servers that can be taken offline one at a time and restored after patching. They probably would be taken out of the loop for the in-place patching anyway. So who is "clamoring"?

    --
    "Here Lies Philip J. Fry, named for his uncle, to carry on his spirit"
    1. Re:Needed that bad? by jelle · · Score: 5, Insightful

      So you take it out of rotation on the load balancer and give it a few minutes to complete all its active connections. Patch/reboot whatever. Bring it back into rotation, and repeat with the other box.

      Methods like that usually suck in real life, because the very day before you want to 'take it out of rotation', a circuit is opened through it that requires five nines (so you can't drop it), and it will remain open for months...

      You will end up with 99 boxes waiting to 'get out of rotation' for every single box that you don't need to update...

      Murphy will make sure of that.

      --
      --- Hindsight is 20/20, but walking backwards is not the answer.
    2. Re:Needed that bad? by Anonymous Coward · · Score: 5, Insightful

      I have internal processing servers that have up times of over 3 years

      I've never understood this boasting about uptime. Long uptimes are a bad thing! How do you know a configuration change hasn't rendered one of your startup scripts ineffective? If you have to reboot for some unexpected reason, you could be stuck debugging unrelated problems at very inopportune moments.

      You need to schedule regular reboots so that you can test that your servers can start up fine at a moment's notice. Long uptimes are a sign a sysadmin hasn't been doing his job.

    3. Re:Needed that bad? by Kookus · · Score: 5, Insightful

      Production systems are not for testing purposes. You want to test rebooting? Do it on a test box.

  2. Re:Wrong way to solve the uptime problem by Qzukk · · Score: 5, Funny

    Trust me, that was the first thing they thought of, then the CEO came in and said "Why are you ordering more equipment when we have half of our machines sitting there and doing nothing? We could be doing twice the work/traffic/whatever without paying more money!"

    --
    If I have been able to see further than others, it is because I bought a pair of binoculars.
  3. Re:Maybe... by CogDissident · · Score: 5, Funny

    I thought their working slogan was:

    Windows 7, it's not awful like Vista!

  4. Re:Amazing by katz · · Score: 5, Funny

    The fact that you don't need to prepare the kernel in any way--just execute the program and bang, it's patched--means that someone with root access could slip a rootkit right under your nose (i.e., without the system administrator being aware of it).

    - Roey

  5. Re:Amazing by KeithJM · · Score: 5, Insightful

    someone with root access could slip a rootkit right under your nose

    Yeah, someone with root access can take control of your server. Oh, wait, they've got root access. They already have control of your server. At some point, you have to just accept that giving someone root access is a security risk.
  6. No, No, No and No again. by Anonymous Coward · · Score: 5, Interesting

    As an admin for some -very- high availability systems, I can tell you load balancers are not a silver bullet. This solution would mostly apply to people running one-node clusters, using a single machine as a perimeter network device (e.g., a firewall). I see lots of these in the racks at our NOC provider.

    1. We connect to several load balanced systems and the complexity introduced by load balancers translates to inexplicable down time. No load balancers means a pretty steady diet of the latest and greatest server hardware, but no down time. A few minutes of down time costs more than the server hardware.

    2. High availability translates roughly into nodes that can fail (e.g., power off) without taking the cluster down. This boils down to active-passive application architecture more than just using heartbeat.

    As an FYI, PostgreSQL clustering is a killer application for me. Erlang is also great in many ways, but requires application architecture with active-passive node awareness, which isn't present in things like Yaws, or even my other favorite non-Erlang app, nginx. Heartbeat is the solution there, but I'd like to see Yaws be cluster-aware on its own. http://yaws.hyber.org/

  7. Re:In Soviet Russia, by oodaloop · · Score: 5, Funny

    "But does it run linux?" That's a joke? I thought that was just one dedicated user who kept asking on every article.
    --
    Tic-Tac-Toe, Global Thermonuclear War, and relationships all have the same winning move.