A "Never Reboot" Service For Linux
An anonymous reader writes "Ksplice, the company based on the MIT Ksplice project, is now offering its 'never reboot' service for Red Hat, Debian, and other Linux distros. You subscribe and get real-time kernel security updates that apply in-memory instead of rebooting. Last summer we discussed the free service for Ubuntu. Cool tech, but will people really pay $4 a month for this?"
How long till they get sued by Microsoft?
http://www.google.com/patents?id=cVyWAAAAEBAJ&dq=hotpatching
..an using some Microkernel OS in which something like this would come as a well-controlled feature, we are using a monolithic kernel and self-modifying code?
Stating the obvious, yes, they are.
But third-party companies are under no obligation to offer their products and/or services for free, and this is a service of a third-party company (Ksplice).
If there is a demand for this service, plus an unwillingness to pay Ksplice for it, it's entirely possible (and likely) that someone will come along and offer an open source equivalent. But until the itch is scratched, Ksplice is perfectly within its rights to offer the service at a cost.
Immortality baby! Immortality!
UNIX? They're not even circumcised! Savages!
I do tech support at a school. The moment that something goes offline (like our mail server), we start getting calls telling us that things are messed up.
Before anyone asks: Yes, we try our best to only reboot after-hours, and yes, we tell everyone when a service will be down.
Those who do not perform scheduled reboots of their servers do not know whether their servers will come back up properly after unscheduled reboots. How often have you seen someone add a service to a machine, have it become a critical part of your infrastructure, and then forget to add it to the RC system?
Color me stupid but wouldn't any application in which you'd rather not be rebooting (i.e. Router, firewall, file server, etc...) be the exact same application in which you'd NEVER want some 3rd party having access to your kernel? I mean, if a large percent of distros were using this I can just imagine it would be the A#1 target for every malicious coder in the world.
Not expensive if the technology works. My time is more valuable and down servers cost money. The cost is paltry in comparison.
You run a server of any kind. In the old days of Novell, we had servers with 6 year uptimes. Not possible today simply from patches, not crashes.
This service has the potential to get us closer to that ever distant 100% uptime. It could definitely stack another 9 on 99.999
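To put a number on "another 9": a quick sketch of how much downtime per year each availability level allows. The function name and the 365-day year are assumptions made for the illustration, not anything from Ksplice.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Downtime budget per year, in minutes, for a given availability."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


# Each extra 9 cuts the yearly downtime budget by a factor of ten.
for pct in (99.9, 99.99, 99.999, 99.9999):
    print(f"{pct}% uptime allows {downtime_minutes_per_year(pct):.2f} min/year of downtime")
```

Five nines leaves a bit over five minutes a year, so a single reboot that takes a few minutes can eat most of that budget.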
The occasional reboot, under controlled circumstances, is an excellent test of what will happen in an emergency situation. Mainly, it answers the question of whether the server and required services actually will all come back up by themselves.
I've said it before, and I'll say it again:
Just because it's free software, doesn't mean that it's afraid of money.
99% of people I've seen bragging about long up-times tend to have perfectly patched and up-to-date OS installations on disk, and a dozen vulnerabilities still loaded into memory. And I'm not talking just about the OS kernel.
If you don't know exactly what an update touches, just reboot.
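For the userland half of that problem, one rough way to see what is still loaded from before an update is to look for shared libraries that a process has mapped but that have since been replaced on disk. On Linux those show up in /proc/<pid>/maps with a " (deleted)" suffix. A minimal sketch, assuming a Linux /proc and enough privileges to read other processes' maps; the function names are made up for this example, and it says nothing about what is loaded in the kernel itself.

```python
import glob

DELETED_SUFFIX = " (deleted)"


def stale_libraries(maps_text: str) -> set:
    """Return shared libraries that are mapped but no longer exist on disk.

    maps_text is the content of a /proc/<pid>/maps file. Each line holds:
    address, perms, offset, dev, inode and (optionally) a pathname.
    """
    stale = set()
    for line in maps_text.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping, no pathname
        path = fields[5].strip()
        if path.endswith(DELETED_SUFFIX) and ".so" in path:
            stale.add(path[: -len(DELETED_SUFFIX)])
    return stale


def scan_proc() -> dict:
    """Map each PID to the stale libraries it still has loaded."""
    results = {}
    for maps_file in glob.glob("/proc/[0-9]*/maps"):
        try:
            with open(maps_file) as handle:
                stale = stale_libraries(handle.read())
        except OSError:
            continue  # process exited, or we are not allowed to read it
        if stale:
            results[maps_file.split("/")[2]] = stale
    return results
```

Calling scan_proc() as root after a round of package updates lists the processes that would need a restart to actually pick up the patched libraries.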
Because I can’t imagine an easier way to obtain an instant-botnet, than to “spice” such a patch. ;)
By the way: Who came up with remote updates? Why not just compile the kernel locally, like normal people do, and then use a special patching tool?
"Cool tech, but will people really pay $4 a month for this?"
Depends. If it's your laptop, I suspect the answer is no. If it's your server farm, I suspect the answer is yes.
As an aside: Novell used to run contests to see who had the server with the greatest uptime since its last boot. Best one I ever saw was the Netware server that ran so long that everyone forgot where it was and it was accidentally walled-up inside a closet. Wouldn't it be great if the Linux community could run this type of contest? :)
Regards;
At an individual computer level it's not so bad, but in an enterprise it can be troubling.
A couple of examples: a zero-day exploit of Microsoft Windows (surely this would never happen) requires a patch be applied and the computers rebooted for thousands of users. Even assuming that the reboot can be enforced with 100% reliability (seldom to never), the 1-3 minutes will impact productivity for at least some users. Sure, desktops can be rebooted at night, but laptop users that take their machines with them and never have them powered up unless they are using them will be impacted. Imagine a company with an average productivity value of $10/hr, $20/hr, or $30/hr. Imagine this company has 100 laptop users or 1,000 or 10,000. Multiplication makes that 1-3 minutes each expensive.
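The multiplication spelled out, as a back-of-the-envelope sketch using only the figures above (1-3 minutes, $10-$30/hr, 100 to 10,000 laptops). The function name is made up, and it assumes every user loses the full reboot time.

```python
def reboot_cost(users: int, minutes: float, hourly_value: float) -> float:
    """Productivity lost when every user waits through one reboot."""
    return users * (minutes / 60) * hourly_value


for users in (100, 1_000, 10_000):
    for rate in (10, 20, 30):
        low = reboot_cost(users, 1, rate)   # 1-minute reboot
        high = reboot_cost(users, 3, rate)  # 3-minute reboot
        print(f"{users:>6} users at ${rate}/hr: ${low:,.2f} to ${high:,.2f}")
```

At the small end this is pocket change; at 10,000 users, $30/hr and 3 minutes it comes to $15,000 for a single forced reboot.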
A different scenario involving servers where services must be available: say web servers that require database servers and both require directory servers. There may be several of each of these for load balancing or fault tolerance, possibly clusters, and real world examples may be far more complex. Reboots must be coordinated based on which nodes of which clusters can be taken down without impacting service. Often, additional commands must be added to gracefully transfer service, notify a load balancer device, possibly tell a monitoring server that it's in scheduled maintenance mode and not to send a bunch of emails to the support team because the server is down. Ideally one web server and one database server and one directory server go down and all come back up, followed by another set, etc, and cluster master roles are reallocated correctly, etc.
Obviously there are ways to script, automate, plan, and mitigate all of this, but if it didn't have to reboot in the first place... that would be nice, huh?
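For what it's worth, the "one of each goes down, comes back, then the next set" plan can be sketched in a few lines. This is a minimal illustration only: the cluster and host names are hypothetical, and the draining, load balancer notification and monitoring steps are left out because they depend entirely on the site.

```python
from itertools import zip_longest


def rolling_batches(clusters: dict) -> list:
    """Group nodes into reboot batches, taking at most one node per cluster.

    clusters maps a cluster name to its list of nodes. Each batch is rebooted
    and must be back in service before the next batch starts.
    """
    for name, nodes in clusters.items():
        if len(nodes) < 2:
            raise ValueError(f"cluster {name!r} has no spare node to cover a reboot")
    batches = []
    for group in zip_longest(*clusters.values()):
        # zip_longest pads shorter clusters with None once they run out of nodes
        batches.append([node for node in group if node is not None])
    return batches


# Hypothetical three-tier setup: web needs database, both need directory.
plan = rolling_batches({
    "web": ["web1", "web2", "web3"],
    "db": ["db1", "db2"],
    "dir": ["dir1", "dir2"],
})
for number, batch in enumerate(plan, start=1):
    print(f"batch {number}: reboot {', '.join(batch)}, then wait for them to rejoin")
```

Every cluster keeps at least one node up during each batch, which is the whole point; a cluster with a single node gets rejected rather than silently taken offline.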
No, they're not.
You see, one radar installation can feed multiple stations, and it's quite common for modern ATCOs to sit at a screen that has feeds from multiple radar sources.
In fact, in the UK we recently pulled all the old PDPs out of West Drayton and transferred radar control down to Swanwick running on relatively new equipment. I believe this was not done by "clearing the skies" first, they just handed over control to the new guys.
I've heard things about US traffic control being old and antiquated, but I'd hazard a guess to say the vast majority aren't using vacuum tubes, CRTs or the like. I imagine many have converted to electronic paper strip bays for the flight plans too.
For a server running, say, a big web site, or a database, or something else where time is money, and there are a lot of zeros involved, uptime is crucial. When a stock broker's trading floor system goes down, the loss is measured in millions of dollars per second (disclaimer, my brother used to work for a Wall Street firm, his wife used to work for another, and I have two close friends who still work at a third; my estimate is based on things they have told me). Downtime is just not acceptable under some circumstances.
Sure, if my GoDaddy-hosted web site goes off the air for a minute or two while the virtual server gets rekicked, I can't really complain. I end up rebooting my laptop once or twice per week. My desktop gets rebooted maybe twice per year for some hardware update. Users of single-user machines are generally far more tolerant of reboots since, nominally, they are the ones making the decision to reboot. When there are many users, though, rebooting needs to be coordinated, at the very least, so as not to interrupt work in progress. And, as alluded to above, when there's real money involved, sometimes reboots are not ever acceptable.
For you, rebooting might not be evil, but some people do actually depend on high availability of their computers, and some of them are running Linux.
I don't personally see any use for such a service. If you need an FT or HA system, you need to design it as such from the ground up. In this case, paying 4 bucks just solves some problems with rebooting after a kernel upgrade. I don't have a problem with that; I just reboot in the next service window. Normally, mission-critical systems have some sort of redundancy, not only to cope with planned service reboots but also with unplanned disasters. So usually you have an N+1 redundant cluster in which you can reboot the servers using a procedure that was worked out while DESIGNING the system. I also see quite a few security issues with patching the kernel this way. In mission-critical services you usually test everything before rolling it out to the systems, so using such a feature just makes things more complicated (than simply rebooting the machine with my current procedures).
I cannot find any security details on their webpage. They state "Ksplice Uptrack uses cryptography to authenticate the update feed." So what? Fedora also used cryptography, and once their servers got rooted the whole chain collapsed. So if I were to use their service, I would want to know exactly how their security is implemented, since I would be getting kernel patches (quite critical stuff) from them. At least with RHEL I know about their security procedures (quite rigorous). And from a support point of view: does, for example, Red Hat or Oracle support systems patched this way?
It is a nice feature but IMO not suitable for enterprises yet.
but telling people to check their email when their mail server is offline probably doesn't work for them.
I just place blame on the user. And when they get defensive, I point out their defensiveness as proof of their guilt. Pretty soon, they learn not to complain.
I would not trust such a service. Just because a kernel can be upgraded in place doesn't necessarily guarantee that the same kernel configuration will be able to boot your system in an outage. Something like a messed up GRUB configuration won't be spotted until you actually try to restart your system. I think part of a regular maintenance strategy is being able to restart your servers and make sure everything is configured to come back up automatically. The last thing you want is to be trying to figure out what's wrong with your boot config when you have an unplanned outage.
Depends. Most places that require high availability have redundancy built in to the point where half of their servers can go offline and nobody (except server admins) even knows about it. But for small and mid-sized businesses that don't have those resources available, any time offline is lost work/sales/time/etc.
NetWare was bulletproof, for what it did.
http://www.novell.com/coolsolutions/netware/img/bartb_uptime.gif
http://www.theregister.co.uk/2001/04/12/missing_novell_server_discovered_after/
Some organizations have operational requirements to provide a service continuously. For them there is no acceptable downtime.
And they've designed their systems properly such that not only the planned - but also unplanned - outage of a single server is both non-disruptive, and transparent.
"Service" and "server" are not synonymous. This is especially true once you move outside of trivial environments. If your HA service can't sustain the outage of an individual server, then its *fundamental architecture* is broken, and what OS is running barely even counts as semantics.
In the ATC application I support the workstations are very important. They are used 100% of the time and unanticipated downtime is a critical problem.
Firstly, patching is in no way "unanticipated downtime".
Secondly, if your environment can't sustain workstations being unavailable *even on a schedule*, then it's not meeting the requirements it was supposedly designed for.
3.x Netware was pretty darn bulletproof, provided you didn't mind copying the Bindery stuff to every different server, and one had to use IPX or nothing.
There are three things from it that were notable:
1: If a user doesn't have access to something, it doesn't show up in a listing. No directories or files with "access denied" messages, just making them more curious.
2: The OS was simple and had very limited functionality. Want some feature? Buy a third party NLM. Netware 3.11 had next to no attack surface.
3: The console commands kept the riffraff out. No point and drool interface. To use it, you had to at least know the basics of what you were doing.
The one thing I wish was passed on to modern operating systems was feature #1. Out of sight, out of mind. If a directory isn't shown, a user won't bother trying to get access to it, as opposed to something saying "permission denied".
First, Microsoft is not very eager to sue anyone; second, this is a totally different mechanism; third, the Microsoft patent covers old technology - very old, because it describes what we did in OS/360 and OS/370 operating systems and applications a long, long time ago. Patching memory was (sometimes!) a daily routine for the local systems programmer - updating live 24x7 production systems is/was fun but scary!
Anyhow - $4 is cheap when someone is doing the pre-work for you. Actually - the more modularized / structured Linux (Linux == kernel!) gets, the easier it is to support dynamic / online updates with no interruption. There are systems where you can do it already, even all(?) Unix systems allow you to change the whole object in flight if the application is written for it. Actually I designed a while ago one for Windows, load new object, kill the old and the new is automatically used for next call / request / whatever. Tandem Pathway is one very good example, Erlang as a language and a system supports it, systems with failover to another cpu / node have always supported it since Datasaab "non-stop" system from (I think?) early 70's (Cobol kernel!)
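The "load new object, kill the old and the new is automatically used for next call" pattern looks roughly like this in userland. To be clear, this is a toy sketch with made-up names: it is not how Ksplice patches a running kernel, and not the Windows design mentioned above.

```python
import threading


class HotSwappable:
    """Wraps a callable so it can be replaced while callers keep calling."""

    def __init__(self, handler):
        self._handler = handler
        self._lock = threading.Lock()

    def swap(self, new_handler):
        """Install new_handler for all future calls and return the old one."""
        with self._lock:
            old, self._handler = self._handler, new_handler
        return old

    def __call__(self, *args, **kwargs):
        with self._lock:
            handler = self._handler  # snapshot; the call itself runs outside the lock
        return handler(*args, **kwargs)


def price_v1(amount):
    return amount + 1


def price_v2(amount):
    return amount + 2


service = HotSwappable(price_v1)
print(service(10))        # handled by price_v1
service.swap(price_v2)    # calls already in flight finish on price_v1
print(service(10))        # handled by price_v2
```

The easy part is swapping the reference. The hard part, and the reason the application has to be written for it, is state: anything the old code kept in memory has to be in a shape the new code can still understand.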
Now, given the "skills" of current "systems programmers", I'm not sure that real time patching is a good idea. Right or wrong, today the "hard" skills (understanding operating systems, their interactions with hardware and applications, etc.) are very rare! It's not a people problem: the documentation, the trust in products / manufacturers / providers, etc. are killing the low level skills, even though computers handle zeros and ones the same way as on day one. And unfortunately there are the same problems at the high level - miracle products will solve all the problems / providers and manufacturers know my problems better than my experienced employees - and I have a bridge to sell!
Then... why did you go with this particular vendor instead of one that meets your needs?