The Decline and Fall of System Administration

Posted by CmdrTaco on Wednesday March 2, 2011 @01:51AM from the hail-the-fallen-heros dept.

snydeq writes "Deep End's Paul Venezia questions whether server virtualization technologies are contributing to the decline of real server administration skills, as more and more sysadmins argue in favor of re-imaging as a solution to Unix server woes. 'This has always been the (many times undeserved) joke about clueless Windows admins: They have a small arsenal of possible fixes, and once they've exhausted the supply, they punt and rebuild the server from scratch rather than dig deeper. On the Unix side of the house, that concept has been met with derision since the dawn of time, but as Linux has moved into the mainstream — and the number of marginal Linux admins has grown — those ideas are suddenly somehow rational.'"

10 of 500 comments

Sad but smart by Anrego · 2011-03-02 01:58 · Score: 4, Interesting

I’m not a system admin but I don’t see how this is a bad approach.
I see value in finding out what the problem is and why it happened. If you just blindly re-image, the problem might pop up again at a less opportune time.
But if you know what the problem is... and you have an image of the server in a working state, or a documented procedure for setting up the server in its intended configuration, then why would anyone waste time trying to repair it?
I think you have this kind of problem in most jobs. New approaches that make more sense but require less skill (and imply less e-pene) are always hated by people who have already learnt how to do it “the hard way”.
I see this as a programmer all the time and have been a victim of it. I’ve seen a huge chunk of my chosen industry migrate from meat-and-potatoes problem solving to gluing libraries together and sprinkling in business logic.
I’ve been fortunate to land in a job where there’s still a lot of “from the ground up” work, but these jobs are getting scarcer as even the components that everyone uses are made from other components. And executable UML (or something of its ilk) is probably going to be the next thing to cut the legs off us.
1. Re:Sad but smart by TheRaven64 · 2011-03-02 02:24 · Score: 4, Interesting
  
  Add to that - no one (outside of the IT department) cares what the problem is, they care about the downtime. If you have some redundancy, stuff can fail periodically without the users noticing. An 'admin' capable of keeping it running can be someone paid to do something else who has responsibility for clicking the button every few months if required. An admin who can actually address the problem will cost, what, $60,000/year minimum (including associated costs, not just salary)? Is having ten minutes of downtime every few months costing your business $60,000/year? If not, then it's not worth the cost of doing it properly. It may be for a bigger company, but for a small business that would eat most of their profits. This is the advantage of a Windows or Mac server, with its pointy-clicky interface: it may be less reliable, and more expensive, but the cost saving from not needing to employ anyone who actually understands what's going on outweighs it. Especially if you buy a support contract, where the vendor will send someone competent out for the couple of times a year when something goes seriously wrong.
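  To put rough numbers on that trade-off, assume "every few months" means once a quarter (an assumption; the comment gives no exact interval) and take the $60,000/year figure above at face value:

  ```latex
  \begin{aligned}
  t_{\text{down}} &\approx 4 \times 10~\text{min} = 40~\text{min/year} \\
  c_{\text{break-even}} &= \frac{\$60{,}000~\text{/year}}{40~\text{min/year}} = \$1{,}500~\text{per minute}
  \end{aligned}
  ```

  Under those assumptions, the full-time admin only pays off if each ten-minute outage costs the business more than roughly $15,000.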
  
  --
  I am TheRaven on Soylent News
I can't tell you how many times I have heard this. by Noryungi · 2011-03-02 02:04 · Score: 5, Interesting

Many times, what I hear as "solutions" are simply variations on the theme: "Why can't we reboot the server?" or "Why can't we reinstall the server from scratch?".
And my answer usually was: "Listen, I don't care how many times you do this on a Windows machine, but this is UNIX - I'll only reboot this machine if I absolutely need to. In the meantime, watch and learn as I kill the offending processes. Oh, and re-installing the machine means 24h of downtime".
These days, I help run a (very) large application, which runs on top of a (very) large "enterprise" SQL database for a (very) large company. The only problem is: the enterprise application does not manage the database very well, and leaves zombie processes on the database server. After a while, the database server just crashes (hard) and takes down the application server with it. Logical solution (and the one recommended by sysadmins): upgrade the application to version X, which is supposed to have much better database management.
What do you think the PHB/management solution is? Ask the DBAs to write a script that will monitor zombie processes, so the sysadmins will be warned in advance... Like, around 20 minutes before the application crashes. Just enough time to tell all users to save their work, because we need to reboot everything. Just like under Windows.
Did I mention the application is considered mission-critical and runs 24x7? And that downtime can cost more than 6 figures to said (nameless) company?
And, since you asked, yes, I am looking for another job. (Clueless admins and pointy-haired bosses: a match made in...)
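For illustration only, the kind of early-warning script management asked for might look like the sketch below. It assumes the "zombie processes" are OS-level zombies (state `Z` in Linux's `/proc/<pid>/stat`); the comment doesn't say how the DBAs' real script works, and could just as well mean orphaned database sessions. `ZOMBIE_THRESHOLD` and all function names are hypothetical.

```python
"""Hypothetical sketch: warn before zombie processes pile up on a Linux host.

Assumes OS-level zombies (state "Z" in /proc/<pid>/stat); the real script
and its threshold are not described in the comment.
"""
import os
from typing import Iterable, List

ZOMBIE_THRESHOLD = 50  # hypothetical alert level, tune per host


def parse_state(stat_line: str) -> str:
    """Return the one-letter process state from a /proc/<pid>/stat line.

    The command name sits in parentheses and may itself contain spaces or
    ')', so split after the *last* ')' instead of splitting on whitespace.
    """
    return stat_line[stat_line.rfind(")") + 1:].split()[0]


def count_zombies(stat_lines: Iterable[str]) -> int:
    """Count lines whose process state is 'Z' (zombie)."""
    return sum(1 for line in stat_lines if parse_state(line) == "Z")


def read_proc_stats(proc_root: str = "/proc") -> List[str]:
    """Read /proc/<pid>/stat for every numeric (PID) entry under proc_root."""
    lines = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                lines.append(f.read())
        except OSError:
            # The process exited between listdir() and open(); skip it.
            continue
    return lines


def should_warn(zombies: int, threshold: int = ZOMBIE_THRESHOLD) -> bool:
    """True once the zombie count reaches the alert threshold."""
    return zombies >= threshold


if __name__ == "__main__":
    if os.path.isdir("/proc"):
        n = count_zombies(read_proc_stats())
        if should_warn(n):
            print(f"WARNING: {n} zombie processes (threshold {ZOMBIE_THRESHOLD})")
        else:
            print(f"OK: {n} zombie processes")
```

Run from cron, this only warns; it cannot fix anything, because a zombie can't be killed directly and only disappears when its parent reaps it (or the parent itself exits). That is exactly the complaint: monitoring the symptom instead of upgrading the application that causes it.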

--
The right to offend is far more important than the right not to be offended. (Rowan Atkinson)
Re:From personal experience by VolciMaster · 2011-03-02 02:06 · Score: 1, Interesting

> "they punt and rebuild the server from scratch rather than dig deeper."
> From personal experience this is normally due to management jumping down our throats to simply "get it done" which unfortunately runs counter to our inquisitive desires to actually solve the problem.
> I suspect it's the end result of pressure to get more bang for their bucks in a tight economy, but that's pure speculation. It really could be a trend of the times.
Having witnessed this type of behavior across myriad companies and industries, I can say the rebuild/clone/redeploy approach is used NOT because of "pressure to get more bang for their bucks" - it's that this approach is inherently easier than deep-diving, perhaps for days, to find The Answer(tm). In an environment of thousands of servers (or even dozens), deep-diving into a problem [generally] is a waste of time. While it is interesting intellectually, there is no other benefit.

--
antipaucity
Virtualization != marginalization of skills... by Shoeler · 2011-03-02 02:14 · Score: 4, Interesting

This seems to me to be a philosophical question. Indeed, if uptime, and more importantly availability, is higher under the purported crash-and-burn method (taking liberties with the slash-and-burn deforestation technique), who is to say it is less useful or less valid? To espouse skills over delivering for the client seems to miss the point. It rests on some pedagogical imperative that knowledge is somehow of more value in the workplace than delivery.

Now - having said that - don't get me wrong. I have seen entirely too many *nix sysadmins (full disclosure: I got an RHCE in 2003) who don't know where the network config files are because they only know the GUI, and who are hired by teams of people who have never logged into a *nix box. However, I think the most egregious ill is not that it sets some moral and ethical imperative for fixing rather than reloading (or in this case, recovering from a VM image) a server, but that it misses the point: there has been a dearth of qualified IT candidates since the dawn of our industry, and the fixes for this have to do not with how we fix a server, but with how we hire and, more importantly, whom we hire. As with everything in IT, garbage in == garbage out.

Finally - I absolutely agree with the InfoWorld argument. It assumes an unexpected failure within the server, not some external problem that needs to be diagnosed and fixed. If your app crashes because the SQL table isn't there on the SQL server you don't control, rebooting ain't going to do a hill of beans worth of good.
Re:Hyperviser by Anonymous Coward · 2011-03-02 02:31 · Score: 4, Interesting

Because pointing and clicking inherently takes more skill than using a CLI, right? Never mind that most CLI commands will readily help you with syntax if you format something incorrectly, whereas documentation for a GUI, if it exists at all, is often useless...
Re:It will just get worse (depending on your view) by vlm · 2011-03-02 02:44 · Score: 3, Interesting

> As VMs are virtualized and taking snapshots of them becomes so easy, why would you bother troubleshooting anything when you can just restore to a snap that is an hour old?
The security exploit that cracked the old image in less than a second, will crack the "identical" new image in less than a second. Or data sample #1213 which overflowed the buffer and crashed image A will simply overload and crash image B.
What it really brings up is a class distinction among sysadmins. There's the guy who actually fixes systems: patching security holes in system libraries to work around app bugs, redesigning firewall ACLs to avoid a new threat, doing scalability assessments before the overload crashes something. And then there are the guys who fix individual things like motherboards and hard drives rather than administering systems: basically help desk people with a fancy sysadmin job title. Virtualization means the helpdesk board swappers with the cool job titles are outta here, but the real sysadmins have little if anything to fear.

--
"Science flies us to the moon. Religion flies us into buildings." - Victor Stenger
Re:Hyperviser by ron_ivi · 2011-03-02 04:39 · Score: 3, Interesting

> a clean sweep and restore is perfectly acceptable and reasonable
NNNOOOOOOO!
Often a glitch like that is the only evidence you'll have that a machine had been compromised or that hardware is failing.
If you must do a clean sweep, do that on a standby system, and keep an image of the failed one until you can investigate the exact reason for the failure.
Re:Hyperviser by jc42 · 2011-03-02 05:23 · Score: 5, Interesting

> No, the difference is that a CLI is nearly impossible to use if you aren't familiar with it - the semantics and syntax are as, if not more, important than the concepts - whereas a GUI requires much less focussing on the "how", allowing much more focussing on the "what".
While there's a certain truth to this, GUIs are in general a lot less "intuitive" than people tend to believe. Without documentation and training, most users are unaware of most of their GUI's capabilities, and have great difficulty in learning much more than the basics.
An example I've read a number of warnings about in web-design documents is that a significant number (often estimated at around 50%) of "non-geek" users don't understand scroll bars. This is usually mentioned along with the advice to put the important part of your web pages close to the top, because the non-scrolling users won't be able to see anything below that.
Yes, I was dubious when I first read this. But over the years, I've run into several clear examples. I've been involved in building web sites for some very non-geeky organizations. The orgs' leaders generally want a lot of stuff on their main page, and at the top they usually want some text about the organization, its purposes, its main activities, etc. They also agree that it's good to have a list of upcoming public events on the main page, and inevitably that's positioned below the introductory text, so it's often not visible unless the user has a rather large window.
In each case, there were eventually meetings with discussions of how to improve the web site. One thing that would come up was suggestions from users (including members) that the home page should have a list of upcoming events. The leaders have always been dumbfounded by this. "But, but, ... There is such a list on the home page." "What?? No, there isn't."
Eventually, I have to interrupt and explain to the org's leaders that they're hearing from people who don't understand scrollbars and have never seen the events table because they don't scroll down to it. The users are, of course, confused; they know that there's no such table because they've never seen it. We bring up the site on a handy machine (preferably a laptop or tablet with a small screen), and I show the users that it's there by scrolling down to it. Their response again is confusion, because they don't know what I did or how I did it. "Why's it hidden like that?"
So I teach them about scrollbars, and a few users have learned something useful. But this has a more important effect: It gets across to the leaders why their design was wrong, as I'd been telling them, and they'll have a better web site if they'll let me fix it.
One instance of this happened just last week. The org's web site now has that block of extensive history and purpose in a separate box at the bottom of the page, and the table of coming events is positioned near the top, just below the logo bar, where non-geek users will see it and be able to read at least the first few entries.
Examples like this abound in GUI design. Many of the common widgets are not at all intuitive to most people. Even if they accidentally poke at things and trigger the actions, it's often difficult to grasp what the effect was. You see things change, but the changes don't make sense, and have no obvious relation to the icon that you clicked on. Often the icons don't look like anything that most users can name. The result is that most of the GUI is unusable to most of the users.
I wish I knew good ways around this. But truly making a GUI obvious is very difficult, and takes a lot of time studying the users and learning about their misconceptions. I very rarely have the time to do this, and in many cases the people paying me have expressly forbidden wasting time with dumb users.
And that's something that's very difficult to program around. ;-)

--
Those who do study history are doomed to stand helplessly by while everyone else repeats it.
Re:Hyperviser by drsmithy · 2011-03-02 06:27 · Score: 3, Interesting

> While there's a certain truth to this, GUIs are in general a lot less "intuitive" than people tend to believe. Without documentation and training, most users are unaware of most of their GUI's capabilities, and have great difficulty in learning much more than the basics.

Sure, but the point is with a CLI and no understanding of its syntax and semantics, you're pretty much dead in the water from the get-go. You could have a deep understanding of networking, but if you're unfamiliar with the syntax of iptables, you're not going to be able to configure a Linux firewall.
Your scrollbar example is actually a good one, because it highlights the key differences between a GUI and a CLI. In a GUI, there is a visual indicator that the content is larger than a single page, positive feedback from the UI element if the user tries to interact with it (i.e., it reacts to a mouse click), and secondary feedback that the UI element is important even if it is triggered "accidentally" (i.e., it moves if the user presses page down, space, or in some other way makes the page scroll).
In a CLI, you would simply be presented with a single page of text. Advancing to the next page would require knowing which key(s) to press to do so. If you don't know the key, you're screwed. Some CLIs may present a "press space to continue", or similar, message, but that's starting to blur the line between CLI and GUI, IMHO.
Further, the new knowledge those users have about the scrollbar is now applicable to pretty much any GUI they use in the future, even ones running on completely different OSes (I recognise this doesn't apply to all UI elements, but the fundamentals - buttons, menus, scrollbars, selection boxes, etc. - are pretty consistently implemented in similar ways across the board). The knowledge they have gained about the CLI interaction is probably specific to that CLI only (how many different ways in different CLIs do you know of to trigger a page down?).

> Examples like this abound in GUI design. Many of the common widgets are not at all intuitive to most people. Even if they accidentally poke at things and trigger the actions, it's often difficult to grasp what the effect was. You see things change, but the changes don't make sense, and have no obvious relation to the icon that you clicked on. Often the icons don't look like anything that most users can name. The result is that most of the GUI is unusable to most of the users.

Sure, but the point is that there *ARE* things there to "poke at" and there is feedback that something actually happened. A CLI has neither - you need to know the commands in advance to do anything, and often the only feedback from a command is to indicate an error (and frequently said feedback is not useful at all in understanding what the error was).
Human cognition is highly dependent on visualisation, context and feedback. A CLI lacks all of those, or typically has only very minimal implementations of them.