10 Dos and Don'ts To Make Sysadmins' Lives Easier
CowboyRobot writes "Tom Limoncelli has a piece in 'Queue' summarizing the Computer-Human Interaction for Management of Information Technology's list of how to make software that is easy to install, maintain, and upgrade. FTA: '#2. DON'T make the administrative interface a GUI. System administrators need a command-line tool for constructing repeatable processes. Procedures are best documented by providing commands that we can copy and paste from the procedure document to the command line.'"
10 is an even number. There's no duplicates. None of them are filler.
I don't understand how this happened.
Did someone plan this before they wrote it? What gives?
slashdot: where everyone yells sarcastic metaphors to themselves to understand the issue
It's a top-10 list that actually has insightful information on how to do software right, instead of being a random collection of ten things to make a fluff article. Bonus points for being things that I actually agree with.
"sysadmins' lives" is correct. It is referring to the lives of sysadmins.
Unless, of course, you are referring to the sexual practices of punctuation marks. Then, I don't know.
The article author is also behind The Practice of System and Network Administration, truly an excellent text on the practicalities of work in IT.
If you want to make a sysadmin's life easier (as if any programmer ever wants to do that), you can start by making your error and status messages 1.) plentiful and 2.) easy to understand. Also, provide several logging levels so we can drill down as needed, and make sure the logging levels are meaningful. Too many programmers put just two log levels: one which shows nothing useful, and another that spews out indecipherable hex dumps of every call it makes.
Face up to the fact that no matter how awesome your software is, it's going to fail. Not only that, but it's going to fail in ways you never thought possible at the worst possible times. Make sure we have enough information to figure out what happened. Otherwise, stuff like this happens:
Program: *crash for no apparent reason*
Sysadmin: Why did you crash?
Program: Because something went wrong.
Sysadmin: What went wrong?
Program: Something.
Sysadmin: I need more detail. Increasing log level.
Program: Something bad went wrong.
Sysadmin: I need more than that. Increasing log level again.
Program: Fuck you. Here's a 16GB hex dump of system memory. Figure it out yourself jackass.
Sysadmin: *picks up a crowbar and goes off to find the programmer*
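For what it's worth, here is a rough Python sketch of what "meaningful levels" could look like. Everything in it is made up for illustration: the logger name `exampled`, the `process_batch` function, the job format and the `-v` convention in `configure` are assumptions, not anything from the article. The idea is only that each level answers a different question: DEBUG for per-item detail, WARNING for things that were skipped, ERROR for things that failed and why, INFO for the summary.

```python
import logging

log = logging.getLogger("exampled")  # hypothetical daemon name


def process_batch(jobs, max_size=1024):
    """Process (name, payload) pairs and return how many were accepted."""
    done = 0
    for name, payload in jobs:
        # DEBUG: per-item detail, only useful when drilling down.
        log.debug("job %s: payload is %d bytes", name, len(payload))
        if not payload:
            # WARNING: something odd happened, but the batch carries on.
            log.warning("job %s: empty payload, skipping", name)
            continue
        if len(payload) > max_size:
            # ERROR: says which job failed and exactly why.
            log.error("job %s: payload of %d bytes exceeds limit of %d bytes",
                      name, len(payload), max_size)
            continue
        done += 1
    # INFO: one summary line per batch for normal operation.
    log.info("batch finished: %d of %d jobs processed", done, len(jobs))
    return done


def configure(verbosity):
    """Map a count of -v flags to a level: 0 is WARNING, 1 is INFO, 2+ is DEBUG."""
    level = {0: logging.WARNING, 1: logging.INFO}.get(verbosity, logging.DEBUG)
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(name)s %(levelname)s %(message)s",
    )
    return level
```

With something like this, raising the verbosity one step gets you the batch summaries, and one more gets you per-job sizes, instead of jumping straight from silence to a hex dump.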
Don't make me use a real browser to click all the way through your site, make me agree to a stupid set of conditions for using the software, and then provide my browser with a cookie that it can subsequently use to download your software; when my browser is on one continent and the machine that wants the software is on another continent; you ass-fucks...
> DO have a configuration file that is an ASCII file, not a binary blob.
And by ASCII we mean something that can be edited by any editor.
XML is the equivalent of a binary blob when you are up to your ass in alligators trying to get things working again with minimal tools available.
...if the GUI is well done and complements the command line. Some tasks actually ARE much better performed with Point&Click.
One example of a "good" GUI that I use a lot is the ASDM for Cisco ASA firewalls. Most of the simpler admin tasks are in fact *faster* via ASDM. If you have your network objects all properly set up and you need to add a firewall rule, it's far simpler to select it from a list (actually, in this case it's a combobox - just type the first few letters to filter your choices and then click) than typing that stuff in manually. Packet tracer to check the rules is much nicer to use via the GUI. Setting up VPN profiles is simpler via ASDM. Handling network object groupings is simpler via ASDM.
Editing access-lists, doing routing configuration and most of the more "rudimentary" tasks are still something I do via command line, though.
I thought they just followed Jesus around.......
BM3
8. [...] Similarly, use the operating system's built-in authentication system and standard I/O systems.
This can be a bad thing if your application runs on a platform whose built-in authentication is a nickel-and-dime revenue stream for the platform's publisher. Microsoft Windows Server is like this: each user account on the built-in authentication system requires a Client Access License.
Feel free to make a GUI for the administrative interface, but not at the expense of an underlying CLI.
There are two ways to do this: have your GUI call the CLI when necessary, or use a common API behind both. Other methods will lead to bitrot in one of the interfaces, most likely the CLI.
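A minimal Python sketch of the second approach (a common API behind both front ends). The tool name `exampleadm`, the `add-user` command, the in-memory `registry` dict and the `gui_on_add_clicked` handler are all hypothetical; a real GUI toolkit would supply the form values. The only point is that both front ends are thin and the rules live in one function.

```python
import argparse
import sys


def add_user(registry, name, quota_mb=100):
    """Core operation: the one place where the rules live."""
    if name in registry:
        raise ValueError(f"user {name!r} already exists")
    registry[name] = {"quota_mb": quota_mb}
    return registry[name]


def cli_main(argv, registry):
    """Command-line front end: parse arguments, then call the core."""
    parser = argparse.ArgumentParser(prog="exampleadm")
    sub = parser.add_subparsers(dest="command", required=True)
    add = sub.add_parser("add-user")
    add.add_argument("name")
    add.add_argument("--quota-mb", type=int, default=100)
    args = parser.parse_args(argv)
    try:
        add_user(registry, args.name, args.quota_mb)
    except ValueError as exc:
        print(f"exampleadm: {exc}", file=sys.stderr)
        return 1
    return 0


def gui_on_add_clicked(form, registry):
    """GUI front end: a button handler gathers fields and calls the same core."""
    return add_user(registry, form["name"], int(form.get("quota_mb", 100)))
```

Because neither front end contains any logic of its own, a new validation rule added to `add_user` shows up in both at once, which is what keeps the CLI from rotting.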
GUIs are fine and even enjoyable to a certain extent, but the author is right that the CLI takes priority.
GUIs are (sometimes) better when you want to do something *once*.
They really suck when you have to do that same thing hundreds of times. Which sysadmins do. On a regular basis.
Which is a perfect example of a terrible error message. And there's plenty of bad examples like that to crib from, too. (In your particular example, sure, you'll have the "at line XXX" so someone can start digging around in the code... but that's something only suitable for quick-and-dirty hack scripts.)
What you need to know is WHAT, WHERE and HOW. You know WHO (the program), and are trying to figure out WHY. I've often had to resort to strace -etrace=file to find out "What file couldn't be opened? Why couldn't it be opened?"
So, sticking with perl:
open FILE,"filename.txt" or die "Cannot open \"filename.txt\" for reading--$!\n";
Your example will give only the errno, which is what I'm calling HOW [it went wrong]. WHAT went wrong is the "open for reading". WHERE it went wrong is "filename.txt".
I generally wrap such calls with a library; that way, I don't have the error handling littering up every call-site. But if you're using an exception-oriented language, we need the SAME INFORMATION once it turns into an error message!
Oh yeah: For error recovery code, files can't be opened for more reasons than just, "It's not there." You can try all you want, but if (say) the filesystem has gone read-only due to a disk controller failure resulting in journal abort, you might want to do something different. That one's strictly hypothetical, haven't had it happen in over a week--ever since I replaced that faulty cable....
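For the exception-oriented crowd, roughly the same idea in Python. This is only a sketch: `FileOpenError`, `open_or_explain`, `recovery_action` and the recovery policy itself are names and choices I made up for illustration. The wrapper keeps WHAT (open for reading), WHERE (the path) and HOW (the errno text) in the message, and the second function branches on the errno instead of assuming the file is merely missing.

```python
import errno

# Hypothetical mapping from open() mode to the wording used in messages.
_PURPOSE = {"r": "reading", "w": "writing", "a": "appending"}


class FileOpenError(Exception):
    """Raised with a message that states WHAT, WHERE and HOW."""


def open_or_explain(path, mode="r"):
    """Open path, or raise an error a sysadmin can act on without strace."""
    try:
        return open(path, mode)
    except OSError as exc:
        what = f"open for {_PURPOSE.get(mode[0], mode)}"
        code = errno.errorcode.get(exc.errno, "unknown errno")
        how = f"{exc.strerror} ({code})"
        # Chain the original exception so the errno stays available.
        raise FileOpenError(f'Cannot {what} "{path}": {how}') from exc


def recovery_action(exc):
    """Pick a recovery step from the errno; this policy is illustrative only."""
    if exc.errno == errno.ENOENT:
        return "create"           # the file is simply missing
    if exc.errno in (errno.EROFS, errno.EIO):
        return "escalate"         # the storage is unhappy; retrying won't help
    if exc.errno in (errno.EACCES, errno.EPERM):
        return "fix-permissions"
    return "report"
```

A caller would catch `FileOpenError`, log its message as-is, and pass `err.__cause__` to `recovery_action` to decide what to do next.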
No, I'm sorry, it is not correct. Sysadmins don't have lives.
No.. the users are the ones who can't figure out how to use the system, that's why there's an admin.. if users knew what the fuck they were doing, we wouldn't NEED sysadmins in the first place.
If the system was designed properly for the userbase, so that users could use the system, you'd still need sysadmins to administer the system, which is notionally what sysadmins are for (hence the name.)
You wouldn't need sysadmins to take breaks from administering the system to handhold users through basic usage tasks, but then, that's not really the point of a system administrator in the first place.
> 10 is an even number. There's no duplicates. None of them are filler. I don't understand how this happened. Did someone plan this before they wrote it? What gives?
It's an acm.org article. Not only did the author probably plan, re-read and revise the article before submitting it, but a technically knowledgeable editor probably read it and may have offered useful and insightful suggestions. Now there may not have been a formal peer review process, but the editor may have also had one or more experts in the field read it and offer comments and suggestions.
;-)
Yes the above seems an archaic process but consider that the acm is full of old people who had experience publishing back when things were done with dead trees.
In reference to point 8, this is something I wrote a while ago after dealing with several Windows apps that either horribly abused the Eventlog or refused to use it entirely:
Do not assume that your software is running with elevated access... (root/administrator)
A GUI is NOT fine for administering a broken system over a slow link to the other side of the world.
I used to remotely administer a set of servers in the Middle East. The bandwidth was tiny, and the latency was insane. I would type a command out, then take a sip of coffee while waiting to see it displayed before hitting "enter." I had to use a GUI for one application, and it took over 40 minutes to fire up and display on my machine.
Mandatory (and well-designed) GUIs should be for using an application, not administering or installing it.
"People who do stupid things with hazardous materials often die." -- Jim Davidson on alt.folklore.urban
I wonder if there are forums on the Web where plumbers shit all over each other.
Bullish Machine Tzar
1. DO have a "silent install" option.
Silent install is nice, but so is an intelligent install, or a well thought-out, correctable upgrade process.
These systems do it well:
Debian and RedHat derived; Windows, post-2003. OS install is still a bit of a bitch with Windows. The upgrade process for MediaWiki is also stupid easy and effective (basically: untar new tree and run db alter scripts).
Poorly:
FreeBSD, and, really, most BSDs, are horrible for upgrading. I suspect OS X is similarly stupid when it comes to "promptless installs". Cacti, likewise, is awful.
2. DON'T make the administrative interface a GUI.
A useful amendment to this is: don't make the administrative interface shitty. GUI is fine, as long as I can leverage it programmatically. CLI tool is great, as long as it's fucking documented and not obtuse.
Case in point (in opposition): MegaCLI, for MegaRAID cards. Absolute. Shit.
3. DO create an API so that the system can be remotely administered.
An API is great, and allows for programmers to dig in and extend the product. I'm thinking of VMware, XenServer, and VirtualBox right now. The latest Windows versions with PowerShell and the management consoles are not a bad combination of usability/power/utility.
Most sysadmins don't have the time to dig into the API, though, so a good initial tool that isn't terribly dense or limited in functionality is a must (XenServer, please improve your shitty-useless UI on xsconsole and XenCenter; I'd like a little more access to my VM disks without digging into lv/pv commands, too).
4. DO have a configuration file that is an ASCII file, not a binary blob.
No argument here. Likewise, configuration should be human-readable and not have vague incantations.
Good: samba, and all tools which use similar configuration syntax.
Bad: sendmail is the worst offender I can think of at the moment. I'm sure all the djb* stuff, too.
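To make the "human-readable" point concrete, a small Python sketch that reads a samba-like INI file with the standard `configparser` module. The sample text, its section names and the `load_config` helper are invented for illustration, and `configparser` is not Samba's own parser; it just happens to accept this style of `key = value` syntax, including option names with spaces.

```python
import configparser

# Hypothetical configuration in a samba-like style; any text editor can change it.
SAMPLE = """\
[global]
workgroup = EXAMPLE
log level = 1

[backups]
path = /srv/backups
read only = yes
"""


def load_config(text):
    """Parse INI-style text and return the populated parser."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser


cfg = load_config(SAMPLE)
workgroup = cfg.get("global", "workgroup")
log_level = cfg.getint("global", "log level")           # typed accessor
read_only = cfg.getboolean("backups", "read only")      # understands yes/no
```

Nothing here needs more than `vi` or `ed` to fix at 3 a.m., which is the whole argument for plain-text configuration.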
5. DO include a clearly defined method to restore all user data, a single user's data, and individual items (for example, one e-mail message). The method to make backups is a prerequisite, obviously, but we care primarily about the restore procedures.
Good: any UNIX system and its $HOME; modern Unix MTAs like Courier.
Bad: Cyrus IMAP. Pretty much any tape archive system comes close to frustrating as hell. Windows still has a long way to improve until it's capable of Unix-style $HOME utility.
6. DO instrument the system so that we can monitor more than just, "Is it up or down?"
WMI is great. SNMP on Unix/Linux hosts, not so much, due to the configuration and divergence involved. Most OEM Linux/Unix based machines or systems (XenServer) are relatively shitty in this regard, too.
7. DO tell us about security issues.
Telling us about them is great, but security upgrades are the most important, time-sensitive upgrades we need to make, so they should also be the easiest. We should not have to break two or three different things just to get the upgrade done.
BSDs are bad about this; horrible, even. The time consumed by a simple upgrade is enormous.
Linux is mediocre, but better than most.
Windows, in this case, "just works". Except when it doesn't (though I'd argue the degree is no greater than, say, the Linux upgrade process). Your biggest cost will be when it installs something you've explicitly told it not to (*cough* new IE versions) or in bandwidth and/or uptime requirements.
8. DO use the built-in system logging mechanism (Unix syslog or Windows Event Logs).
Something which doesn't do this isn't even worth looking at. It's yet one more thing to manage and uses exponential
Addition: make your logging sensible, please. I don't want to see a full trace of everything in the logs and not be able to configura
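A rough Python sketch of both halves of this point on a Unix host: send everything through syslog, and let the admin choose the level. It assumes the standard `syslog` module is available (so not Windows); the daemon name `exampled`, the `SyslogHandler` class and the `make_logger` helper are hypothetical names for illustration.

```python
import logging
import syslog

# Nearest syslog priority for each standard logging level.
_PRIORITIES = {
    logging.DEBUG: syslog.LOG_DEBUG,
    logging.INFO: syslog.LOG_INFO,
    logging.WARNING: syslog.LOG_WARNING,
    logging.ERROR: syslog.LOG_ERR,
    logging.CRITICAL: syslog.LOG_CRIT,
}


def syslog_priority(levelno):
    """Map a logging level number to the syslog priority at or below it."""
    for level in sorted(_PRIORITIES, reverse=True):
        if levelno >= level:
            return _PRIORITIES[level]
    return syslog.LOG_DEBUG


class SyslogHandler(logging.Handler):
    """Forward formatted log records to the system logger."""

    def emit(self, record):
        syslog.syslog(syslog_priority(record.levelno), self.format(record))


def make_logger(ident="exampled", level=logging.INFO):
    """Build a logger that writes only to syslog, at an admin-chosen level."""
    syslog.openlog(ident=ident, logoption=syslog.LOG_PID,
                   facility=syslog.LOG_DAEMON)
    logger = logging.getLogger(ident)
    logger.setLevel(level)
    logger.handlers.clear()      # avoid duplicate handlers on repeated calls
    logger.addHandler(SyslogHandler())
    logger.propagate = False     # keep records out of any root handlers
    return logger
```

The `level` argument would normally come from a config file or a command-line flag, so turning the full trace on is a deliberate choice rather than the default.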
~/ssh slashdot.org ssh: connect to host slashdot.org port 22: too many beers
I disagree,
take any person of reasonable intelligence and place them in an unfamiliar setting. They become retarded.
The fact that they have been in front of that unfamiliar device for 20 years means they just don't care.
Give me a user who cares to familiarize themselves with the system, and 6 months, and I'll give you a half-decent sysadmin. At least better than half of the paper-certified MCSEs I've had the pleasure to work with.
Linux Zealots: Smarter than Mac Zealots, but still zealots.