10 Dos and Don'ts To Make Sysadmins' Lives Easier
CowboyRobot writes "Tom Limoncelli has a piece in 'Queue' summarizing the Computer-Human Interaction for Management of Information Technology's list of how to make software that is easy to install, maintain, and upgrade. FTA: '#2. DON'T make the administrative interface a GUI. System administrators need a command-line tool for constructing repeatable processes. Procedures are best documented by providing commands that we can copy and paste from the procedure document to the command line.'"
1. DO switch every don't to a do and do to a don't on that list. You are now a user.
10 is an even number. There's no duplicates. None of them are filler.
I don't understand how this happened.
Did someone plan this before they wrote it? What gives?
slashdot: where everyone yells sarcastic metaphors to themselves to understand the issue
It's a top-10 list that actually has insightful information on how to do software right, instead of being a random collection of ten things to make a fluff article. Bonus points for being things that I actually agree with.
"sysadmins' lives" is correct. It is referring to the lives of sysadmins.
Unless, of course, you are referring to the sexual practices of punctuation marks. Then, I don't know.
The article author is also behind The Practice of System and Network Administration, truly an excellent text on the practicalities of work in IT.
If you want to make a sysadmin's life easier (as if any programmer ever wants to do that), you can start by making your error and status messages 1.) plentiful and 2.) easy to understand. Also, provide several logging levels so we can drill down as needed, and make sure the logging levels are meaningful. Too many programmers put just two log levels: one which shows nothing useful, and another that spews out indecipherable hex dumps of every call it makes.
Face up to the fact that no matter how awesome your software is, it's going to fail. Not only that, but it's going to fail in ways you never thought possible at the worst possible times. Make sure we have enough information to figure out what happened. Otherwise, stuff like this happens:
Program: *crash for no apparent reason*
Sysadmin: Why did you crash?
Program: Because something went wrong.
Sysadmin: What went wrong?
Program: Something.
Sysadmin: I need more detail. Increasing log level.
Program: Something bad went wrong.
Sysadmin: I need more than that. Increasing log level again.
Program: Fuck you. Here's a 16GB hex dump of system memory. Figure it out yourself jackass.
Sysadmin: *picks up a crowbar and goes off to find the programmer*
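For what it's worth, the "several meaningful logging levels" advice above can be sketched with Python's standard logging module. This is only an illustration under assumptions: the application name "exampleapp", the helper names, and the spool path are all made up, and the failure is simulated.

```python
import io
import logging


def make_logger(level, stream):
    """Build a logger whose verbosity the admin can change (hypothetical helper)."""
    logger = logging.getLogger("exampleapp")  # hypothetical application name
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(level)
    return logger


def load_spool(logger, path):
    """Pretend to open a spool file and report the failure at two levels."""
    logger.debug("opening spool file %s", path)
    try:
        # Simulated failure so the example needs no real file.
        raise PermissionError(13, "Permission denied", path)
    except OSError as exc:
        # ERROR says what failed, on which object, and why.
        logger.error("cannot read spool file %s: %s", path, exc.strerror)
        # DEBUG adds detail that is only useful while drilling down.
        logger.debug("errno=%d while reading %s", exc.errno, path)
        return False
```

At the WARNING level the admin gets one readable line naming the file and the reason; raising the level to DEBUG adds the extra detail without turning into a hex dump.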
In essence, all 10 items on the list say "Use Linux!"
Yeah, ok, thank you Captain Obvious, I mean CHIMIT :P
Not really. The same problems exist in Linux -- authentication, logging, putting files in random folders (/var, /etc).
Don't make me use a real browser to click all the way through your site, make me agree to a stupid set of conditions for using the software, and then provide my browser with a cookie that it can subsequently use to download your software; when my browser is on one continent and the machine that wants the software is on another continent; you ass-fucks...
> 10 is an even number. There's no duplicates. None of them are filler.
> I don't understand how this happened.
I know how they came up with a high-quality top ten: They had 13 or so, and they cut the weakest ones.
> DO have a configuration file that is an ASCII file, not a binary blob.
And by ASCII we mean something that can be edited by any editor.
XML is the equivalent of a binary blob when you are up to your ass in alligators trying to get things working again with minimal tools available.
2. DON'T make the administrative interface a GUI.
Amen. The number of times I have dumped on products because of the lack of a CLI is almost rude, and funnily enough it saves a lot in licensing costs, so "almost" everyone is happy. Pretty pictures and buttons will get you past management and sales, but if you come near my systems with your "button pushing monkey" toys, expect your time in the building to be very short indeed.
...if the GUI is well done and complements the command line. Some tasks actually ARE much better performed with point-and-click.
One example of a "good" GUI that I use a lot is the ASDM for Cisco ASA firewalls. Most of the simpler admin tasks are in fact *faster* via ASDM. If you have your network objects all properly set up and you need to add a firewall rule, it's far simpler to select it from a list (actually, in this case it's a combobox - just type first few letters to filter your choices and then click) than typing that stuff in manually. Packet tracer to check the rules is much nicer to use via the GUI. Setting up VPN profiles is simpler via ASDM. Handling network object groupings is simpler via ASDM.
Editing access-lists, doing routing configuration and most of the more "rudimentary" tasks are still something I do via command line, though.
I thought they just followed Jesus around.......
BM3
If it fails in a way that you never thought possible, how would you write an error message that describes the failure?
8. [...] Similarly, use the operating system's built-in authentication system and standard I/O systems.
This can be a bad thing if your application runs on a platform whose built-in authentication is a nickel-and-dime revenue stream for the platform's publisher. Microsoft Windows Server is like this: each user account on the built-in authentication system requires a Client Access License.
Feel free to make a GUI for the administrative interface, but not at the expense of an underlying CLI.
There are two ways to do this: have your GUI call the CLI when necessary, or use a common API behind both. Other methods will lead to bitrot in one of the interfaces, most likely the CLI.
GUIs are fine and even enjoyable to a certain extent, but the author is right that the CLI takes priority.
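A rough sketch of the "common API behind both" approach, assuming a made-up tool called "exampleadmin" and an in-memory dict standing in for the real user database. None of these names come from the article; the point is only that the CLI and the GUI handler both call the same core function.

```python
import argparse


def add_user(store, name, shell="/bin/sh"):
    """Core operation: the single place where the real work happens."""
    if name in store:
        raise ValueError(f"user {name!r} already exists")
    store[name] = {"shell": shell}
    return store[name]


def cli(argv, store):
    """Thin CLI front end: parse arguments, call the core, return an exit code."""
    parser = argparse.ArgumentParser(prog="exampleadmin")  # hypothetical tool name
    sub = parser.add_subparsers(dest="command", required=True)
    p_add = sub.add_parser("add-user")
    p_add.add_argument("name")
    p_add.add_argument("--shell", default="/bin/sh")
    args = parser.parse_args(argv)
    try:
        add_user(store, args.name, args.shell)
    except ValueError as exc:
        print(f"error: {exc}")
        return 1
    return 0


def gui_add_user_clicked(store, form_fields):
    """What a GUI button handler would do: call the same core function."""
    return add_user(store, form_fields["name"], form_fields.get("shell", "/bin/sh"))
```

Because neither front end contains any logic of its own, a feature added to the core is reachable from both, which is what keeps the CLI from rotting.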
I manage almost exclusively Linux servers and I must say the command line saves me oodles of time. Some quirks can be alleviated by just restarting some services before they run out of memory, some need a bit more magic, but nothing takes time like having to log in to many computers every day and click on the same friggin GUI stuff on multiple servers.
Bash saves me time by totally taking repetitive tasks away. I've tried the same with some Windows machines, but while PowerShell has potential, it does not work in reality unless you are a 100% Microsoft shop and you happen to run the limited set of applications that have full support for PowerShell.
Maybe in time Windows will climb up to the level of Linux when it comes to manageability, but right now I spend most of my time doing repetitive stuff on my Windows boxes while I write scripts that handle anything on the Linux boxes.
HTTP/1.1 400
No, I'm sorry, it is not correct. Sysadmins don't have lives.
> 10 is an even number. There's no duplicates. None of them are filler. I don't understand how this happened. Did someone plan this before they wrote it? What gives?
It's an acm.org article. Not only did the author probably plan, re-read and revise the article before submitting it, but a technically knowledgeable editor probably read it and may have offered useful and insightful suggestions. Now, there may not have been a formal peer review process, but the editor may also have had one or more experts in the field read it and offer comments and suggestions.
;-)
Yes, the above seems an archaic process, but consider that the ACM is full of old people who had experience publishing back when things were done with dead trees.
This reads like a specification for building a unix system.
Those who don't understand Unix are doomed to reinvent it... or something like that.
Alex, I'll take keybindings not used by Emacs for $400....
In reference to point 8, this is something I wrote a while ago after dealing with several Windows apps that either horribly abused the Eventlog or refused to use it entirely:
Do not assume that your software is running with elevated access... (root/administrator)
Make sure it's clear whether you meant '10' in base 2, 8, or 16.
Have gnu, will travel.
A GUI is NOT fine for administering a broken system over a slow link to the other side of the world.
I used to remotely administer a set of servers in the Middle East. The bandwidth was tiny, and the latency was insane. I would type a command out, then take a sip of coffee while waiting to see it displayed before hitting "enter." I had to use a GUI for one application, and it took over 40 minutes to fire up and display on my machine.
Mandatory (and well-designed) GUIs should be for using an application, not administering or installing it.
"People who do stupid things with hazardous materials often die." -- Jim Davidson on alt.folklore.urban
#10 should probably be #1. Support and documentation is everything. Because when it hits the fan, finding the original install CDs or manual is almost always a requirement. It's also why I stopped buying Nvidia cards. They got rid of almost all of their patches and drivers as well as installation CDs on their site and now force you to use their "all in one" tool. And lo and behold, you're screwed 90% of the time with an older machine if you don't have the original install CD because it simply doesn't work without the CD.
Case in point: I tried to recover an old machine's crashed system (video drivers and DirectX had eaten themselves when "upgrading", as is typical), but the online driver was useless. The original CD was the only option, but it wasn't to be found (as is typical, few customers keep driver CDs where they can find them). The manufacturer didn't have the original CD to download, either (honestly, a 50 MB ISO file isn't going to kill their server space). I had to buy a new card to solve what should have been a ten-minute problem. Nobody was happy about it, either, as you would imagine, since the card wasn't even two years old at the time.
(note - a "roll back" option also needs to be available when "upgrading") I'd wager that 95% of the time it is simply not there.
Instead, why not try using, oh, I dunno, "tar" and "make" and friends -- you know, the standard 'nix tools that every system administrator has been working with quite happily for decades and which suffice nicely to install tens of thousands of software packages ranging from the dirt-simple to the incredibly complex.
I'm looking at you, SAS.
1. DO have a "silent install" option.
Silent install is nice, but so is an intelligent install, or a well thought-out, correctable upgrade process.
These systems do it well:
Debian and RedHat derived; Windows, post-2003. OS install is still a bit of a bitch with Windows. The upgrade process for MediaWiki is also stupid easy and effective (basically: untar new tree and run db alter scripts).
Poorly:
FreeBSD, and, really, most BSDs, are horrible for upgrading. I suspect OS X is similarly stupid when it comes to "promptless installs". Cacti, likewise, is awful.
2. DON'T make the administrative interface a GUI.
A useful amendment to this is: don't make the administrative interface shitty. GUI is fine, as long as I can leverage it programmatically. CLI tool is great, as long as it's fucking documented and not obtuse.
Case in point (in opposition): MegaCLI, for MegaRAID cards. Absolute. Shit.
3. DO create an API so that the system can be remotely administered.
An API is great, and allows for programmers to dig in and extend the product. I'm thinking of VMware, XenServer, and VirtualBox right now. The latest Windows versions with PowerShell and the management consoles are not a bad combination of usability/power/utility.
Most sysadmins don't have the time to dig into the API, though, so a good initial tool that isn't terribly dense or limited in functionality is a must (XenServer, please improve your shitty-useless UI on xsconsole and XenCenter; I'd like a little more access to my VM disks without digging into lv/pv commands, too).
4. DO have a configuration file that is an ASCII file, not a binary blob.
No argument here. Likewise, configuration should be human-readable and not have vague incantations.
Good: samba, and all tools which use similar configuration syntax.
Bad: sendmail is the worst offender I can think of at the moment. I'm sure all the djb* stuff, too.
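To illustrate what "human-readable and no vague incantations" can look like, here is a small sketch using Python's standard configparser with an INI-style file. The section names, keys, and paths are invented for the example and are not taken from Samba or any real product.

```python
import configparser

# Hypothetical settings for an imaginary service. The comments live in the
# file itself, so the documentation travels with the options.
EXAMPLE_CONF = """\
# Settings that apply to the whole service.
[global]
# How chatty the logs are: error, warning, info, or debug.
log level = info

[share:projects]
# Directory exported to clients.
path = /srv/projects
read only = yes
"""


def load_config(text):
    """Parse INI-style text into a ConfigParser object."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser
```

Reading a value back is then something like `load_config(EXAMPLE_CONF)["share:projects"].getboolean("read only")`, and the same file stays editable with vi over a bad SSH link.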
5. DO include a clearly defined method to restore all user data, a single user's data, and individual items (for example, one e-mail message). The method to make backups is a prerequisite, obviously, but we care primarily about the restore procedures.
Good: any UNIX system and its $HOME; modern Unix MTAs like Courier.
Bad: Cyrus IMAP. Pretty much any tape archive system comes close to frustrating as hell. Windows still has a long way to improve until it's capable of Unix-style $HOME utility.
6. DO instrument the system so that we can monitor more than just, "Is it up or down?"
WMI is great. SNMP on Unix/Linux hosts, not so much, due to the configuration and divergence involved. Most OEM Linux/Unix based machines or systems (XenServer) are relatively shitty in this regard, too.
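As a sketch of what "more than up or down" could mean from the application side, here is a tiny set of in-process counters that a status command or monitoring agent could read. The class name, metric names, and key=value output format are all assumptions made for illustration; this is not WMI, SNMP, or any real agent's format.

```python
import time


class Metrics:
    """Tiny in-process counters that a status command could print (hypothetical)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._started = clock()
        self.counters = {"requests_total": 0, "errors_total": 0}
        self.queue_depth = 0

    def record_request(self, ok=True):
        """Count one handled request, and one error if it failed."""
        self.counters["requests_total"] += 1
        if not ok:
            self.counters["errors_total"] += 1

    def report(self):
        """Return plain key=value lines that are easy to grep or scrape."""
        lines = [
            f"uptime_seconds={self._clock() - self._started:.0f}",
            f"queue_depth={self.queue_depth}",
        ]
        lines += [f"{key}={value}" for key, value in sorted(self.counters.items())]
        return "\n".join(lines)
```

With numbers like error counts and queue depth exposed, a monitoring system can alert on "getting worse" instead of only on "dead".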
7. DO tell us about security issues.
Telling us about them is great, but these are the most important, time-sensitive upgrades we need to make, so they should also be the easiest. We should not have to break two or three different things just to get the upgrade done.
BSDs are bad about this; horrible, even. The time consumed by a simple upgrade is enormous.
Linux is mediocre, but better than most.
Windows, in this case, "just works". Except when it doesn't (though I'd argue the degree is no greater than, say, the Linux upgrade process). Your biggest cost will be when it installs something you've explicitly told it not to (*cough* new IE versions) or in bandwidth and/or uptime requirements.
8. DO use the built-in system logging mechanism (Unix syslog or Windows Event Logs).
Something which doesn't do this isn't even worth looking at. It's yet one more thing to manage and uses exponential
Addition: make your logging sensible, please. I don't want to see a full trace of everything in the logs and not be able to configura
~/ssh slashdot.org ssh: connect to host slashdot.org port 22: too many beers
2. DON'T make the administrative interface a GUI. System administrators need a command-line tool for constructing repeatable processes. Procedures are best documented by providing commands that we can copy and paste from the procedure document to the command line. We cannot achieve the same repeatability when the instructions are: "Checkmark the 3rd and 5th options, but not the 2nd option, then click OK." Sysadmins do not want a GUI that requires 25 clicks for each new user. We want to craft the commands to be executed in a text editor or generate them via Perl, Python, or PowerShell.
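The "generate them via Perl, Python, or PowerShell" idea might look roughly like this in Python. It is a sketch under assumptions: the function name and the sample users are made up, and it only builds `useradd` command lines as text for review or pasting into a procedure document. Nothing is executed.

```python
import shlex


def useradd_commands(users, shell="/bin/bash"):
    """Return one useradd command line per (login, full name) pair."""
    commands = []
    for login, full_name in users:
        # -m creates the home directory, -s sets the login shell,
        # -c sets the comment (full name) field.
        argv = ["useradd", "-m", "-s", shell, "-c", full_name, login]
        # shlex.join quotes arguments so the line is safe to paste into a shell.
        commands.append(shlex.join(argv))
    return commands
```

Calling `useradd_commands([("alice", "Alice Example")])` yields a line that can be read, diffed, and replayed, which is exactly what 25 clicks per user cannot give you.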
Since I've had to work with Windows servers in my new job, I thought I'd better read up on them, so I've been reading Windows Server 2008: The Definitive Guide. The sections on the underlying principles and theory of the OS are fine. But that's one third of the text, at most. Most of the text is useless blow-by-blow accounts of sequences of clicks in GUI applets. It's completely unreadable -- the descriptions are meaningless unless you're working through the instructions with an instance of Windows Server 2008 in front of you. And who's going to set up several instances, just to make sense of the description of the applet for configuring load balancing?
I can't blame the book, particularly, as it's a problem of GUIs. My workplace has lots of documents with step-by-step instructions for configuring services, which have one sentence of text, followed by a screenshot, followed by another sentence of text, and another screenshot, and so on.
On the flip side, one of the great things about text configuration files is that while they're full of obscure configuration options, they're also full of the documentation explaining the obscure configuration options. Config files are rich with documentation. GUI configuration applets frequently aren't. I'll take a documented option in a config file over an undocumented option in a GUI config applet any day.