Top 10 System Administrator Truths
Vo0k writes "What are your top ten system administrator truths? We all know them already, but it's still fun re-telling them. Stuff like "90% of all hardware-related problems come from loose connectors", even though you already know it's true, may save you from replacing the "faulty" motherboard if you recall it at the right time."
It's interesting that everybody seems to know these things, and yet they still get us. A couple months back, I went through three power supplies before I discovered the fact that I actually had a power cable that was going bad. You don't even think of things like how power cables can go bad, but they do.
Very well; let this abomination unto the Lord begin!
... even though it's better than it used to be, registry corruption is still the number one cause of boot failures in Windows XP. And the contents of ntbtlog.txt and the Recovery Console are still horribly inadequate tools for fixing it...
Athletic Scholarships to universities make as much sense as academic scholarships to sports teams.
1) Never believe anything anyone tells you: always test for yourself.
2) Always ask the dumb questions: is it switched on?
3) Reboot cures most things EXCEPT rm -r * when logged in as root
After that, things could get tricky.
-You shall be very pessimistic
-Make sure you can leave it exactly like it was before you touched it.
-Don't fix what ain't broken.
-Start from a known state of the system (switch off - switch on).
-Even if you are a genius-level techie, follow the manual. RTFM.
-Don't reinvent the wheel. Compare with something that's working.
-Cables are not perfect. If something doesn't connect, check lower levels first.
-If it's there, there must be a reason. Never ever delete anything. Rename instead.
-Your memory is not infinite. Write down what you do.
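For what it's worth, the "rename instead of delete" rule above is easy to wrap in a shell function. A minimal sketch, assuming a POSIX shell; the name `retire` and the `.retired-` suffix are made up here for illustration and are not a standard tool:

```shell
# Hypothetical helper: set a file or directory aside under a dated name
# instead of deleting it, so the change can be undone with a single mv.
retire() {
    target=$1
    if [ ! -e "$target" ]; then
        echo "retire: $target does not exist" >&2
        return 1
    fi
    # Timestamp keeps repeated retirements of the same name from colliding
    # (unless two happen within the same second).
    stamp=$(date +%Y%m%d-%H%M%S)
    mv -- "$target" "$target.retired-$stamp"
}
```

Something like `retire old.conf` would leave `old.conf.retired-<timestamp>` sitting next to where the original was, which also doubles as a crude record of what you touched and when.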
Google is your best friend. ever. period.
This goes for admins, programmers, and just about every other profession, especially in IT.
Good managers ask for something in 5 days, but need it in 6.
Such a basic thing, but so so important. I always try to pad estimates for our department, but I should be sure to pad my requirements for my staff as well.
Excuse my speling.
Maybe for a PC, but never a server.
:-P
When I started working at my job, we had several servers that would reboot on a cron for the sole reason that someone was too lazy to figure out the problem. We eliminated all but one of these reboots, mainly because we don't care about the last one.
My holy grail would have to be strace/truss/tusc. I would take that tool over reboot any day. It doesn't always fix the problem, but at least you will know what it is, instead of rebooting like a coward.
In my Tech Support experience, I have found only three basic rules.
Rule number 1. People are stupid. This one is true of all people. Tech support, highways, shopping, whatever. This rule has been extended to cover just about any stupid thing that anyone does.
"Why did that guy just..."
"Rule number 1."
"Did she think she could get away with that?"
"Rule number 1."
Rule number 2. People lie.
Me: "Has the computer been restarted since the problem started?"
Them: "Yes..."
Me: "OK. Let's try restarting the computer now and see what happens."
Them: "What do you mean by restart?"
And when you add 1 and 2 together, you get 3. Sometimes, people are so stupid, they don't know that they're lying. You know these people. They're the ones who have "Windows 2000 XP" or "2000 ME." They're the people for whom "Nothing happens when I try to check my email. Nothing! Just this error message..." Not realizing that the error message is *exactly* what I was looking for. An error message is *not* nothing. Grr.
There is a fourth rule that also shows up from time to time:
Rule number 4. No good deed goes unpunished.
In the famous words of the leader of the Uruk Hai from his battle call at Helm's Deep in The Two Towers: "Grr."
You forgot the part where they have to write the password down and stick it to their monitor with a Post-it note.
It would be really interesting to see a study to determine whether changing passwords frequently actually increases or decreases your vulnerability.
One of the most frustrating things is users who do what you ask, and then promptly do a bunch of things immediately afterwards that you didn't ask for. You try going step-by-step with them, and meanwhile they are opening menus and clicking away at things they don't understand, because somehow hearing your voice tell them what to do gives them all the control of a runaway horse.
I know, those are all corollaries of Murphy's law, but hey.
My current favourite question, when people's monitors don't come on after they've moved the computer, or got a new one, is "Is there more than one monitor port? Have you tried both?".
:)
They always claim there is only one socket the monitor will plug into, and without fail so far there has been an onboard one, which they are using, and one on a card, which is the one they should be using and have completely missed.
"Rule #10 - The Holy Grail of Tech Support is the reboot"
If you believe this or if you need this, you are running a POS operating system and it's probably from Microsoft. That this would even be considered a rule by a professional IT worker is all the proof we need that Bill Gates has caused more damage than he can ever hope to make up for.
What utter crap.
I was thinking they were talking about "truths about system administrators", not "truths about system administration".
Anyway, for the benefit of those who haven't seen this (very old and long, but somewhat entertaining) email that was doing the rounds a while ago... disclaimer: someone else wrote it, and I don't know who.
KNOW YOUR UNIX SYSTEM ADMINISTRATOR - A FIELD GUIDE
There are four major species of Unix sysad:
1) The TECHNICAL THUG. Usually a systems programmer who has been forced into system administration; writes scripts in a polyglot of the Bourne shell, sed, C, awk, perl, and APL.
2) The ADMINISTRATIVE FASCIST. Usually a retentive drone (or rarely, a harridan ex-secretary) who has been forced into system administration.
3) The MANIAC. Usually an aging cracker who discovered that neither the Mossad nor Cuba are willing to pay a living wage for computer espionage. Fell into system administration; occasionally approaches major competitors with indesp schemes.
4) The IDIOT. Usually a cretin, morphodite, or old COBOL programmer selected to be the system administrator by a committee of cretins, morphodites, and old COBOL programmers.
HOW TO IDENTIFY YOUR SYSTEM ADMINISTRATOR:
-- SITUATION: Low disk space. --
TECHNICAL THUG: Writes a suite of scripts to monitor disk usage, maintain a database of historic disk usage, predict future disk usage via least squares regression analysis, identify users who are more than a standard deviation over the mean, and send mail to the offending parties. Places script in cron. Disk usage does not change, since disk-hogs, by nature, either ignore script-generated mail, or file it away in triplicate.
ADMINISTRATIVE FASCIST: Puts disk usage policy in motd. Uses disk quotas. Allows no exceptions, thus crippling development work. Locks accounts that go over quota.
MANIAC:
# cd /home
# rm -rf `du -s * | sort -rn | head -1 | awk '{print $2}'`;
IDIOT:
# cd /home
# cat `du -s * | sort -rn | head -1 | awk '{ printf "%s/*\n", $2}'` | compress
-- SITUATION: Excessive CPU usage. --
TECHNICAL THUG: Writes a suite of scripts to monitor processes, maintain a database of CPU usage, identify processes more than a standard deviation over the norm, and renice offending processes. Places script in cron. Ends up renicing the production database into oblivion, bringing operations to a grinding halt, much to the delight of the xtrek freaks.
ADMINISTRATIVE FASCIST: Puts CPU usage policy in motd. Uses CPU quotas. Locks accounts that go over quota. Allows no exceptions, thus crippling development work, much to the delight of the xtrek freaks.
MANIAC:
# kill -9 `ps -augxww | sort -rn +8 -9 | head -1 | awk '{print $2}'`
IDIOT:
# compress -f `ps -augxww | sort -rn +8 -9 | head -1 | awk '{print $2}'`
-- SITUATION: New account creation. --
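TECHNICAL THUG: Writes perl script that creates home directory, copies in incomprehensible default environment, and places entries in /etc/passwd, /etc/shadow, and /etc/group. (By hand, NOT with passmgmt.) Slaps on setuid bit; tells a nearby secretary to handle new accounts. Usually, said secretary is still dithering over the difference between 'enter' and 'return'; and so, no new accounts are ever created.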
ADMINISTRATIVE FASCIST: Puts new account policy in motd. Since people without accounts cannot read the motd, nobody ever fulfills the bureaucratic requirements; and so, no new accounts are ever created.
MANIAC: "If you're too stupid to break in and create your own account, I don't want you on the system. We've got too many goddamn sh*t-for-brains a**holes on this box anyway."
IDIOT:
# cd /home; mkdir "Bob's home directory"
# echo "Bob Simon:gandalf:0:0::/dev/tty:compress -f" > /etc/passwd
-- SITUATION: Root disk fails. --
TECHNICAL THUG: Rep
Seriously, anthropomorphizing machines is a powerful technique. It gives you an approximate but effective mental model of a complex system. "Primitive" cultures are not dumb when they attribute personalities to objects. Our brains are wired to use personality to predict complex behaviour.
My Mother had no technical skills or knowledge - but she treated the automobile like a pet. She was alert to the tiniest change in sound or vibration of the machine, and very often alerted my Dad to problems long before he was aware of anything. One time, driving across country, my Mom said the right front wheel "didn't sound right". We were cruising along at 70, and everything seemed fine. But she insisted, so my Dad pulled over and checked all the tires. No sign of a problem. He pulled the hub cap off the right front wheel - and noticed that the cotter pin had broken! A few more miles and the wheel would have come off. My Dad panicked, since we didn't have any cotter pins in his repair kit. But my Mom dug in her purse and offered a bobby pin. My Dad didn't want to use it, because it was the wrong kind of metal and would break easily. My Mom said she had more, so he put it in. That bobby pin took us another 5000 miles.
My Dad does all his own work on his cars - at least he did until he ruined the valves on his Honda Accord a few years ago. Now he lets a mechanic do some stuff for him. I learned to be in tune with machines from my Mom, and learned to fix them from my Dad. When designing file system software back in the '70s, the rhythmic sounds of the disk access mechanism were my best feedback on its efficiency. Those were the days of 14" disk platters.
1. Adobe products and antivirus cause the most software problems, but you cannot live without either.
2. Most computer hardware problems are the result of sticky rolls, janitors cleaning, computers being accidentally kicked, or power failures. In that order.
3. When calling HP or Dell about anything other than servers, you will get bad tech support.
4. Three year warranties on individual PCs do not matter. On a system with dozens of computers, they pay for themselves.
5. There will always be a lower price. Get over it.
6. Phones cannot fail. Five nines of reliability is not good enough.
7. Documented organization of the network and supplies will save you more time than the knowledge a thousand certifications brings (which isn't that much anyways).
8. Researching and backing up information before beginning a project is the sign of a professional. So is spelling.
9. Soft operating expenses are always more expensive than hard operating expenses.
10. When working on a project, document everything. It is almost never needed, but if your coworkers know you have it, they will not try to screw you.
Hoist Number One and Number Six.
Cynical but true.
Do not befriend the users. Do not tell them what is actually going wrong. Never accept blame. Do not rush to complete requests.
Here are the reasons why:
If you befriend them, they will cease to be able to do the simplest thing without your help. This is fine if they're hot, but not if they're not.
If you tell them what is actually wrong, they will get it more wrong when they report it up the line, and you will be blamed for something. Instead tell the users something hugely general that will fit into that comfortable place in their minds.
If you accept blame, users will view this as a sign of weakness, and assign blame the next time, without waiting for you to volunteer.
If you rush to complete non-critical, non-IT projects, users will use this as a performance benchmark, and you'll be forced to complete all of their projects first to avoid the appearance of slacking off. In the course of this you will have to ignore critical maintenance that can get you in real trouble later.
ad logicam Claiming a proposition is false because it was presented as the conclusion of a fallacious argument.
1) Document everything. I've had coworkers who thought being asked to document their processes and procedures meant they would soon be canned. If you document your processes, you can pass them off to other team members when you tire of them, so you can move on to bigger and better problems.
2) Talk out loud when working with users. It was a habit I got into while doing field service. Explain what you are doing while you are doing it and a) the users may learn something, and b) it lets them know you're not related to Nick Burns (SNL). By taking the time to explain things (knowing most users won't understand any of it to begin with), the users will know that you are interested (okay, some may feign interest) in their problems and the resolution. After doing this for years, I have seen many technophobic users start to come around to where they will actually try to fix a problem themselves before calling the help desk.
3) Problems always happen on Fridays just before quitting time.
I use irony whenever I can, but my shirts are still wrinkled...
I went through three power supplies before I discovered the fact that I actually had a power cable that was going bad.
I used to work for a company that developed a very highly customized package for our customers, put it on the *NIX of their choice, and installed it in their data centers. Although based in the US, one customer, whose site I was working on, was in Basingstoke, England.
The client was (and probably still is) a hard-core Big Blue shop, so the *NIX of choice was AIX, running on a two-piece RS6K machine. One piece was the server itself, and the other piece was an 8-disc SSA drive tower.
The drive tower had three power supplies, allegedly for redundancy, but these, in turn, were connected together via a three-way IEC Y cable. This then plugged into a normal IEC cable that then had the monster 13A plug they use in the UK on the other end. (If you haven't seen one of these, they're huge. If we used these in the US, we'd probably rate them for 50A).
The plug had a fuse in it.
I'll say that again, because this is important, but not something that you typically see outside the UK: The plug had a fuse in it.
After we hardware guys left the customer site, and left it in the capable hands of our software guys, we got a frantic call from the software guys that the discs had "just disappeared from the system".
To make a long story short (if it's not too late for that), the fuse in the plug had blown, thus killing power to all three power supplies, in turn killing power to the discs. Once we figured that out, we had our software guys get the customer's IT guy on the phone, he got out two more IEC to 13A cords and a fuse, and the problem was fixed in ten minutes plus reboot time. The Y cable was relegated to the scrap heap.
www.wavefront-av.com
I'm very good at what I do, not even 5% of my peers are as good as I am (admittedly I work on the helpdesk so the bar isn't necessarily too high in some cases). I know my stuff in a lot of detail (I'm a geek) and am usually the most intelligent person in any room I'm in. These are plain simple facts and even my employer wouldn't deny them, I am however (despite the seeming arrogance of the preceding statements) willing to learn and depressingly aware that I don't know everything (I generally find the more I learn the more I realise I don't know). I treat users as human beings and enjoy the problem solving parts of my job. Ok, so repairing an oversized .pst for the nth time is less than fun but I usually get all the difficult stuff no-one else knows what to do with. Fortunately my employer recognises this and my pay slip is suitably well padded. Getting someone with my level of knowledge who actually enjoys helpdesk work is worth the extra shekels to them, it means the systems and comms teams can get on with taking things forward while I make sure the current setup keeps ticking over.
Most users are perfectly capable of firing up a command line and following instructions if they're given clearly and unambiguously. Obviously you want to keep it simple (ipconfig, set etc) but it's the quickest way to get their IP address (assuming you don't have central login histories built in to your call logging software or it's not working).
This one makes me shudder. Repairing the damage done by those who went before me and rebuilding the permission structures ("user in the global, global in the local", it's not rocket science for crying out loud!) once the directory structure is sane (and incidentally only allowing list access to the root file share) has eaten up more of my time than I want to even think about.
And don't forget that accurate backup reporting is just as critical. Finding out the backup has failed the last 2 weeks and the software didn't report it is not something you ever want to go through (fortunately we also do manual checks). This is a sore point with me, one of those head->wall things I don't want to talk about.
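The manual check mentioned above can be partly scripted so it does not rely on the backup software's own reporting. A rough sketch, assuming backups land as files in a single directory and that a POSIX shell and `find` are available; the function name `backup_is_fresh` and the one-day default are assumptions for illustration only:

```shell
# Hypothetical check: succeed only if DIR contains at least one regular file
# modified less than DAYS*24 hours ago (DAYS defaults to 1).
backup_is_fresh() {
    dir=$1
    days=${2:-1}
    if [ ! -d "$dir" ]; then
        echo "backup_is_fresh: no such directory: $dir" >&2
        return 2
    fi
    # find lists files newer than the cutoff; an empty result means stale.
    recent=$(find "$dir" -type f -mtime "-$days")
    [ -n "$recent" ]
}
```

Run from cron against wherever your backups actually go, a non-zero exit is the cue to go and look. It only proves that something was written recently, not that the backup can be restored, so it complements a test restore rather than replacing one.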
This is the core of my job. I have to balance network integrity and security with user needs, frequently the "obvious" (to the user) solution is not acceptable in some way or other (wireless for example is an absolute no go area on our network) so I have to work out one that is. I'm here to enable users to achieve their tasks and goals, not to get in the way.
See above, it just doesn't happen on anything connected to the core network.
"Linux may be powerful, but the command prompt and configuration files and filesystem obscurity will just as soon get you a pink slip if something goes wrong and no one knows how to fix it but yourself."
Contrast that with "#9 - Know Your Needs":
"This one could also be called 'Learn Linux.'...When you want a spam solution, before looking at $5,000 servers and huge licensing fees for Windows Server software take a look at one of those old 'junk' PCs you have in the closet, download your favorite distro of Linux, and install procmail and spamassassin. You (and your budget) will thank me later."
OK... so which is it?
"Anyone that has ever gotten an idea based on any of my work and done something better with it-good for you."--J.Carmack
The first thing I do in every single problem is 'attempt to replicate it'. (You know that joke about the computer scientist and the brake failure? So true.)
I will admit that often times it's pointless, you technically should probably recheck your work and then try again, but it always amazes me when someone has a problem and then goes and involves someone else before trying it a few more times.
The next step is 'change a few minor things and try again'. Again, it always trips me up when 'The printer doesn't work' and no one's tried to reseat the cable, or turn it off and back on. I do that shit automatically.
The problem isn't people who think like this, it is school systems and offices where no one understands technology, and thus grants technology some sort of mystical 'Don't ever do anything unless you know exactly what you're doing' field.
These people get exposed to this attitude for a decade and they are scared to death to push any button they do not understand, even if it's obviously the right one. You've basically turned their problem solving ability off WRT those things.
You sit them down in their car, and if it fails to start, they try again, and are able to tell you if it's a dead battery or no fuel. You hand them their cellphone on the wrong screen and they're sunk.
Most people on here have not been exposed to that field, or ignored it when exposed. And thus we can do trivial things, without even realizing it, that solve these problems. Don't congratulate yourself too much, however, because a man from the 1500s could do basically the same thing once he understood the concept, just like I can figure out basic problems with a water pump... our problem solving ability is turned on.
As for why this field exists? The basic principle that people do not know how incompetent they are. Somewhere, at every institution, there really is someone who should not, under any circumstances, touch any computer in any way, because they will probably cause a nuclear meltdown. (I don't understand it! There wasn't any nuclear material in the truck!) At some point, they did touch one, and from them, everyone has learned to never touch a computer.
And this is why it is okay to kill incompetent people.
Also it's why you should never start drinking in the middle of a post.
If corporations are people, aren't stockholders guilty of slavery?
We had a Dell tech come out to replace a RAID card, and decide to replace the whole server motherboard - with a different model. He also helpfully rearranged all the PCI cards, just to ensure that Win2k Server wouldn't know where to find anything. He then powered the system up, couldn't get the server working, and - on the client's advice - called me. For some reason, he asked me what an IP address was (my first thought was he wanted to know the IP address for the server, but giving him that just made him ask again). I dashed to the client's site, and found Win2k Server not talking to the network because it could no longer find any working network cards (the one that the OS still recognized didn't have a cable in it), and the server bluescreening every few minutes. Amazingly enough (I wish I knew what the client said to get them to agree!), Dell actually agreed to pay 50% of my fees to get the server working again!
Lead developer, http://wisptools.net