Top 10 System Administrator Truths
Vo0k writes "What are your top ten system administrator truths? We all know them already, but it's still fun re-telling them. Stuff like "90% of all hardware-related problems come from loose connectors", even though you already know it's true, may save you from replacing the "faulty" motherboard if you recall it at the right time."
PEBKAC
--Keeping the flame wars alive, one post at a time
from bending them around and whatnot, they develop breaks that can get pushed back together. This is what causes the problem to be intermittent. The cable 'is' bad, not going bad. People need to be more careful in wrapping their cords up. There should be a little bit of slack in the loops or else the slightest bit of pressure will cause them to develop a break.
Rule 1. They lie. End users often tell you what they think you want to hear. When asking a question, use open wording such as "What does it say?" rather than "Does it say this?"
Rule 2. They don't know they are lying.
Rule 3. Sometimes they are telling the truth. Yes, sometimes what you think is impossible really is happening, or looks like it is happening.
See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
Never put the screws back in the case until you've tested that your new hardware is working.
It's a Unix system - I know this.
HP's JetDirect cards have a pretty solid reputation of failing every few years
Is this really the case? We had several JetDirect enabled PCs at my former place of work and almost none of them had a card failure. We even had a few extra cards just in case. Several of the printers were actually quite old. The biggest problem we had was only with the HP-5P (I think that is the number). Some users' departments did not have the money to replace those crappy old printers. On a bit of an aside, we had several JetDirect "boxes" (the external box that connects the printer port to Ethernet) that were working great. I believe most everyone on the IT staff had one at home for their printers.
No One Ever Got Fired For Buying Microsoft.
Not really true. There are some shops so enamored with Novell (mostly because of bosses stuck in the Stone Age) that the idea of purchasing Exchange, or using a full-out Active Directory system with a Windows-only network storage share, was unheard of. I once again reference my previous job.
Not too bad of a list overall. Most of the items are right, and it is quite true. To be honest, at the places I have worked there were really only a handful of problem employees, and most of them got handled directly by our SysAdmin or the head of IT because no one wanted to worry about what lie they might come up with about the work we were doing.
"Some days you just can't get rid of a bomb."
And the corollary: never make an irreversible change unless all of the reversible changes have been tried and ruled out.
I know this was said as a joke, but I see this a lot amongst the geek community, the attitude that users just don't know what they are doing, and that is why they can't make anything work.
Doing some GUI consultant work and writing a few user manuals for some pretty complex software has taught me one thing: most user error is the fault of crappy software. A good setup (hardware or software) should be easy to use for its intended users.
Now, obviously it is all about knowing the audience. If you are writing an application for use by other software engineers versus people living in an assisted living home, well, that makes a difference, and you certainly can't cater to all people (for example, the guy who writes code for a living but can't set up his own email at home).
The bottom line is, as much as it displeases us, not everyone is a geek. Not everyone cares about the latest firmware for their router, the latest patch for Call of Duty 2, or how to make a projection TV from an old overhead projector and a laptop from eBay. Our job, as geeks, is not to show everyone why they SHOULD care, but rather to make it easy for those who don't care to still do what they need to do.
Just a few minutes ago I got an email forwarded to me from a "stupid" user who couldn't figure out how to perform what to me seems like a simple task in some software my team wrote. We emailed him the directions, even though they were very clearly stated in the manual that I wrote, but I took it one step further. I submitted a feature request in our bug-tracking database to put a message near the option he was trying to use, explaining why that option is grayed out.
Anyone can write software or set up hardware that has tons of geek features that we all like, but it takes a lot more effort to make the setup actually usable to the target users.
One of my big truths: set standards!
I've worked in two kinds of places, ones where they set (and stick to) standards and ones that don't. Every place that doesn't use or doesn't stick to standards has always been an experience in wasted time, confusion, and lots of bugs. Those that do can seem like you're always being nagged but in the end you find things work as expected, code is far easier to manage (especially when it is someone else's), and you aren't always having to reinvent the wheel (i.e. figuring out how to fix a subtle bug again because the solution was never written down the first time).
It sounds simple, but it takes discipline at all levels. Even something as simple as documenting what you did afterwards and putting it in an orderly file system can make a huge difference, but how many people bother to do it? Managers and fellow developers have to crack the whip and keep people from trying to cut corners.
Standards should be open to some change and can be bent, but there has to be a very good, defensible reason for it.
It is by the juice of the coffee bean that thoughts acquire speed, the teeth acquire stains. The stains become a warning
There is more that goes to that. Do not be afraid to tell upper management to get the hell out of the server room.
We had a problem: SQL was performing poorly. A typical query on the machine that used to take 50 minutes was taking 2.5 hours and was sometimes failing. We instantly started looking at the data and possible database corruption. The VP of Operations came down and started "directing us"; we politely ignored him and continued down our path. He then ordered us to rip the heart out of the SQL server: remove 4 processors, remove 8 gig of RAM, and downgrade from Enterprise to Standard with only 2 processors. Over and over he kept telling us to do things that were insane, because he used to be an Ops manager in the company and "knew what he was talking about".
Four days and about 80 hours of wasted overtime later, we carefully rebuilt the server BACK to the last known good state from a backup taken before the mess, and then discovered that, oh! There was a DATABASE DATA PROBLEM!
If someone starts off on a wild chase, changing things wildly, I do not care who they are: tell them to piss off and please stand behind the glass. Or better yet, do that nicely by getting everyone, including the vendor, to agree that what they want to do is not the right thing... Ganging up on them typically works.
So the parent is 1000% correct. Not only is the solution typically simpler than you think, but it is usually the one that makes the most sense.
If your SQL server suddenly starts acting up after 2 years of good operation, there is almost no chance that ripping its guts out will help anything.
Do not look at laser with remaining good eye.
Treat users with respect even if they are clearly in the wrong. Don't patronise somebody if they haven't got the first idea about computers: educate, don't insult. I'm not a Buddhist, but the old karma idea of "what goes around, comes around" seems to play out in the long term. Being patient with somebody who's royally screwed up their computer pays off in six months' time when you need them to put your expenses claim through accounts at 5pm on a Friday evening, or to notice you standing in the rain by your broken-down car, and so on.
Good Project Managers hear "5 days" from the developer, assume delivery in 4 days and promise it to the customers in 3 days.
No, that's a bad project manager... or possibly a bad salesperson.
Good project managers are the other way around: If they hear "5 days" from the developer, they promise it to the customer in 6. This allows a little time for QA testing if the developer gets it done within his 5 days... and allows for a small buffer if the developer doesn't get it done on time.
Have you read the Moderation Guidelines Addendum?
Even if you've been doing this for 20 years, if you are working with another technician, have the grace to treat them like an intelligent human being.
bun-fhuinneog agam!
The Russians have won. They have made the world a cesspool of distrust, greed, fear and hate.
I have a friend living the GeekSquad life. I'd never hire him as he believes in their process to fix lockups:
1. It must be this unsupported software: remove Firefox or any F/OSS.
2. It is a virus, your AV is no good, purchase Norton CoverYourAss v9.6 for $49.95.
3. The AV doesn't perform a deep clean by itself, we can run one for $24.95.
4. You need a bigger hard drive, we recommend Norton Ghost to copy it. $199.95 + $49.95.
5. We should install the drive. $24.95 + $8.95 wrist strap.
6. We should run Ghost for you, $19.95.
7. You need USB 2.0 ports for your mouse to run faster, $49.95 plus $24.95 installation.
8. Your hard drive cables are old belt style, you need the snappy round cables, $29.95 plus $9.95 installation.
9. Your video board is old, the ATI MegaWow XL is only $199.95.
10. You should probably buy one of our Compaq BusinessPro by HP combinations, you burned your TCP/IP converter with static.
I pop open the discarded PC, replace the processor fan and blow out the case. All is fine - $30.
"Rebooting Causes 90% of Unix problems."
Well that is usually a half truth. Usually when you reboot a Unix system you do it for the following reasons.
1. You screwed up and have no alternative interface to get in.
2. Your system has been on so long that you want to reboot it to see what went down without it telling you.
3. You need to add hardware and it isn't hot-swappable.
4. The time it would take to fix it without rebooting outweighs the disadvantage of downtime.
5. You lost power for an extended period of time.
6. Management tells you so.
7. Upgrading the OS to a level where all services need to be restarted.
8. There are many unknown processes and you want to be sure you are not stopping an important job.
9. Other...
But the problems normally come because the drives have been spinning for years. Having them stop and then start again puts strain on them and causes them to die. Or, if the system has enough memory, the drive may have died years ago but all the data is still paged in memory.
If something is so important that you feel the need to post it on the internet... It probably isn't that important.
I did not say this as a joke; I was surprised it got modded so high. I work at a small service and repair shop, and you'd be surprised how many computers come back within a week or two after leaving the shop because the client did not listen to my suggestions and recommendations. I always tell them we'd be happier to fix an issue that is caused because you followed our instructions than one caused because you didn't. Still, they go on installing file sharing software I did not recommend, ignoring their Windows updates, and clicking "yes" or "no" on those bogus system-error messages instead of the red x. And beyond that, we extend the invitation to any client to call us, free of charge, if they're not sure what to do. We're not bastards in here like people at a lot of computer shops, and we're willing to help, for free, if it's not time consuming and we can do it over the phone... but they hardly ever call while they're unsure, only after they've broken something. I understand that they're not as savvy as us geeks; however, there are a few simple steps that they should follow based on our recommendations. The mechanic tells me to get my oil changed every 2,000 to 3,000 miles, so I listen. The guy at the salt water aquarium store tells me putting an anemone in a tank is a bad idea, because when it dies (which it will in your little tank) it's going to kill all of your fish, unless you're really lucky... so I avoid the anemones. I'm not an expert, so I listen to those who are more knowledgeable. Anyway, I've talked too much.
australian project gutenberg is better than the original.
God, yes.
"Nothing happens when I check my email."
"Do you get an error message when you try it?"
"There was some dialog on the screen, yeah."
"Grr. What did it say?"
"Oh, I didn't read it"
Aaaarrgggh.
On a 24x7x365 job, I learned the value of walking through the user's work area every weekday morning, first thing.
They started waiting for me to stroll in instead of paging me at night, just to be nice to me.
But the best part was, they thought of me as the guy who keeps the system running, because most of the time that I showed up, the system was running.
My colleagues who only showed up when their systems broke had the reputation "Here comes trouble!"
Taking credit for things going well is essential!
"Learning is not compulsory... neither is survival."
--Dr. W. Edwards Deming
10) Patch Current. Then ask for the unreleased patches. Then ask for development involvement.
9) Patching only works 30% of the time
8) Metalink is like a massive "Magic 8 Ball" that pulls responses from the database. Treat it as such.
7) TARs are the same as 8, except you have a customer service rep reading the 8 Ball.
6) If it generates core files it's the DBA's problem.
5) It's ALWAYS the DBA's fault.
4) RMAN is your friend.
3) You know more about Apache than Oracle does.
2) Oracle won't admit this.
1) AutoConfig doesn't.
What if it is just turtles all the way down?
The OSI model works in almost all aspects of computing and not just strictly networking.
Application > Presentation > Session > Transport > Network > Data Link > Physical. This order is actually from layer 7 to 1.
If you had followed the OSI model, you would've found out that the *first* thing to do would be to check the physical connection (aka power cord) and found your problem right away.
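To make the bottom-up idea concrete, here is a minimal sketch in Python. The function name `first_failing_layer` and the example checks are made up for illustration; real checks would be whatever you actually do at each layer (look at the power cord, look at the link light, ping the gateway, and so on).

```python
# OSI layers from layer 1 (Physical) up to layer 7 (Application).
OSI_LAYERS = [
    "Physical", "Data Link", "Network", "Transport",
    "Session", "Presentation", "Application",
]

def first_failing_layer(checks):
    """Walk the layers bottom-up and return the first one whose check fails.

    `checks` maps a layer name to a zero-argument function that returns True
    when that layer looks healthy. Layers without a check are skipped.
    Returns None if every supplied check passes.
    """
    for layer in OSI_LAYERS:
        check = checks.get(layer)
        if check is not None and not check():
            return layer
    return None

# Hypothetical checks: the power cord is unplugged, so the Application
# failure above it is only a symptom.
example_checks = {
    "Physical": lambda: False,     # "is it plugged in and powered on?"
    "Network": lambda: True,       # "does it have an IP address?"
    "Application": lambda: False,  # "does the mail client connect?"
}

print(first_failing_layer(example_checks))  # prints "Physical"
```

The point of walking the list in this order is that a failure at a lower layer explains every failure above it, so you stop at the first one you find.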
...is the result of trying to implement 100% of user requests. Sometimes, telling the user "no, you simply can't have that" is the best way to ensure an application isn't horribly poisoned by thousands of totally irrational, non-intuitive crap "features" each piece of which makes sense only to the person who requested it. Worse, such design-by-committee applications are invariably written interface-first, back-end last with no regard to how to actually make the goddamned thing WORK, much less work efficiently.
I agree, good software should be intuitive, but it is far better for it to be proactively engineered to be more intuitive than reactively veneered to feel less unintuitive.
-- I speak only for myself
"Rebooting Solves 90% of Windows problems"
Nope. Rebooting only clears 90% of symptoms, it doesn't necessarily make the problems go away. For example, if you have a webserver that's got a memory leak and that leak takes 72 hours to fill RAM to the point that the system becomes unusable, rebooting clears the symptom (unusable system) but doesn't resolve the problem (bug in the webserver). Too many people think that the reboot fixes the problem, so they don't ever bother finding out what the real problem is.
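If you want evidence that a reboot is only resetting the clock, a rough sketch like the one below can help. It assumes you have logged memory use a few times since the last reboot; the function name `hours_until_full` and the sample numbers are invented for illustration, and the straight-line estimate is a simplification, since real leaks are not always linear.

```python
def hours_until_full(samples, total_mb):
    """Estimate how many hours remain until memory use reaches total_mb.

    `samples` is a time-ordered list of (hours_since_reboot, used_mb) pairs.
    The growth rate is taken as a straight line between the first and last
    sample. Returns None if memory use is not growing.
    """
    (t0, m0), (t1, m1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("need at least two samples taken at different times")
    rate_mb_per_hour = (m1 - m0) / (t1 - t0)
    if rate_mb_per_hour <= 0:
        return None
    return (total_mb - m1) / rate_mb_per_hour

# Invented numbers: a box with 4000 MB that grows by 1000 MB a day.
samples = [(0, 1000), (24, 2000), (48, 3000)]
print(round(hours_until_full(samples, 4000), 1))  # prints 24.0
```

With these made-up numbers the box is 48 hours in and has about 24 hours left, which matches the 72-hour pattern described above. If the estimate comes out the same after every reboot, the reboot cleared the symptom and the leak is still there.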
I can't stress enough how valuable one of these, or some other good LiveCD, can be. If the box is Windows, Linux, whatever, keep one handy. One of these things can be priceless if the thing refuses to boot properly, someone deleted NTLDR, X locks up on runlevel 5, your driver interrupts conflict, a recursive script uses all of the PIDs, or any number of problems. Keep a printout of the boot options for the disk, too, to boot the unbootable.
Never ask dumb questions like that. It embarrasses the user for no good reason. Find a subtle way of getting them to check the power without forcing them to reveal their mistake. Such as:
They'll still learn the lesson - check the power before calling tech support - but now they won't feel so uncomfortable that you were mocking them with your questions.