Ask Slashdot: Getting a Grip On an Inherited IT Mess?
First time accepted submitter bushx writes "A little over a month ago, I assumed the position of programmer and sole IT personnel at a thriving e-commerce company. All the documentation I have is of my own creation, as I've spent most of my time reverse-engineering the systems in place just so I can understand how everything works together. Since I've started, I've done everything from network and phone upgrades to database maintenance with Perl, and thus far it's been immensely rewarding. But as I dig deeper, I notice the alarming number of band-aids applied by my predecessor, and it seems like the entire company's infrastructure is just a few problems away from a total meltdown. The big question now is, how do I, as a single person, effectively audit the network, servers, databases, backups, and formulate a long-term plan that can be implemented by one person? Is it possible? Where do I begin?"
Say goodbye to your life for the next year. Hope you're getting paid to lose it...
Brief your management on the situation. Explain what condition things are in and what is needed to get them into a manageable state. Give them a list of projects / tasks that you have to deal with and get them to prioritize.
blindly antisocialist = antisocial
You need to document it and get management to approve spending money.
I'll bet you $100.00 the band-aids are there because management refuses to spend money on infrastructure. That's why it is a mess, and why the guy who was there before left.
99% of the time a hosed IT infrastructure is because management refused to spend any money, so it had to be half-assed.
Do not look at laser with remaining good eye.
Facts:
1. The job has lasted for 1 month so far.
2. The e-commerce company is apparently 'thriving'.
3. All of the systems have been "reverse engineered" in that 1 month.
4. All of the documents are written in that 1 month.
5. In 1 month there have been network and phone upgrades and database maintenance with Perl, and it all has been 'immensely rewarding'.
6. The entire infrastructure is 'a few problems away from a total meltdown'.
7. Single person IT operation to do everything.
Question: is this for real? What's the size of the company and what's the budget?
You can't handle the truth.
No!
This is actually the kind of career-building stuff one should leap at. What would you rather say in an interview for your next job:
- I took this system that was falling apart and made it run like clockwork.. downtime and issue frequency went from "it's down again" to "been up all year"
- Yeah it was pretty good when I got there, and I maintained the status quo
My thoughts on original question:
First step is comprehension. You can’t fix what you don’t know you have/need. Identify the key components of your system. Then for each key component, break it down into its parts and dependencies. Then break each one of those out, and so on, until you have a pretty damn good idea of what you have.
Next part is assessment. For each component you’ve identified, what is its current state?
And then it’s time to do triage. Prioritize stuff by largest potential impact.
And finally carry out your well thought out pla.. ok, can't say that one with a straight face. Basically try to fix stuff when you can, between putting out the daily fires.
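To make the comprehension / assessment / triage steps concrete, here is a minimal sketch of an inventory that gets sorted by potential impact. The component names, the 1-5 `impact` and `risk` scales, and the scoring formula are all made up for illustration; plug in whatever you actually find when you map your own systems.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    name: str
    depends_on: list[str] = field(default_factory=list)
    impact: int = 1  # assumed scale: 1 (minor) .. 5 (business stops) if it breaks
    risk: int = 1    # assumed scale: 1 (healthy) .. 5 (held together by band-aids)


def dependents(inventory, name):
    """Return every component that directly or indirectly depends on `name`."""
    found = set()
    stack = [name]
    while stack:
        current = stack.pop()
        for comp in inventory.values():
            if current in comp.depends_on and comp.name not in found:
                found.add(comp.name)
                stack.append(comp.name)
    return found


def triage(inventory):
    """Sort components by potential impact, highest first."""
    def score(comp):
        # A failure also takes down everything that depends on the component.
        blast = comp.impact + sum(
            inventory[d].impact for d in dependents(inventory, comp.name)
        )
        return blast * comp.risk

    return sorted(inventory.values(), key=score, reverse=True)


# Hypothetical inventory; replace with what you actually have.
inventory = {c.name: c for c in [
    Component("storefront", depends_on=["database", "network"], impact=5, risk=2),
    Component("database", depends_on=["network"], impact=4, risk=4),
    Component("backups", depends_on=["database"], impact=3, risk=5),
    Component("network", impact=2, risk=2),
    Component("phones", depends_on=["network"], impact=2, risk=1),
]}

for comp in triage(inventory):
    print(comp.name)
```

The point of counting dependents is that a boring component (the network, the database) can outrank the flashy one (the storefront) once you add up everything that falls over with it. A spreadsheet works just as well; what matters is that the list exists and is ordered.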
I worked in this environment for one year so as not to tarnish my resume. I toughed out the last 4 months absolutely burned out and bitter. You cannot communicate to management that outages and issues aren't your fault; they're adopted. When you fix things, you'll inevitably miss something (I did, because of a pace I didn't set). Get out. It's not worth the fight for proper budgeting and the right tools; if it were, the organization as a whole wouldn't have let things get how they are in the first place. The business model I came from is failing. If you're good, there are better paying, more rewarding, less "heart and soul" companies out there. You're basically doing startup work for at-will employment pay.
I'd amend that to a big "maybe" for sticking around.
Everything you said (and the initial reaction to quit in the GP) hinges on the root cause of the mess. If it's a result of the predecessor not doing things correctly and flying by the seat of his pants, you're right to jump at the opportunity. However, if it's caused by management screwing IT every chance they get with poor timelines, lack of funding, no foresight, and so on, run like hell.
It is probably a combination of the two, because MGMT always assumes IT can do something with very little, and often the Impossible with Nothing.
We are skilled (most of us anyway) problem solvers, and they rely upon that to function. I hate to say it, but the original question should be answered this way: HIRE outside consultants to evaluate your system(s) and give you a hard-copy report on their findings that you can present to MGMT.
If the situation is as I believe, it is worse than he even suspects. To get ahead of the curve, he needs more help than one person can supply.
Agent K: A *person* is smart. People are dumb, stupid, panicky animals, and you know it.