Ask Slashdot: Getting a Grip On an Inherited IT Mess?
First time accepted submitter bushx writes "A little over a month ago, I assumed the position of programmer and sole IT personnel at a thriving e-commerce company. All the documentation I have is of my own creation, as I've spent most of my time reverse-engineering the systems in place just so I can understand how everything works together. Since I've started, I've done everything from network and phone upgrades to database maintenance with Perl, and thus far it's been immensely rewarding. But as I dig deeper, I notice the alarming number of band-aids applied by my predecessor, and it seems like the entire company's infrastructure is just a few problems away from a total meltdown. The big question now is, how do I, as a single person, effectively audit the network, servers, databases, backups, and formulate a long-term plan that can be implemented by one person? Is it possible? Where do I begin?"
Say goodbye to your life for the next year. Hope you're getting paid to mislay it...
Brief your management on the situation. Explain what condition things are in and what is needed to get them into a manageable state. Give them a list of projects / tasks that you have to deal with and get them to prioritize.
blindly antisocialist = antisocial
You need to document it and get management to approve spending money.
I'll bet you $100.00 the band-aids are there because management refuses to spend money on infrastructure, and that's why it is a mess and why the guy who was there before has left.
99% of the time, an IT infrastructure is hosed because management refused to spend any money, so everything had to be half-assed.
Do not look at laser with remaining good eye.
Facts:
1. The job has lasted for 1 month so far.
2. The e-commerce company is 'thriving', apparently.
3. All of the systems have been "reverse engineered" in that 1 month.
4. All of the documents are written in that 1 month.
5. In 1 month there have been: network and phone upgrades and database maintenance with Perl, and it all has been 'immensely rewarding'.
6. The entire infrastructure is 'a few problems away from a total meltdown'.
7. Single person IT operation to do everything.
Question: is this for real? What's the size of the company and what's the budget?
You can't handle the truth.
No!
This is actually the kind of career building stuff one should leap at. What would you rather say in an interview for your next job: ..
- I took this system that was falling apart and made it run like clockwork.. downtime and issue frequency went from "it's down again" to "been up all year"
- Yeah it was pretty good when I got there, and I maintained the status quo
My thoughts on original question:
First step is comprehension. You can’t fix what you don’t know you have/need. Identify the key components of your system. Then, for each key component, break it down into its parts and dependencies. Then break each one of those out, and so on, until you have a pretty damn good idea of what you have.
Next part is assessment. For each component you’ve identified, what is its current state?
And then it’s time to do triage. Prioritize stuff by largest potential impact.
And finally carry out your well thought out pla.. ok, can't say that one with a straight face. Basically try to fix stuff when you can, between putting out the daily fires.
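The comprehension / assessment / triage steps above can be sketched as a small inventory script. This is only an illustration under stated assumptions: the component names, the 1-5 `health` and `impact` scores, and the scoring rule (own health multiplied by the worst impact among the component and everything that depends on it) are all made up for the example, not something from the original post.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """One item in the inventory. All scores are subjective estimates."""
    name: str
    depends_on: list[str] = field(default_factory=list)
    health: int = 3  # 1 = solid, 5 = about to fall over
    impact: int = 3  # 1 = nobody notices an outage, 5 = orders stop


def blast_radius(name, components):
    """Return the names of everything that directly or indirectly depends on `name`."""
    affected = set()
    stack = [name]
    while stack:
        current = stack.pop()
        for comp in components.values():
            if current in comp.depends_on and comp.name not in affected:
                affected.add(comp.name)
                stack.append(comp.name)
    return affected


def triage(components):
    """Sort components so the largest potential impact comes first."""
    def score(comp):
        downstream = blast_radius(comp.name, components)
        worst_impact = max([comp.impact] + [components[d].impact for d in downstream])
        return comp.health * worst_impact
    return sorted(components.values(), key=score, reverse=True)


# Hypothetical inventory; replace with whatever the reverse-engineering turned up.
inventory = {c.name: c for c in [
    Component("database", health=4, impact=4),
    Component("web_store", depends_on=["database", "network"], health=2, impact=5),
    Component("network", health=3, impact=3),
    Component("phones", depends_on=["network"], health=1, impact=3),
    Component("backups", depends_on=["database"], health=5, impact=5),
]}

for comp in triage(inventory):
    print(comp.name, sorted(blast_radius(comp.name, inventory)))
```

The point of the dependency walk is that a boring component (the network, say) inherits the importance of whatever sits on top of it, which is easy to miss when scoring each box on its own.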
"I assumed the position of programmer and sole IT personnel at a thriving e-commerce company."
Wait.... a thriving e-commerce company has one IT person? Am I missing something here...? No wonder everything was band-aided together. They have one person doing everything.
You may want to consider hiring an outside firm to come in and do the audit for you. The last thing you need right now, on top of your daily workload, is to perform an audit. That, and a third party firm creates a sense of objectivity, and would eliminate the "The IT guy wants a new toy" response from the CFO.
No man is an island, But if you take a bunch of dead guys and tie them together, they make a pretty good raft.
I worked in this environment for one year so as not to tarnish my resume. I toughed out the last 4 months absolutely burned out and bitter. You cannot communicate to management that outages and issues aren't your fault; they're inherited. When you fix things, you'll inevitably miss something (I did, because of the pace, which was not dictated by me). Get out. It's not worth the challenge of getting proper budgeting to put the right tools in place; otherwise the organization as a whole wouldn't have let things get this way in the first place. The business model I came from is failing. If you're good, there are better paying, more rewarding, less "heart and soul" companies out there. You're basically doing startup work for at-will employment pay.
I'd amend that to a big "maybe" for sticking around.
Everything you said (and the initial reaction to quit in the GP) hinges on the root cause of the mess. If it's a result of the predecessor not doing things correctly and flying by the seat of his pants, you're right to jump at the opportunity. However, if it's caused by management screwing IT every chance they get with poor timelines, lack of funding, no foresight, and so on, run like hell.
It is probably a combination of the two. Because MGMT always assumes IT can do something with very little, and often the Impossible with Nothing.
We are skilled (most of us anyway) problem solvers, and they rely upon that to function. I hate to say it, but the original question should be answered this way: HIRE outside consultants to evaluate your system(s) and give you a hard-copy report on their findings that you can present to MGMT.
If the situation is as I believe, it is worse than he even suspects. He needs more help than he can provide by himself to get ahead of the curve.
Agent K: A *person* is smart. People are dumb, stupid, panicky animals, and you know it.
This.
I'd be willing to bet a year's pay that the previous guy wasn't straight-up incompetent. He was probably relatively skilled, and doing the best he could with the resources at his disposal. Which were probably not actually the resources he needed.
Odds are good that there's a reason why the place is in the condition it is now.
Odds are good that there's a reason why the last guy isn't there anymore.
Odds are good that you're going to need more than one guy in IT to get it all straightened out.
"Work is the curse of the drinking classes." -Oscar Wilde
Oh god yes do this.
If your bosses will sign off on getting a second opinion, great, stick around and fix stuff. If they don't even want to know that it's screwed up, get out as soon as you can.
Just be very careful when selecting who you'll bring in to do the audit, and be very clear that if anyone is brought on to help fix the problems, it absolutely will not be the same as the evaluators. Otherwise you're essentially handing them a blank check to say whatever they feel like is wrong, and fix it any way they want.
My suggestion is to generally avoid letting contractors do more than consult with you on a project--they know very well how to set things up so that it's easy for them to work on in the future, and are generally not very good at making the stuff actually fit in well with your business processes.
<xml><I><am><so><damn>Web 2.0</damn></so></am></I></xml>
Completely agree. Perhaps the previous guy didn't take the time to inform the management of what was required to do the job properly, or didn't know himself, or was more interested in painting himself as indispensable than doing the right thing. First things first: if this is genuinely a thriving e-commerce company, then their website is their number one priority, their fulfilment systems are number two, and phones are number three, with everything else taking a back seat. And they REALLY need to get a second employee. If you are ill, on holiday, or, deity forbid, something happens to you, then they need someone else who can step in. If their infrastructure is as shot as you suspect, then you're going to need a second brain to sort it all out and help you implement it.
You must make sure that backups are being taken and are robust. You need a disaster recovery plan. You need both short term and long term plans to scale the infrastructure as the business grows and reactively if there's a sudden growth spurt. You need to know where the next bottleneck in the system is and come up with a plan to fix it. Do you have an adequate handle on monitoring traffic to the site from when they first land through to placing an order? Do management have the stats required to make informed decisions about the business? Management will also need to be aware of when IT will need extra funds as mapped against their own sales growth targets.
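As a starting point for the "backups are being taken" part, here is a minimal sketch of a freshness check that could run from cron. The directory path, the 26-hour age limit, and the 1 KB minimum size are assumptions for illustration only. Note that a fresh, non-empty file does not prove the backup is robust; only a periodic test restore does that.

```python
import time
from pathlib import Path


def check_backups(backup_dir, max_age_hours=26, min_size_bytes=1024, now=None):
    """Return a list of problems; an empty list means the basic checks passed."""
    now = time.time() if now is None else now
    directory = Path(backup_dir)
    if not directory.is_dir():
        return [f"backup directory {directory} does not exist"]

    files = [p for p in directory.iterdir() if p.is_file()]
    if not files:
        return [f"no backup files found in {directory}"]

    # Only the most recent file is inspected; older ones are ignored.
    newest = max(files, key=lambda p: p.stat().st_mtime)
    info = newest.stat()
    problems = []

    age_hours = (now - info.st_mtime) / 3600
    if age_hours > max_age_hours:
        problems.append(f"newest backup {newest.name} is {age_hours:.1f} hours old")
    if info.st_size < min_size_bytes:
        problems.append(f"newest backup {newest.name} is only {info.st_size} bytes")
    return problems


if __name__ == "__main__":
    # Hypothetical path; point this at wherever the dumps actually land.
    for problem in check_backups("/var/backups/db"):
        print("BACKUP PROBLEM:", problem)
```

Returning a list of problems rather than raising keeps it easy to feed the result into whatever alerting is already in place (email, pager, a line in the daily report to management).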
Once all of the above is sorted, and decent management allowing (and presuming this isn't something that is already being taken care of), you need to start suggesting to management the skillsets of people and / or contractors and / or agencies that need to be brought in to proactively grow the business. Whether it is SEO, PPC, UX, or new features, you have the opportunity to help the business understand it all and be instrumental in their success.