Slashdot Mirror


Ask Slashdot: Getting a Grip On an Inherited IT Mess?

First time accepted submitter bushx writes "A little over a month ago, I assumed the position of programmer and sole IT personnel at a thriving e-commerce company. All the documentation I have is of my own creation, as I've spent most of my time reverse-engineering the systems in place just so I can understand how everything works together. Since I've started, I've done everything from network and phone upgrades to database maintenance with Perl, and thus far it's been immensely rewarding. But as I dig deeper, I notice the alarming number of band-aids applied by my predecessor, and it seems like the entire company's infrastructure is just a few problems away from a total meltdown. The big question now is, how do I, as a single person, effectively audit the network, servers, databases, backups, and formulate a long-term plan that can be implemented by one person? Is it possible? Where do I begin?"

26 of 424 comments

  1. Explains a lot by p43751 · · Score: 5, Funny

    You work at RIM?

    1. Re:Explains a lot by Anonymous Coward · · Score: 5, Funny

      So you are asking him if he got a RIM job?

    2. Re:Explains a lot by youn · · Score: 5, Funny

      http://steve.jobs/ does not seem to be operational either :).... I will probably get marked as troll by apple fanboys... still funny :p

      --
      Never anthropomorphize computers, they do not like that :p
  2. methodically and late into the night by sentimental.bryan · · Score: 5, Insightful

    Say goodbye to your life for the next year. Hope you're getting paid to mislay it....

    1. Re:methodically and late into the night by DrgnDancer · · Score: 5, Insightful

      My guess is he's not... I'm immediately concerned by: "position of programmer and sole IT personnel" and: "thriving e-commerce company" together in the same sentence. The fact that this does not appear to be a small mom and pop with two or three servers making up the "e-presence" adds fuel to the fire. I'm getting the image of a fairly large company that relies heavily on its web and e-commerce presence. And has one guy to take care of that. What happens when he's on vacation or sick and a server dies? What happens when the website has an issue and then *anything* else goes wrong?

      There's no bullpen here; if anything, anything at all, breaks, there's only one guy to fix it. Day or night. If two things break you're already triaging. Surely a "thriving" company can afford a backup to what is pretty clearly a business critical unit?

      --
      I don't need a million points of light, just two points of multi-mode fiber and a 10 Gig-E router.
    2. Re:methodically and late into the night by Lumpy · · Score: 5, Insightful

      "if they were half-way intelligent, you'll find an external email address in some source somewhere. Ping them and ask what really happened."

      I have done this before.. the response was...

      "Find another job and run, run as fast as you can. Oh and trust no one."

      --
      Do not look at laser with remaining good eye.
    3. Re:methodically and late into the night by gmack · · Score: 5, Insightful

      That's assuming the predecessor wasn't the problem. I have learned over the years that there are far too many tech types who prefer to be the only one that does a particular task and will make any excuse to management to make sure things stay that way. When these lone wolf types happen to not be as competent as they pretend to be, they tend to dig themselves into too deep a hole, so they either get fired or quit in frustration, but when you talk to them it will always be some other person's fault.

      I'm not saying management isn't at fault; they very well could be, but don't assume that right off. The first step is to try and get a read on how good the predecessor was at their job, otherwise he can get very misleading info.

    4. Re:methodically and late into the night by xdroop · · Score: 5, Insightful

      What happens when he's on vacation or sick and a server dies? What happens when the website has an issue and then *anything* else goes wrong?

      Oh, that's easy:

      • He gets called in from being on vacation or sick;
      • he gets to work uncompensated time to fix the problem;
      • if he fails to either respond to the call OR fails to fix the problem, he gets fired;
      • if he succeeds in fixing the problem, he gets threatened with termination should something else fail while he's "unavailable".

      In fact, I'd lay odds that's how the vacancy occurred.

      --
      you should read everything on the internet as if it had "but I'm probably talking out of my ass" appended to it.
    5. Re:methodically and late into the night by Anonymous Coward · · Score: 5, Insightful

      Small operations like his are common. I'd guess he is a reasonably capable person. Where his world and yours differ is that he's a jack of all trades (master of none)... because that's what that kind of business requires.

      Yes, in a larger company, you'd hire an Exchange pro, an AD pro, a networking pro, a programmer or two and a couple techs that are slightly more generalized guys to manage backups, the server room and help desk. The unfortunate truth is that specialized individuals are rarely any good outside their specialty... which is unhelpful to a small business that can't afford a stable full of tech talent.

      I know, as I've been this guy. It's brutal work but can be pretty satisfying. Every day your work is different. But you're never an expert at one particular thing and you're never paid like someone who specialized early on.

  3. 1 suggestions by Anonymous Coward · · Score: 5, Funny

    start drinking

    1. Re:1 suggestions by Anonymous Coward · · Score: 5, Informative

      Heavily

  4. "A little over a month ago I assumed the position" by Anonymous Coward · · Score: 5, Funny

    Dude, that is too easy. There are serious wiseacres on this board.

  5. Escalate by sociocapitalist · · Score: 5, Insightful

    Brief your management on the situation. Explain what condition things are in and what is needed to get them into a manageable state. Give them a list of projects / tasks that you have to deal with and get them to prioritize.

    --
    blindly antisocialist = antisocial
  6. Get management buy in... by Lumpy · · Score: 5, Insightful

    You need to document it and get management to approve spending money.

    I'll bet you $100.00 the band-aids are there because management refuses to spend money on infrastructure, and that's why it is a mess and why the guy there beforehand has left.

    99% of the time a hosed IT infrastructure is because management refused to spend any money so it had to be half-assed.

    --
    Do not look at laser with remaining good eye.
  7. this is a majorly funny story by roman_mir · · Score: 5, Insightful

    Facts:

    1. The job has lasted for 1 month so far.
    2. The e-commerce company is 'thriving', apparently.
    3. All of the systems have been "reverse engineered" in that 1 month.
    4. All of the documents are written in that 1 month.
    5. In 1 month there have been: network and phone upgrades and database maintenance with Perl and it all has been 'immensely rewarding'.
    6. The entire infrastructure is 'a few problems away from a total meltdown'.
    7. Single person IT operation to do everything.

    Question: is this for real? What's the size of the company and what's the budget?

    1. Re:this is a majorly funny story by 1u3hr · · Score: 5, Insightful

      Question: is this for real?

      It's an "Ask Slashdot". They're as real as "Letters to Penthouse". Both carefully crafted to create a fantasy situation to excite readers. Read them if the subject is something you're interested in, but don't waste your time giving advice.

  8. Re:Quit by Anonymous Coward · · Score: 5, Insightful

    No!

    This is actually the kind of career building stuff one should leap at. What would you rather say in an interview for your next job:
    - I took this system that was falling apart and made it run like clockwork.. downtime and issue frequency went from "it's down again" to "been up all year" ..
    - Yeah it was pretty good when I got there, and I maintained the status quo

    My thoughts on original question:

    First step is comprehension. You can’t fix what you don’t know you have/need. Identify the key components of your system. Then for each key component, break it down into its parts and dependencies. Then break each one of those out, and so on, until you have a pretty damn good idea of what you have.

    Next part is assessment. For each component you’ve identified, what is its current state?

    And then it’s time to do triage. Prioritize stuff by largest potential impact.

    And finally carry out your well thought out pla.. ok, can't say that one with a straight face. Basically try to fix stuff when you can, between putting out the daily fires.
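
    The comprehension / assessment / triage steps above could be sketched roughly like this. It is only an illustration: the `Component` class, the state labels, the 1 to 5 impact scale, and the example systems are all assumptions made up for the sketch, not anything from the submitter's actual setup.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """One piece of the system, broken down into what it depends on."""
    name: str
    state: str = "unknown"   # assessment: "ok", "fragile", "broken" or "unknown"
    impact: int = 1          # hypothetical scale: 1 (minor) to 5 (business stops)
    depends_on: list = field(default_factory=list)


def walk(component, seen=None):
    """Yield a component and everything beneath it, each one only once."""
    if seen is None:
        seen = set()
    if component.name in seen:
        return
    seen.add(component.name)
    yield component
    for dep in component.depends_on:
        yield from walk(dep, seen)


def triage(root):
    """Everything not known to be healthy, largest potential impact first."""
    needs_work = [c for c in walk(root) if c.state != "ok"]
    return sorted(needs_work, key=lambda c: c.impact, reverse=True)


# Made-up example systems, purely for illustration.
database = Component("orders database", state="fragile", impact=5)
backups = Component("nightly backups", state="unknown", impact=4)
storefront = Component("web storefront", state="ok", impact=5, depends_on=[database])
shop = Component("e-commerce site", state="ok", impact=5, depends_on=[storefront, backups])

for item in triage(shop):
    print(item.impact, item.state, item.name)
```

    Treating "unknown" as needing work is a deliberate choice here: a component nobody has assessed yet is not one you can cross off.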

  9. It's the Eye of the Tiger! by anom · · Score: 5, Funny

    Just buy a few cases of your energy drink of choice and put Eye of the Tiger on repeat until you've got it all fixed.

    I believe in you.

  10. What, where, why... by ScottyLad · · Score: 5, Informative

    I've spent the best part of my career undertaking tasks like this (as an external consultant), with my average time on an assignment lasting somewhere between 18 months and 3 years.

    My aim on every project is to make myself obsolete - in that I try to get documentation up to a point where a suitably qualified individual could come in, read the documentation, and work the rest out for themselves.

    My primary objectives are to implement some form of inventory control to document the what / where / why...

    • What - What have you got (servers, software, services, contracts, operating systems, databases, users)
    • Where - Where is it - where are your servers, what machine is this software licence running on?
    • Why - What is the Business Justification for this service - what is the Business Impact if this database stopped running tomorrow?
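
    A minimal sketch of such a what / where / why inventory, kept as plain records that can be exported to a spreadsheet. The field names follow the three questions above; the `InventoryItem` class, the helper functions, and the example entries are hypothetical and only there to show the shape.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class InventoryItem:
    what: str    # server, software, service, contract, database, user, ...
    where: str   # physical location, or the machine it runs on
    why: str     # business justification; left empty while nobody knows


def undocumented(items):
    """Items whose business justification has not been written down yet."""
    return [item for item in items if not item.why.strip()]


def to_csv(items):
    """Render the inventory as CSV text that anyone can open in a spreadsheet."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(InventoryItem)])
    writer.writeheader()
    for item in items:
        writer.writerow(asdict(item))
    return buffer.getvalue()


# Made-up entries, purely for illustration.
inventory = [
    InventoryItem("orders database", "rack 2, server db01", "all customer orders live here"),
    InventoryItem("old reporting script", "server db01, cron", ""),
]

print(to_csv(inventory))
print("Still needs a 'why':", [item.what for item in undocumented(inventory)])
```

    Listing the entries with an empty "why" gives you a ready-made set of questions to take to the business side.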

    Once you've got to that stage, then you're ready to get in to the real technical details. Remember that you are pitching your documentation to your successor, or to some imaginary "suitably qualified individual", so documenting what a system does and why is a higher priority than commenting every line of code.

    It is possible to do with one person, depending on the size of the organisation, and it can be particularly rewarding to do on your own - in a small business you often find some of the users have a good understanding of some of the systems, or are keen to learn.

    You stated in your post that you've assumed the role of programmer and sole IT personnel - which means you need to learn to think like a manager as well as a techie (which is harder than most people imagine!). Once you learn to focus on the business priorities, you'll understand where to begin with the technical detail, and what level of documentation is required.

    --
    Philosopher (n) - a wise person who is calm and rational; someone who lives a life of reason with equanimity
  11. Start over... slowly by rabenja · · Score: 5, Insightful
    I was in much the same position 12 years ago at this company. I am now CIO with 7 people on my team with several business partners to help manage the infrastructure. My advice for what it is worth:
    • Take time every day to assess and analyze the bigger picture before allowing yourself to get drawn into the details.
    • Look at the entire system from a risk mitigation perspective. What areas are most likely to cause "meltdown"? Spend the most effort there.
    • What are incremental changes that can be made that improve the overall risk picture? Focus on the biggest bang for the buck.
    • Defer anything that works well enough for the time being.
    • Get help where necessary.
    • Avoid big bang solutions unless they can be contained and tested well, with the capability of rolling back.
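    One way to put rough numbers on "biggest bang for the buck" is to score each candidate change by risk reduced per day of effort. This is only a sketch: the formula (likelihood times impact divided by effort), the deferral threshold, and the example fixes are assumptions for illustration, and the inputs are guesses you would have to make yourself.

```python
from dataclasses import dataclass


@dataclass
class Fix:
    name: str
    likelihood: float    # guessed chance the problem bites this year, 0 to 1
    impact: int          # guessed cost if it does, 1 (minor) to 5 (meltdown)
    effort_days: float   # guessed effort to fix it

    @property
    def bang_for_buck(self):
        """Risk removed per day of work, under the assumptions above."""
        return self.likelihood * self.impact / self.effort_days


def plan(fixes, defer_below=0.1):
    """Split fixes into 'do now' (best value first) and 'defer for the time being'."""
    ranked = sorted(fixes, key=lambda f: f.bang_for_buck, reverse=True)
    do_now = [f for f in ranked if f.bang_for_buck >= defer_below]
    deferred = [f for f in ranked if f.bang_for_buck < defer_below]
    return do_now, deferred


# Made-up candidates, purely for illustration.
candidates = [
    Fix("rewrite the order system", likelihood=0.2, impact=5, effort_days=60),
    Fix("test restoring from backup", likelihood=0.5, impact=5, effort_days=2),
    Fix("replace the flaky switch", likelihood=0.3, impact=4, effort_days=1),
]

do_now, deferred = plan(candidates)
print("Do now:", [f.name for f in do_now])
print("Defer:", [f.name for f in deferred])
```

    Note how the big-bang rewrite scores worst here even though its impact is high, which is the point of the advice above about incremental changes.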
  12. Re:Quit by Anonymous Coward · · Score: 5, Insightful

    I worked in this environment for one year so as not to tarnish my resume. I toughed out the last 4 months absolutely burned out and bitter. You cannot communicate to management that outages and issues aren't your fault; they're inherited. When you fix things, you'll inevitably miss something (I did because of the pace, not dictated by me). Get out. It's not worth the challenge to get proper budgeting to get the right tools in place, or the organization as a whole wouldn't have let things get how they are in the first place. The business model I came from is failing. If you're good, there are better paying, better rewarding, less "heart and soul" companies out there. You're doing basically startup work for at-will employment pay.

  13. Re:Quit by The Moof · · Score: 5, Insightful

    I'd amend that to a big "maybe" for sticking around.

    All of what you said (and the initial reaction to quit in the GP) hinges on the root cause of the mess. If it's a result of the predecessor not doing things correctly and flying by the seat of his pants, you're correct to jump at the opportunity. However, if it's caused by management screwing IT every chance they get with poor timelines, lack of funding, no foresight, and so on, run like hell.

  14. Re:Getting a Grip by Atanamis · · Score: 5, Informative

    agreed. As soon as I saw this was an IT department of one, I could tell the exact amount of care that management has on getting things like this corrected. These things are in place because management does not want to provide what is needed. If they only want to pay for band-aids, that is all they will have.

    This isn't necessarily the case though. I have a friend who took over IT at a small business. When he walked in they were using pirated software and their IT was a complete mess. After he put in hours to get it fixed up (with personal support from the owner), they ended up offering him an executive position with a massive pay increase. Some small shops with one IT guy really just don't know what they are doing, and haven't had a person in the job to tell them what is being done wrong. Your advice is still good though. A person in that situation needs to test whether they have management support to do things better. If so, it can turn into a career-making opportunity to turn things around. If you can't get the management on your side though, it very well could be time to start looking for another job with more supportive management.

    --
    Atanamis
  15. Re:Quit by v1 · · Score: 5, Informative

    Yep. Hop into the waders and get to work. It can be a very rewarding experience turning a steaming pile back into a smooth running good looking machine.

    To add to the above, document everything. Though it sounds like you're already doing that. Make sure it's documentation that works for anyone, not just you. Don't take anything for granted. Automate whatever you can, including problem detection and notification. (Save yourself from having to check things daily or weekly; have it shoot you an email or something if a common issue crops up again.)
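
    As a rough illustration of automated detection and notification, here is a small disk-space check that prepares an alert email. It uses only the Python standard library; the 90% threshold, the addresses, and the function names are assumptions for the sketch, and actually sending the message (for instance with `smtplib`) is left as a comment because it depends on your mail setup.

```python
import shutil
from email.message import EmailMessage


def disk_percent_used(path="/"):
    """Percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def build_alert(path, percent, threshold, sender, recipient):
    """Return an alert email if usage is at or over the threshold, else None."""
    if percent < threshold:
        return None
    message = EmailMessage()
    message["Subject"] = f"Disk {path} is {percent:.0f}% full"
    message["From"] = sender
    message["To"] = recipient
    message.set_content(
        f"{path} is at {percent:.1f}% (alert threshold {threshold}%). "
        "Check logs, old backups and temp files before it fills up."
    )
    return message


if __name__ == "__main__":
    # Placeholder addresses and threshold; run this from cron or a scheduler.
    alert = build_alert("/", disk_percent_used("/"), 90, "monitor@example.com", "it@example.com")
    if alert is not None:
        print(alert)
        # To actually send it, something like:
        #   import smtplib
        #   with smtplib.SMTP("localhost") as smtp:
        #       smtp.send_message(alert)
```

    The same pattern (measure, compare against a threshold, notify only when something is wrong) carries over to backup age, certificate expiry, or whatever issue keeps cropping up.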

    Make sure your employer fully understands the situation you and they are in, so they don't expect you to be doing improvements and striking things off their sore to-do-list that they were probably hoping you'd tackle the day you started. Get them a timeline as soon as you get something of a grip on the situation, tell them where you're going to be spending your time to start with, and the reasons why it's essential and going to delay their getting their bells and whistles and visible bang-for-the-buck of hiring you. Otherwise they may think you're just sitting on your butt because they're seeing no tangible benefits.

    If you've got a LOT of things that need to be fixed, things that can be done at closer to trained-monkey level, consider getting a temp assistant to help you dig out. Someone to run around and reimage machines, fix networks, repair stations, do RMAs, etc while you pull up your sleeves and unhack the servers. But if they're not in that big of a hurry this may not be appropriate.

    Good luck with it, sounds like fun actually, a challenge at the least.

    --
    I work for the Department of Redundancy Department.
  16. Re:Quit by Archangel Michael · · Score: 5, Insightful

    It is probably a combination of the two. Because MGMT always assumes IT can do something with very little, and often the Impossible with Nothing.

    We are skilled (most of us anyway) problem solvers, and they rely upon that to function. I hate to say it, but the original question should be answered this way: HIRE outside consultants to evaluate your system(s), and give you a hard copy report on their findings that you can present to MGMT.

    If the situation is as I believe, it is worse than he even suspects. He needs help, because getting ahead of the curve is more than he can do by himself.

    --
    Agent K: A *person* is smart. People are dumb, stupid, panicky animals, and you know it.
  17. In this situation currently. by YojimboJango · · Score: 5, Interesting

    The number one best thing you can ever do in your situation is ask your bosses what they think the system should be doing.

    Step 1: All the squirrelly business logic and the rationale behind each system you have to maintain should have a plain text description. You have to know the 'Why' before the mess of band-aids that is the 'How' will ever make sense. Have your boss (or his secretary, or whoever) document it and get it to you. Do NOT do this step yourself. Repeat: do NOT perform this step yourself.

    Step 2: Put out fires till someone not you finishes step 1. Start making backups of every last scrap of data you can get your grubby hands on.

    Step 3: Once step 1 is done compare it to the mess. Note where the realities that are in your boss's head diverge from what is actually happening. Your job is to now create a detailed functional spec that takes what your boss says, and expands on it with what is really happening. Try to include worst case scenarios and document them as intended features.

    Step 4: Have your boss and sales and marketing, and every other top level manager sign off on it. This will not happen. No two managers in your company will fully agree on what the current system is actually doing. Your goal is to figure out what sales and marketing are telling your users that your products do. Do not disregard this step or it will come back and bite you very hard.

    Step 5: Once every department actually agrees on what your job really is, you will be well equipped to start the long process of fixing things. Again make lots and lots of backups. Management will sign off on step 4, then you'll fix a gaping security hole, and some customer somewhere will throw a raging fit because sales promised that they'd be able to get admin access to your databases or something ridiculous.

    Step 6: Don't be an ass. When step 5 inevitably happens, explain the misstep in communication graciously, and roll back. If you pulled not being an ass off properly, you now have a great platform to explain to management why X was a bad idea, and present an idea to fix it.

    I'm a grizzled vet to your situation. If someone would've told me what I just told you when I started out, there would have been a lot less headache and stress. Hang in there, it can be an intensely rewarding experience.