Slashdot Mirror


Ask Slashdot: Getting a Grip On an Inherited IT Mess?

First time accepted submitter bushx writes "A little over a month ago, I assumed the position of programmer and sole IT personnel at a thriving e-commerce company. All the documentation I have is of my own creation, as I've spent most of my time reverse-engineering the systems in place just so I can understand how everything works together. Since I've started, I've done everything from network and phone upgrades to database maintenance with Perl, and thus far it's been immensely rewarding. But as I dig deeper, I notice the alarming number of band-aids applied by my predecessor, and it seems like the entire company's infrastructure is just a few problems away from a total meltdown. The big question now is, how do I, as a single person, effectively audit the network, servers, databases, backups, and formulate a long-term plan that can be implemented by one person? Is it possible? Where do I begin?"

70 of 424 comments

  1. Explaines a lot by p43751 · · Score: 5, Funny

    You work at RIM?

    1. Re:Explaines a lot by Anonymous Coward · · Score: 5, Funny

      So you are asking him if he got a RIM job?

    2. Re:Explaines a lot by nschubach · · Score: 4, Funny

      It doesn't seem to work anymore, but for a while they had "http://rim.jobs"...

      --
      Every time I start to have faith in humanity, I ruin it by driving to work between 7 and 8 am.
    3. Re:Explaines a lot by youn · · Score: 5, Funny

      http://steve.jobs/ does not seem to be operational either :).... I will probably get marked as troll by apple fanboys... still funny :p

      --
      Never antropomorphize computers, they do not like that :p
    4. Re:Explaines a lot by barc0001 · · Score: 4, Funny

      Of course not, he said "thriving".

    5. Re:Explaines a lot by Yakasha · · Score: 3, Funny

      http://steve.jobs/ does not seem to be operational either :).... I will probably get marked as troll by apple fanboys... still funny :p

      Nah, couldn't find the +1 Troll.

      They get such a bad rap... poor trolls.

  2. methodically and late into the night by sentimental.bryan · · Score: 5, Insightful

    say goodbye to your life for the next year. hope you're getting paid to mislay it....

    1. Re:methodically and late into the night by mattventura · · Score: 4, Insightful

      This. From what I've heard, it often involves weekends too.

    2. Re:methodically and late into the night by DrgnDancer · · Score: 5, Insightful

      My guess is he's not... I'm immediately concerned by: "position of programmer and sole IT personnel" and: "thriving e-commerce company" together in the same sentence. The fact that this does not appear to be a small mom and pop with two or three servers making up the "e-presence" adds fuel to the fire. I'm getting the image of a fairly large company that relies heavily on its web and e-commerce presence. And has one guy to take care of that. What happens when he's on vacation or sick and a server dies? What happens when the website has an issue and then *anything* else goes wrong?

      There's no bullpen here, if anything, anything at all, breaks there's only one guy to fix it. Day or night. If two things break you're already triaging. Surely a "thriving" company can afford a backup to what is pretty clearly a business critical unit?

      --
      I don't need a million points of light, just two points of multi-mode fiber and a 10 Gig-E router.
    3. Re:methodically and late into the night by datavirtue · · Score: 3, Insightful

      Judging from the questions, it seems he needs professional help. Seriously.

      --
      I object to power without constructive purpose. --Spock
    4. Re:methodically and late into the night by tomhudson · · Score: 4, Insightful

      I'm immediately concerned by: "position of programmer and sole IT personnel" and: "thriving e-commerce company" together in the same sentence.

      First thing to do is find out why. Why was there only one IT person, and why did they quit? Look through the junk on all the systems - if they were half-way intelligent, you'll find an external email address in some source somewhere. Ping them and ask what really happened.

    5. Re:methodically and late into the night by Lumpy · · Score: 5, Insightful

      "if they were half-way intelligent, you'll find an external email address in some source somewhere. Ping them and ask what really happened."

      I have done this before.. the response was...

      "Find another job and run, run as fast as you can. Oh and trust no one."

      --
      Do not look at laser with remaining good eye.
    6. Re:methodically and late into the night by gmack · · Score: 5, Insightful

      That's assuming the predecessor wasn't the problem. I have learned over the years that there are far too many tech types who prefer to be the only one that does a particular task and will make any excuse to management to make sure things stay that way. When these lone wolf types happen to not be as competent as they pretend to be, they tend to dig themselves into too deep a hole, so they either get fired or quit in frustration, but when you talk to them it will always be some other person's fault.

      I'm not saying management isn't at fault, they very well could be but don't assume that right off. The first step is to try and get a read on how good the predecessor was at their job otherwise he can get very misleading info.

    7. Re:methodically and late into the night by afidel · · Score: 3, Funny

      Dude, for three years I was the sole server, network, and SAN guy for an S&P 500 company. If you can't handle a small e-commerce company with a handful of servers (we grew from 60 to 160 servers in those 3 years) then get out of IT. It took most of those 3 years to eradicate the poor work of my predecessor (hey, let's buy an $800 RAID card and then do Windows software RAID AND compression) but I eventually got there. It was a lot of work but I managed to keep it to 50 hours a week for most of that time.

      --
      There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
    8. Re:methodically and late into the night by lightknight · · Score: 3, Insightful

      As opposed to management and their insistence that anyone is replaceable (except themselves) for pennies on the dollar. Sometimes, the lone wolf thing is just a HR-speak for "not a team player," sometimes it's a precursor to replacing someone with someone else who costs less.

       

      --
      I am John Hurt.
    9. Re:methodically and late into the night by xdroop · · Score: 5, Insightful

      What happens when he's on vacation or sick and a server dies? What happens when the website has an issue and then *anything* else goes wrong?

      Oh, that's easy:

      • He gets called in from being on vacation or sick;
      • he gets to work uncompensated time to fix the problem;
      • if he fails to either respond to the call OR fails to fix the problem, he gets fired;
      • if he succeeds in fixing the problem, he gets threatened with termination should something else fail while he's "unavailable".

      In fact, I'd lay odds that's how the vacancy occurred.

      --
      you should read everything on the internet as if it had "but I'm probably talking out of my ass" appended to it.
    10. Re:methodically and late into the night by tnk1 · · Score: 3, Interesting

      I don't think there is any doubt that under certain circumstances, one person could do a lot of work in what is, at its heart, a set of automated processes to begin with. The problem here is that having one person do anything is a horrible idea. Even a small company should do its best to have two IT people, or at least two people who know how to run the IT department if the company relies on their IT resources for their business. People do get sick or get hit by buses, or even have heart attacks on the soccer field while playing with co-workers after work. More often, they simply find other jobs and leave you with two weeks' notice and that's not enough time to get the best transition, especially for the guy who runs "everything".

      As an IT person, I appreciate your level of workmanship for keeping things together yourself, but your boss should have been fired for allowing you to run things yourself. It's not a matter of having the skillset to pull it off yourself, it's a matter of continuing operations and work-life balance. Maintaining staffing levels is not your responsibility, but I hope that you didn't believe that it was a good idea for you to be by yourself either. Your company got lucky that you were competent and didn't leave prematurely, but you aren't supposed to run successful companies on luck.

    11. Re:methodically and late into the night by Anonymous Coward · · Score: 5, Insightful

      Small operations like his are common. I'd guess he is a reasonably capable person. Where his world and yours differ, is that he's a jack of all trades (master of none)... because that's what that kind of business requires.

      Yes, in a larger company, you'd hire an Exchange pro, an AD pro, a networking pro, a programmer or two and a couple techs that are slightly more generalized guys to manage backups, the server room and help desk. The unfortunate truth is that specialized individuals are rarely any good outside their specialty... which is unhelpful to a small business that can't afford a stable full of tech talent.

      I know, as I've been this guy. It's brutal work but can be pretty satisfying. Every day your work is different. But you're never an expert at one particular thing and you're never paid like someone who specialized early on.

    12. Re:methodically and late into the night by Anonymous Coward · · Score: 3, Insightful

      First thing to do is find out why. Why was there only one IT person, and why did they quit?

      If there was only one, I'm betting it was a real small shop run by somebody who thought they could pay a local computer geek $10/hr to run everything, and the guy left the second he found a job that paid what the work is worth.

    13. Re:methodically and late into the night by Ransak · · Score: 4, Insightful
      What if you got really sick?

      I call this the 'Hit By A Bus' scenario. If you're hit by a bus in the next five minutes can the business carry on without you? If the answer is no for any reason then the business has major problems.

      --
      "Powers. I have them."
    14. Re:methodically and late into the night by Anarke_Incarnate · · Score: 3, Informative

      The pride and arrogance in this trend of "I am god" here is sickening. I was part of a 2 man crew on a company of 40 people and it sucked. I got called on my way to vacation, on vacation and on the way back. I was called when ON the doctor's table. I left, and while I value the learning opportunities I was given, it is no way to run a company.

      If a company does not value IT and the efforts put into maintaining it, then they are bleeding you and deceiving themselves.

    15. Re:methodically and late into the night by Jane+Q.+Public · · Score: 4, Interesting

      Absolutely. I once found myself working on a web project that had been through 3 previous developers -- and it wasn't even that big of a project -- but of course I did not know that when I took the job. If I had known the history of the project, I probably would not have taken it.

      I ended up trying to reverse-engineer a huge mess, without really being given the time to do so. They kept me busy making stupid little changes to the graphics, when it really needed some serious underlying code work.

      Then, out of the blue, they sprung a deadline on me of like 4 days, AND they wanted to release on a holiday. I said "No way. I would need at least another week to get this working properly." I did not get the week. PLUS they kept making changes up until literally the last hour, PLUS guess who got blamed when things -- inevitably -- did not work right?

      I was glad to get the hell out of there. As -- it turned out later -- were the 3 developers before me.

    16. Re:methodically and late into the night by racermd · · Score: 4, Informative

      I'm likely commenting too deeply for the person that asked the original question, but my advice seems to fit best here. What the company needs is an IT manager, whether hired directly or outsourced.

      Firstly, assess the corporate attitude towards hiring (competent) staff directly and buying or leasing hardware directly vs. purchasing outsourced services. Once you know where that conversation leads, you'll have a better idea of how to address the larger problems that only a bunch of time (and usually money) can solve.

      If the former, start the interviewing process ASAP. What you're looking for is self-starters that really do know their stuff. Take a handful of real-world scenarios, change some of the minor details a bit, and ask candidates what they'd do in that situation (or if they've encountered something like it before). Don't take them at their word, ask them to back it up with details of their own. Also, since you're going to wind up spending money on staff, you're probably going to be spending money on tools like new systems, software, and basic architecture hardware. Use an appropriate procurement process (and make sure it's followed) to meet your specific needs.

      If the latter, like I and many others here suspect it is, be sure to negotiate favorable contract terms with this in mind - everything is about money. You might be able to get a better rate on some services if you limit support to 8x5 instead of going 24x7, for instance. Is remote support acceptable or do you want someone on-site when you have to make that call? What is the response time to various levels of service calls? Do you want to host hardware on-site or have that done elsewhere? Things like that should be priced out and assessed against the needs of the business.

      Lastly, an important bit regardless of how the company wants to do it, the goal is to streamline operations which includes any support that's required when systems are not operating properly. Identify the weak subsystems and put them on a roadmap to be replaced with something more robust. It's a boring exercise in IT management that involves budgets and change control procedures but it does pay off in the long-run. If you need to get approval for spending, it helps to show what the current cost is, what the cost could be if things go wrong, and what costs could be if replaced with the more robust system. As long as you speak to your management in terms of money, they should listen.

      --
      My sources are unreliable, but their information is fascinating. -- Ashleigh Brilliant
    17. Re:methodically and late into the night by DigiShaman · · Score: 3, Interesting

      As a consultant, I found myself on the other end of this. Often, someone leaves and is replaced with a jack-of-all-trades IT admin / developer. That will almost always be the #1 mistake any company can make. I was called out to assist with a migration contracted by the hour. Not only did we run into several roadblocks preventing a smooth e-mail migration, but I also identified lots of red flags on the network. These ranged from double-NATing using old consumer routers as switches to Dell PowerEdge servers out of warranty and flashing amber LEDs indicating faulty hardware someplace. But the worst security issue??? All AD accounts were made members of the Domain Admins group just so users could install software on their local Windows XP machine. First of all, most employees shouldn't be local admins of their machine. But if they need it, you can make their domain account a member of the machine's local Administrators group instead.

      There were other odds and ends to be found. A few weeks later, the guy quit his IT job. In my haste to provide as much value in consultation per hour as I could (saving his company money), I must have overwhelmed him instead and sent him into panic and shock. Damn...

      --
      Life is not for the lazy.
  3. 1 suggestions by Anonymous Coward · · Score: 5, Funny

    start drinking

    1. Re:1 suggestions by Anonymous Coward · · Score: 5, Informative

      Heavily

  4. Configuration management by Neil+Watson · · Score: 4, Informative

    Automate your servers so you can focus your time elsewhere. I use Cfengine.
    http://watson-wilson.ca/2011/03/enterprise-system-administration-using-configuration-management.html

    1. Re:Configuration management by 1s44c · · Score: 4, Informative

      Yes, automate everything, monitor everything, backup everything, document everything.

      I used to use cfengine but find puppet an easier tool to work with. Nagios and BackupPC are also wonderful tools but you might want to choose alternatives if they better fit your needs.

      You might want to express some concerns to management just in case something critical does fall over you don't look quite so bad.

    2. Re:Configuration management by vajrabum · · Score: 4, Informative

      Lots of folks here have talked about backups but if your company is really successful then restores could be more of a problem than backups. Restoring large databases and system configuration can take a loooong time. Develop a plan for restore and execute it regularly as a test. Make sure management understands the time for restoration. Two other things--virtualize (that reduces the coefficient of friction for moving things considerably) and consider using Amazon or some other cloud provider in your restore plan in case your cage/server room/whatever burns. Some of those services are low or no cost until you start loading things up. If you go the cloud route be sure to get a read on your traffic, storage and other billable numbers. If that's the disaster plan then if the numbers are of any size at all you need to run the cost by the CFO to make sure that it's sustainable.
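
A restore drill like the one described above could be sketched roughly as follows. This is only an illustration under stated assumptions: `restore_drill`, the file name `orders.dump`, and the use of `shutil.copy2` as the "restore" step are all made up for the example. A real drill would call whatever restore command your backup tool provides and would compare against a checksum recorded when the backup was taken.

```python
import hashlib
import shutil
import tempfile
import time
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def restore_drill(backup: Path, expected_sha256: str, scratch_dir: Path) -> dict:
    """Restore one backup file into scratch_dir, time it, and verify it.

    shutil.copy2 is a stand-in (assumption) for the real restore command.
    """
    started = time.monotonic()
    restored = Path(shutil.copy2(backup, scratch_dir / backup.name))
    elapsed = time.monotonic() - started
    return {
        "file": backup.name,
        "seconds": elapsed,
        "verified": sha256_of(restored) == expected_sha256,
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        tmp_path = Path(tmp)
        # Hypothetical backup file created just for the demonstration.
        backup = tmp_path / "orders.dump"
        backup.write_bytes(b"pretend database dump\n" * 1000)
        recorded = sha256_of(backup)  # normally stored at backup time
        scratch = tmp_path / "scratch"
        scratch.mkdir()
        print(restore_drill(backup, recorded, scratch))
```

The `seconds` figure is the number to show management: it is the measured restore time for that one item, not a guess.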

    3. Re:Configuration management by bill_mcgonigle · · Score: 3, Informative

      They all have root access still :-( A political fight I'm not yet prepared to have. I was able to take it away on the web servers, at least, and that's the only thing our developers touch, so life is a bit better.

      A fine baby step is to move everybody over to sudo. If you can get buy-in that everybody will track changes with git, then you have somebody to blame and can build a case if they break it. With sudo you have a record of who was mucking (in your /var/log/secure).

      If they're perfectly reasonable/responsible and you can track changes, it's not such a problem, really, unless you're worried that they're secret agents meaning to break your stuff. I typically only see frustrating carelessness where people can get away with it.
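
For the "record of who was mucking" part, here is a minimal sketch of pulling that record out of the log. It assumes the usual sudo syslog line layout (`user : TTY=... ; PWD=... ; USER=... ; COMMAND=...`); the sample lines, host names, and the function name `summarize_sudo` are invented for illustration. On a real box you would feed it the lines of /var/log/secure (Debian-style systems use /var/log/auth.log instead).

```python
import re
from collections import defaultdict

# Matches the common sudo syslog entry, e.g.
#   "... sudo:   alice : TTY=pts/0 ; PWD=/etc ; USER=root ; COMMAND=/bin/ls"
SUDO_LINE = re.compile(
    r"sudo(?:\[\d+\])?:\s+(?P<user>\S+)\s+:.*?"
    r"USER=(?P<target>\S+)\s+;\s+COMMAND=(?P<command>.+)$"
)

# Invented sample data standing in for /var/log/secure.
SAMPLE_LOG = [
    "Jan 12 09:14:02 web01 sudo:    alice : TTY=pts/0 ; PWD=/etc/httpd ; USER=root ; COMMAND=/usr/bin/vi httpd.conf",
    "Jan 12 09:20:45 web01 sudo:      bob : TTY=pts/1 ; PWD=/home/bob ; USER=root ; COMMAND=/sbin/service httpd restart",
    "Jan 12 09:21:10 web01 sshd[2211]: Accepted publickey for bob from 10.0.0.5",
    "Jan 12 09:30:00 web01 sudo:    alice : TTY=pts/0 ; PWD=/etc/httpd ; USER=root ; COMMAND=/sbin/service httpd reload",
]


def summarize_sudo(lines):
    """Group the commands run through sudo by the user who ran them."""
    commands_by_user = defaultdict(list)
    for line in lines:
        match = SUDO_LINE.search(line.rstrip("\n"))
        if match:  # non-sudo lines are simply skipped
            commands_by_user[match["user"]].append(match["command"])
    return dict(commands_by_user)


if __name__ == "__main__":
    for user, commands in summarize_sudo(SAMPLE_LOG).items():
        print(user)
        for command in commands:
            print("   ", command)
```

Note that failed password attempts are logged in a similar shape and would be counted too, so treat the output as a starting point for a conversation, not as proof.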

      --
      My God, it's Full of Source!
      OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
    4. Re:Configuration management by TomTraynor · · Score: 3, Insightful

      Do the fight, at least if there is a paper trail your ass is covered. If your company has auditors, buy them a coffee and see if they can help you explain to senior management why root access for everyone is a bad thing. I needed it years ago as the support person, but, when I moved back to development they kept my access. They gave me very strange looks when I asked for access to be revoked, but, when they got audited they didn't get nailed by the auditor for having developers full access to the prod machines.

      As a compromise see if you can get a 'SYSTEST' area defined where an image of prod data is stored and the new code that is to be promoted can be staged. That way developers can put up their code and prove it works with prod data and if it gets signed off by management you can 'promote' the code to the prod servers.

      --
      Panic now, beat the rush!
  5. "A little over a month ago I assumed the position" by Anonymous Coward · · Score: 5, Funny

    Dude, that is too easy. There are serious wiseacres on this board.

  6. Escalate by sociocapitalist · · Score: 5, Insightful

    Brief your management on the situation. Explain what condition things are in and what is needed to get them into a manageable state. Give them a list of projects / tasks that you have to deal with and get them to prioritize.

    --
    blindly antisocialist = antisocial
    1. Re:Escalate by gknoy · · Score: 4, Insightful

      I'd like to add that v1 above made a great comment too, namely that in addition to the crucial steps you mentioned, it's also important to keep management informed of your task list AND your progress on it, since often your fixes will be behind-the-scenes things that they may not notice. Make sure they know you're working hard to make their system better and more robust, do NOT assume that anyone else notices what you're doing.

    2. Re:Escalate by Decameron81 · · Score: 4, Insightful

      Brief your management on the situation. Explain what condition things are in and what is needed to get them into a manageable state. Give them a list of projects / tasks that you have to deal with and get them to prioritize.

      It's sort of unrelated. But my brother was doing some independent audio work for some VIP wedding in Italy, when he realized the electrical hardware & connections were a mess (meaning they were actually dangerous to use). He first talked with the management for the event and let them know about the situation. They ignored him. He quit the job, and was highly criticized for it.

      As he was disconnecting all of his hardware with his team, a short circuit caused a fire, which fortunately was controlled easily.

      The event's management immediately contacted him to offer him a formal apology and pay for the damages to his hardware. They also offered to hire him back, double the salary. The last part was kind of luck, but had the fire not been controlled as easily as it was, my brother would have shared the responsibility.

      Long story short: sometimes you have to know when to step down.

      --
      diegoT
  7. Get management buy in... by Lumpy · · Score: 5, Insightful

    You need to document it and get management to approve spending money.

    I'll bet you $100.00 the band-aids are there because management refuses to spend money on infrastructure, and that's why it is a mess and the guy there beforehand has left.

    99% of the time a hosed IT infrastructure is because management refused to spend any money so it had to be half assed.

    --
    Do not look at laser with remaining good eye.
    1. Re:Get management buy in... by kiwimate · · Score: 4, Informative

      Exactly correct.

      Step 1.

      Document. Look at your critical systems. Document what they are. Start at a high level - line of business, internal (HR, etc). Drill down - I have an Oracle server, I have a Citrix system to allow the users to remote connect, which uses a VPN, etc.

      Cost: your time.

      Step 2.

      Prioritize. What are the most important systems? Start with the systems which, if they go down, will cause the company to lose money. Then the ones which support internal processes. Rank order.

      Cost: your time. Possibly management's time - they may have input into priorities.

      Step 3.

      Audit. Start at the top and find out just what state they're in. If you don't feel sufficiently comfortable with a particular technology to do this yourself, hire an SME for a few hours.

      Cost: potentially the consulting SME to evaluate various systems. Note - the initial contract is an audit, not a "find everything and fix".

      Step 4.

      Fix. If you have audit notes which say "this critical line of business system is on the verge of death and once it dies it can't be resurrected", that goes first. If you have audit notes which say "this is a system which provides some reporting capabilities and it's a bit shaky, but worst case is you have to reboot the server and the reports to management go out a bit late", not so bad.

      If you get to step 3 and management won't pay, then you have a problem.

      If you get to step 2 and management won't give up their time, then you have a very big problem.

      A big question will be the level of support from management. If they are not supportive, or if money is tight and they say "we'd like to pay for the consultants but", then that's why you've rank ordered.

      If they're cooperative but don't have the money, work with them to figure out some kind of timeline based on highest risk.

      If they're stubborn, urgh, bad spot. Do your best to determine level of risk. Work with the company accountant to figure out the cost to the company if a critical line of business system goes down for 10 minutes. 2 hours. Include some waffle about reputation, if you can. Include any penalties or SLA violations, if you have those.
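
The "cost if it goes down for 10 minutes / 2 hours" figure is simple arithmetic once the accountant gives you the inputs. A rough sketch follows; every number in it is a made-up placeholder, and `downtime_cost` and its parameters are names invented for this example, not anything from a real tool.

```python
def downtime_cost(minutes, revenue_per_hour, idle_staff_cost_per_hour=0.0, sla_penalty=0.0):
    """Estimate the direct cost of an outage lasting `minutes`.

    Lost revenue and idle staff scale with time; the SLA penalty is
    treated as a flat amount per incident (an assumption; real
    contracts may scale it with duration).
    """
    hourly = revenue_per_hour + idle_staff_cost_per_hour
    return minutes * hourly / 60 + sla_penalty


if __name__ == "__main__":
    # Placeholder figures: get the real ones from the accountant.
    revenue_per_hour = 1200.0
    idle_staff_cost_per_hour = 300.0
    for minutes in (10, 120):
        cost = downtime_cost(minutes, revenue_per_hour, idle_staff_cost_per_hour)
        print(f"{minutes:>4} min outage: about ${cost:,.2f}")
```

Reputation damage does not fit in a formula like this, so list it separately in words next to the numbers.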

  8. this is a majorly funny story by roman_mir · · Score: 5, Insightful

    Facts:

    1. The job has lasted for 1 month so far.
    2. The e-commerce company is 'thriving', apparently.
    3. All of the systems have been "reverse engineered" in that 1 month.
    4. All of the documents are written in that 1 month.
    5. In 1 month there have been: network and phone upgrades and database maintenance with Perl and it all has been 'immensely rewarding'.
    6. The entire infrastructure is 'a few problems away from a total meltdown'.
    7. Single person IT operation to do everything.

    Question: is this for real? What's the size of the company and what's the budget?

    1. Re:this is a majorly funny story by 1u3hr · · Score: 5, Insightful

      Question: is this for real?

      It's an "Ask Slashdot". They're as real as "Letters to Penthouse". Both carefully crafted to create a fantasy situation to excite readers. Read them if the subject is something you're interested in, but don't waste your time giving advice..

    2. Re:this is a majorly funny story by KermodeBear · · Score: 4, Insightful

      I don't see where he said that all systems have been reverse engineered and documented in one month; only that he is currently reverse engineering systems and documenting.

      And, maybe this guy likes what he is doing, getting his hands dirty with network and phone stuff. And some people really like writing Perl (I don't; I think it's the devil's language). If he finds his work rewarding, who are you to mock him?

      --
      Love sees no species.
  9. Re:Quit by Anonymous Coward · · Score: 5, Insightful

    No!

    This is actually the kind of career building stuff one should leap at. What would you rather say in an interview for your next job:
    - I took this system that was falling apart and made it run like clockwork.. downtime and issue frequency went from "it's down again" to "been up all year" ..
    - Yeah it was pretty good when I got there, and I maintained the status quo

    My thoughts on original question:

    First step is comprehension. You can’t fix what you don’t know you have/need. Identify the key components of your system. Then for each key component, break it down to its parts and dependencies. Then break each one of those out, and so on, until you have a pretty damn good idea of what you have.

    Next part is assessment. For each component you’ve identified, what is its current state?

    And then it’s time to do triage. Prioritize stuff by largest potential impact.
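
The comprehension / assessment / triage steps above could be sketched like this. It is one possible scoring scheme, not a standard one: the 1-5 scales, the `health * worst impact` score, and the component names are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    health: int             # 1 = solid, 5 = held together with band-aids
    impact: int             # 1 = nuisance, 5 = company stops taking orders
    depends_on: tuple = ()  # names of components this one needs


def blast_radius(name, components):
    """Names of everything that directly or indirectly depends on `name`."""
    affected = set()
    frontier = [name]
    while frontier:
        current = frontier.pop()
        for comp in components.values():
            if current in comp.depends_on and comp.name not in affected:
                affected.add(comp.name)
                frontier.append(comp.name)
    return affected


def triage(components):
    """Order components so the shakiest, most damaging ones come first."""
    def score(comp):
        downstream = blast_radius(comp.name, components)
        # A component is as important as the most important thing it can take down.
        worst = max([comp.impact] + [components[n].impact for n in downstream])
        return comp.health * worst
    return sorted(components.values(), key=score, reverse=True)


if __name__ == "__main__":
    # Hypothetical inventory for illustration only.
    parts = [
        Component("shop-frontend", health=2, impact=5, depends_on=("orders-db",)),
        Component("orders-db", health=4, impact=4),
        Component("phones", health=3, impact=3),
        Component("weekly-reports", health=5, impact=1, depends_on=("orders-db",)),
    ]
    for comp in triage({p.name: p for p in parts}):
        print(comp.name)
```

The point of `blast_radius` is that a boring back-end box inherits the importance of whatever sits on top of it, which is easy to miss when ranking by gut feel.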

    And finally carry out your well thought out pla.. ok, can't say that one with a straight face. Basically try to fix stuff when you can, between putting out the daily fires.

  10. It's the Eye of the Tiger! by anom · · Score: 5, Funny

    Just buy a few cases of your energy drink of choice and put Eye of the Tiger on repeat until you've got it all fixed.

    I believe in you.

  11. Wait a minute .... by CuriousGeorge113 · · Score: 4, Insightful

    "I assumed the position of programmer and sole IT personnel at a thriving e-commerce company."

    Wait.... a thriving e-commerce company has one IT person? Am I missing something here...? No wonder everything was band-aided together. They have one person doing everything.

    You may want to consider hiring an outside firm to come in and do the audit for you. The last thing you need right now, on top of your daily workload, is to perform an audit. That, and a third party firm creates a sense of objectivity, and would eliminate the "The IT guy wants a new toy" response from the CFO.

    --
    No man is an island, But if you take a bunch of dead guys and tie them together, they make a pretty good raft.
  12. Re:I wouldn't worry... by mikael_j · · Score: 3, Informative

    Sadly there's a lot of truth to this. In my experience the difference between most "good" and "bad" networks is whether the WTFs are vendor-blessed hacks or in-house hacks.

    Of course, there are always those places where this is not the case but I've seen enough IT environments to believe that for a majority of companies this is sadly the state of things. If maintenance in the average factory was handled the same way IT is handled at the average company most machines would consist of approximately 30-50% duct tape, newspaper, string and glue...

    --
    Greylisting is to SMTP as NAT is to IPv4
  13. Re:you don't by Synerg1y · · Score: 3, Insightful

    Or bring in contractors / consultants and have them serve their part and then part ways, the biggest mistake you can make is taking everything on your shoulders, that = loss of life & health. It's a job and work != life.

  14. Re:Getting a Grip by Hadlock · · Score: 3, Insightful

    This is the only solid advice I've read so far. Band-aid solutions are indicative of two things: too shy to ask management for a bigger budget, or management's reluctance to improve their budget. Generally it is the latter.

    --
    moox. for a new generation.
  15. Re:Quit by 1s44c · · Score: 3, Insightful

    Quit? Do you give up on every task before you start?

    Some of us like a challenge.

  16. What, where, why... by ScottyLad · · Score: 5, Informative

    I've spent the best part of my career undertaking tasks like this (as an external consultant), with my average time on an assignment lasting somewhere between 18 months and 3 years.

    My aim on every project is to make myself obsolete - in that I try to get documentation up to a point where a suitably qualified individual could come in, read the documentation, and work the rest out for themselves.

    My primary objectives are to implement some form of inventory control to document the what / where / why...

    • What - What have you got (servers, software, services, contracts, operating systems, databases, users)
    • Where - Where is it - where are your servers, what machine is this software licence running on?
    • Why - What is the Business Justification for this service - what is the Business Impact if this database stopped running tomorrow?
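
As a rough illustration of the what / where / why inventory, here is a minimal sketch that keeps each item as a record and writes it out as CSV so it can live in a spreadsheet. The field names, the sample entries, and the helper names (`InventoryItem`, `to_csv`, `missing_justification`) are assumptions made up for this example.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class InventoryItem:
    what: str            # server, software, service, contract, database...
    where: str           # rack, host, office, hosting provider
    why: str             # business justification
    impact_if_lost: str  # what happens if it stops running tomorrow


def to_csv(items):
    """Render the inventory as CSV text, one row per item."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(InventoryItem)])
    writer.writeheader()
    for item in items:
        writer.writerow(asdict(item))
    return buffer.getvalue()


def missing_justification(items):
    """Items nobody has explained yet: candidates for questions to the business."""
    return [item.what for item in items if not item.why.strip()]


if __name__ == "__main__":
    # Hypothetical entries for illustration only.
    inventory = [
        InventoryItem("orders database", "db01, server room", "stores all customer orders", "no sales can be processed"),
        InventoryItem("old FTP server", "under a desk", "", "unknown"),
    ]
    print(to_csv(inventory))
    print("Needs a 'why':", missing_justification(inventory))
```

Anything that shows up with an empty "why" is worth asking about before you spend time documenting its internals.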

    Once you've got to that stage, then you're ready to get in to the real technical details. Remember that you are pitching your documentation to your successor, or to some imaginary "suitably qualified individual", so documenting what a system does and why is a higher priority than commenting every line of code.

    It is possible to do with one person, depending on the size of the organisation, and it can be particularly rewarding to do on your own. In a small business you often find some of the users have a good understanding of some of the systems, or are keen to learn.

    You stated in your post that you've assumed the role of programmer and sole IT personnel - which means you need to learn to think like a manager as well as a techie (which is harder than most people imagine!). Once you learn to focus on the business priorities, you'll understand where to begin with the technical detail, and what level of documentation is required.

    --
    Philosopher (n) - a wise person who is calm and rational; someone who lives a life of reason with equanimity
  17. Start over... slowly by rabenja · · Score: 5, Insightful
    I was in much the same position 12 years ago at this company. I am now CIO with 7 people on my team with several business partners to help manage the infrastructure. My advice for what it is worth:
    • Take time every day to assess and analyze the bigger picture before allowing yourself to get drawn into the details.
    • Look at the entire system from a risk mitigation perspective. What areas are most likely to cause "meltdown"? Spend the most effort there.
    • What are incremental changes that can be made that improve the overall risk picture? Focus on the biggest bang for the buck.
    • Defer anything that works well enough for the time being.
    • Avoid big bang solutions unless they can be contained and tested well, with the capability of rolling back.
    • Get help where necessary.
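    One simple way to put numbers on "most likely to cause meltdown" is to rank each system by likelihood of failure times cost of failure. This is only a sketch: the systems and figures below are hypothetical, and the estimates will be rough guesses in practice.

```python
def rank_by_risk(items):
    """Sort (name, likelihood, cost) tuples by expected loss, highest first."""
    return sorted(items, key=lambda item: item[1] * item[2], reverse=True)


# Hypothetical figures: chance of failure in the next year (0..1) and a
# rough cost of that failure in dollars.
systems = [
    ("tape backups", 0.5, 100_000),
    ("phone system", 0.2, 5_000),
    ("web server", 0.1, 200_000),
]

for name, likelihood, cost in rank_by_risk(systems):
    print(f"{name}: expected loss ~{likelihood * cost:,.0f}")
```

    Whatever lands at the top of that list is where the first incremental changes should go.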
  18. Re:Quit by Anonymous Coward · · Score: 5, Insightful

    I worked in this environment for one year so as not to tarnish my resume. I toughed out the last 4 months absolutely burned out and bitter. You cannot communicate to management that outages and issues aren't your fault; they're adopted. When you fix things, you'll inevitably miss something (I did because of the pace, not dictated by me). Get out. It's not worth the challenge to get proper budgeting to get the right tools in place or the organization as a whole wouldn't let things get how they are in the first place. The business model I came from is failing. If you're good, there are better paying, better rewarding, less "heart and soul" companies out there. You're doing basically startup work for at will employment pay.

  19. Re:Quit by The+Moof · · Score: 5, Insightful

    I'd amend that to a big "maybe" for sticking around.

    All of what you said (and the initial reaction to quit in the GP) hinges on the root cause of the mess. If it's a result of the predecessor not doing things correctly and flying by the seat of his pants, you're right to jump at the opportunity. However, if it's caused by management screwing IT every chance they get with poor timelines, lack of funding, no foresight, and so on, run like hell.

  20. Re:Getting a Grip by Atanamis · · Score: 5, Informative

    agreed. As soon as I saw this was an IT department of one, I could tell the exact amount of care that management has on getting things like this corrected. These things are in place because management does not want to provide what is needed. If they only want to pay for band-aids, that is all they will have.

    This isn't necessarily the case though. I have a friend who took over IT at a small business. When he walked in they were using pirated software and their IT was a complete mess. After he put in hours to get it fixed up (with personal support from the owner), they ended up offering him an executive position with a massive pay increase. Some small shops with one IT guy really just don't know what they are doing, and haven't had a person in the job to tell them what is being done wrong. Your advice is still good though. A person in that situation needs to test whether they have management support to do things better. If so, it can turn into a career making opportunity to turn things around. If you can't get the management on your side though, it very well could be time to start looking for another job with more supportive management.

    --
    Atanamis
  21. Me too by weave · · Score: 4, Interesting

    I walked into a similar nightmare two years ago. Before I even took the job I assessed the situation and gave them a proposal for what needed to be done and a price estimate for the software and hardware. I told them I would not take the job unless they committed funds to support the function. I also warned them that there were numerous ticking time bombs and I'd defuse them as fast as possible, but there was no magic fix, it would take some time, and they could still have a disaster.

    I then convinced them to only hire me part-time and to also hire a part-time desktop support person for a few reasons including they don't want to pay me to do that and having two IT people at least gives you some continuity. Even if the desktop support guy doesn't know the high-end stuff, if I leave the desktop person can still guide the new person and save them a lot of time I never got.

    My line of attack was:

    1. Back up data. Wasn't easy. They had old cart tape drive units that were problematic. I ended up getting cheap TB externals to at least make mirrored copies of things. But at least if there was a disaster, I'd have their data safe somewhere -- even if it took me weeks to reconstruct systems to use it.
    2. Secure data. Everything was wide open. All domain users WERE DOMAIN ADMINISTRATORS. Locking that down was a pain. An understanding of what would be impacted ahead of time would have taken months, so I didn't tell anyone what access they had, then started removing people from domain admins a few at a time and waited to hear what broke, then fixed access issues. Not user friendly, but getting that under control fast was necessary.
    3. Renovated room with servers in it (that were 5+ year old deskside servers) so as to accommodate a rack with proper A/C flow, electrical feed, and physical security.
    4. Had them throw ~$50k into a virtual infrastructure and SAN, then virtualized all their old deskside servers until I could migrate apps on them to fresh OS installs. Used vSphere's DRS product to back up the OS images and data to another system I had them buy for their other site (thankfully not too far away and connected by fiber)
    5. Identifying all in-house written programs and finding turnkey solutions to them, preferably cloud-based to reduce their dependency on in-house IT staff in future.
    6. Documenting everything as best I can as I go.
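
    For step 1, a copy you have never checked is not much of a backup. Here is a minimal sketch of a mirror-then-verify pass, assuming Python 3.8+ and that the source and the external drive are both ordinary mounted directories. The function names are my own for illustration; this is not what I actually ran at the time.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Hash a file in chunks so large files don't need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def mirror_and_verify(source, destination):
    """Copy the source tree, then return relative paths whose copy mismatches."""
    source, destination = Path(source), Path(destination)
    shutil.copytree(source, destination, dirs_exist_ok=True)
    mismatched = []
    for src_file in source.rglob("*"):
        if src_file.is_file():
            rel = src_file.relative_to(source)
            copy = destination / rel
            if not copy.is_file() or sha256_of(copy) != sha256_of(src_file):
                mismatched.append(str(rel))
    return mismatched
```

    An empty list back means every file on the mirror hashes the same as the original. It does not remove files that were deleted from the source, so it only ever adds to the mirror.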

    Getting back to original point, a one-person IT shop is suicide. Them having a two person part-time crew is better because if one leaves, at least the other can provide some sort of continuity -- and that happened already. The fairly young guy I hired for desktop support two years ago died last month :-(

  22. Re:Quit by v1 · · Score: 5, Informative

    Yep. Hop into the waders and get to work. It can be a very rewarding experience turning a steaming pile back into a smooth running good looking machine.

    To add to the above, document everything. Though it sounds like you're already doing that. Make sure it's documentation that works for anyone not just you. Don't take anything for granted. Automate whatever you can, including problem detection and notification. (save yourself from having to check things daily or weekly, have it shoot you an email or something if a common issue crops up again)
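
    As one hedged example of that kind of automation, a few lines of Python run from cron or Task Scheduler can build an alert when a disk fills up. The 90% threshold and the example.com addresses are placeholders, and the function name is made up for illustration.

```python
import shutil
from email.message import EmailMessage


def disk_alert(path, used_fraction_limit=0.9, usage=None):
    """Return an EmailMessage if the disk holding path is too full, else None.

    usage can be passed in for testing; by default it is read from the OS.
    """
    if usage is None:
        usage = shutil.disk_usage(path)
    used_fraction = usage.used / usage.total
    if used_fraction < used_fraction_limit:
        return None
    message = EmailMessage()
    message["Subject"] = f"Disk at {path} is {used_fraction:.0%} full"
    message["From"] = "monitor@example.com"  # placeholder address
    message["To"] = "it@example.com"         # placeholder address
    message.set_content("Free some space or extend the volume.")
    return message
```

    To actually deliver the message you would hand it to something like smtplib.SMTP(host).send_message(message), with the host being whatever mail relay you have available.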

    Make sure your employer fully understands the situation you and they are in, so they don't expect you to be doing improvements and striking things off their sore to-do-list that they were probably hoping you'd tackle the day you started. Get them a timeline as soon as you get something of a grip on the situation, tell them where you're going to be spending your time to start with, and the reasons why it's essential and going to delay their getting their bells and whistles and visible bang-for-the-buck of hiring you. Otherwise they may think you're just sitting on your butt because they're seeing no tangible benefits.

    If you've got a LOT of things that need to be fixed, things that can be done by closer to trained-monkey level, consider getting a temp assistant to help you dig out. Someone to run around and reimage machines, fix networks, repair stations, do RMAs, etc while you pull up your sleeves and unhack the servers. But if they're not in that big of a hurry this may not be appropriate.

    Good luck with it, sounds like fun actually, a challenge at the least.

    --
    I work for the Department of Redundancy Department.
  23. Re:Quit by Archangel+Michael · · Score: 5, Insightful

    It is probably a combination of the two. Because MGMT always assumes IT can do something with very little, and often the Impossible with Nothing.

    We are skilled (most of us anyway) problem solvers, and they rely upon that to function. I hate to say it, but the original question should be answered this way: HIRE outside consultants to evaluate your system(s), and give you a hard copy report on their findings that you can present to MGMT.

    If the situation is as I believe, it is worse than he even suspects. He needs more help than he can do by himself, to get ahead of the curve.

    --
    Agent K: A *person* is smart. People are dumb, stupid, panicky animals, and you know it.
  24. Re:Quit by KermodeBear · · Score: 3, Interesting

    Best advice, right there. It's a challenge for certain, but making things better is the best thing you can do - for the company (ha) and, far more importantly, for yourself.

    Hang in there!

    And although it may feel like the whole place is going to fall apart any moment, it hasn't yet, you're in charge, and it sounds like you're gradually making it all better. Take a deep breath, Don't Panic, it'll be okay.

    --
    Love sees no species.
  25. Re:Quit by Nethemas+the+Great · · Score: 3, Insightful

    I'm familiar with that kind of mess... Best advice is to piss on all the fires as quickly as they arise and, unfortunately, put in overtime putting in place alternative implementations that prevent it from happening again. During those brief intervals when something isn't burning, survey the land, determine the highest risk (read cost and likelihood of failure), and take proactive steps to mitigate it. At first you'll feel like you're in the worst kind of hell, but eventually things will start to fall into place and you'll be able to enjoy your hard-earned vacation. Further, it's quite probable that you don't have a complete knowledge of the technology being used. Get it. There's no substitute for actually knowing what you're doing. It's unfortunately far too common for people with no time and/or interest to search the web for a snippet of code or a set of procedure steps and hack these into place without the slightest clue what they're doing or the consequences thereof. It's usually better to go sharpen the ax before you go into the woods even if it seems like it will take more time. Trust me, it will pay dividends later.

    --
    Two of my imaginary friends reproduced once ... with negative results.
  26. In this situation currently. by YojimboJango · · Score: 5, Interesting

    The number one best thing you can ever do in your situation is ask your bosses what they think the system should be doing.

    Step 1: All the squirrelly business logic and the rationale behind each system you have to maintain should have a plain text description. You have to know the 'Why' before the mess of band aids that is the 'How' will ever make sense. Have your boss (or his secretary, or whoever) document it and get it to you. Do NOT do this step yourself. Repeat do NOT perform this step.

    Step 2: Put out fires till someone not you finishes step 1. Start making backups of every last scrap of data you can get your grubby hands on.

    Step 3: Once step 1 is done compare it to the mess. Note where the realities that are in your boss's head diverge from what is actually happening. Your job is to now create a detailed functional spec that takes what your boss says and expands on it with what is really happening. Try to include worst case scenarios and document them as intended features.

    Step 4: Have your boss and sales and marketing, and every other top level manager sign off on it. This will not happen. No two managers in your company will fully agree on what the current system is actually doing. Your goal is to figure out what sales and marketing are telling your users that your products do. Do not disregard this step or it will come back and bite you very hard.

    Step 5: Once every department actually agrees on what your job really is, you will be well equipped to start the long process of fixing things. Again make lots and lots of backups. Management will sign off on step 4, then you'll fix a gaping security hole, and some customer somewhere will throw a raging fit because sales promised that they'd be able to get admin access to your databases or something ridiculous.

    Step 6: Don't be an ass. When step 5 inevitably happens, explain the misstep in communication graciously, and roll back. If you pulled not being an ass off properly, you now have a great platform to explain to management why X was a bad idea, and present an idea to fix it.

    I'm a grizzled vet to your situation. If someone would've told me what I just told you when I started out, there would have been a lot less headache and stress. Hang in there, it can be an intensely rewarding experience.

  27. Re:Quit by Anonymous Coward · · Score: 3, Interesting

    So you worked for one whole year in this industry, and that gives you insight enough to know that there is only ONE reason that things get to this state?

    That's interesting because I've been in this industry for 22 years and I can list at least two possible reasons. The obvious one you're missing is that there is budget but the previous guy was an idiot.

    I'd say there's a 90% chance it's the latter. Budget is easy to come by in a thriving business, but people who know what they are doing are still rare (hell, this Ask Slashdot is basically "help I don't have a clue what I'm doing").

  28. Your biggest problem will be funding by rickb928 · · Score: 3, Insightful

    Either because your predecessor 'made it work' with little or no funding (better translated as 'he made it almost work'), or because your predecessor failed to acquire sufficient funding to do it 'right'.

    As a former field tech/consultant, try to avoid bringing in consultants to explain why the stuff needs to be bought. Many a manager ends up believing the consultants and disbelieving their staff. You get to either hire the consultant to justify your plan or find yourself undercut by that lack of confidence.

    And of course nail some problems and show improvements as early as you can. It's wise to both solve pressing impactful problems first, and gain trust.

    I always loved going into a client with lots of problems. Not just for the thrill of making things right, but knowing that if I did it right I had a referral for my next client - because the end result was most often working my way out of the gig. Either I passed the client on to another tech to maintain, or they got their staff's legs under them and could carry on. So long as there are more clients, this is good. Great fun to figure things out, isn't it?

    --
    deleting the extra space after periods so i can stay relevant, yeah.
  29. Re:Quit by Ephemeriis · · Score: 4, Insightful

    This.

    I'd be willing to bet a year's pay that the previous guy wasn't straight-up incompetent. He was probably relatively skilled, and doing the best he could with the resources at his disposal. Which were probably not actually the resources he needed.

    Odds are good that there's a reason why the place is in the condition it is now.

    Odds are good that there's a reason why the last guy isn't there anymore.

    Odds are good that you're going to need more than one guy in IT to get it all straightened-out.

    --
    "Work is the curse of the drinking classes." -Oscar Wilde
  30. Why would they need a backup? by Colin+Smith · · Score: 4, Funny

    They have a guy who finds upgrading phone systems immensely satisfying! If he's sick he'll come in and fix it and who needs vacation anyway, he'll take the cash instead.

    I'm betting it's a psychotic break and he IS his predecessor.
     

    --
    Deleted
  31. Re:Quit by Lumpy · · Score: 3, Insightful

    When I'm given a spoon and told to storm the hill and kill everyone in the machine gun nest?

    yes. I quit before I even try. After you have been in IT long enough you can spot a suicide mission a mile away.

    --
    Do not look at laser with remaining good eye.
  32. Three Letters by Overzeetop · · Score: 3, Funny

    When I was hired to run the IT department of a major company my predecessor left three letters in the desk that was now mine. Each letter was clearly labeled; System Failure #1, System Failure #2, System Failure #3. A post-it note was attached to the bundle of letters.

            In case of a substantial system failure open the letters in order, once per failure, and they will help you through the problem.

    I put the letters back in the desk and forgot about them.

    About one year later we had a cascading server failure that left our corporate intranet and several important production servers off-line. While repairing the problem I remembered the letters. Curious, I opened the first letter.

            Blame me, your predecessor

    The day after we got the servers back up I was called in to my boss's office to explain what happened and why we were down for so long. Taking my cue from the letter I blamed my predecessor. My boss was satisfied with my answer and let me go.

    About six months down the road we had another big failure. This time our primary database server went down and the secondary was having trouble dealing with the load. I had to put a lot of extra hours into getting them back up and we lost a few transactions due to the backup server not being able to function under the load.

    Once again, I reached into that desk drawer and opened letter #2.

            Blame the equipment

    This time I lamented to the boss about how it wasn't my fault. It was that backup server! If we had some good equipment to run on these things just would not happen. He was satisfied with my answer and I went back to work.

    Things ran smoothly for the next 18 months. Then we got hit with a virus that somehow got past our firewall and wreaked havoc on our systems.

    I opened the third letter.

            Write three letters

    (Sorry, this was the first thing I thought of when I read the summary)

    --
    Is it just my observation, or are there way too many stupid people in the world?
  33. Observium, ESXi, and Hobbit by charnov · · Score: 4, Informative

    The combo of Observium (network monitoring), Hobbit (monitor everything with extreme ease), and either ESXi or Proxmox VE for consolidation and ease of management/isolation/testing/etc has served me well for years to take control of large organizations quickly. At the last two businesses I was hired to fix, I set this up and then built a parallel enterprise as VMs (the right way this time) and then cut everyone over in a weekend. No one noticed the change except to say stuff didn't crash anymore and it was really fast.

    Also OpenFiler and NexentaStor make for a great SAN.

    If you need more: PFSense for firewall or VLAN router, BlueIris for IP cameras, PBX in a Flash for VoIP, SoGo for Outlook compatible email, LibreOffice, etc.

    --
    [RIAA] says its concern is artists. That's true, in just the sense that a cattle rancher is concerned about its cattle.
  34. Re:Quit by 19thNervousBreakdown · · Score: 4, Insightful

    Oh god yes do this.

    If your bosses will sign off on getting a second opinion, great, stick around and fix stuff. If they don't even want to know that it's screwed up, get out as soon as you can.

    Just be very careful when selecting who you'll bring in to do the audit, and be very clear that if anyone is brought on to help fix the problems, it absolutely will not be the same as the evaluators. Otherwise you're essentially handing them a blank check to say whatever they feel like is wrong, and fix it any way they want.

    My suggestion is to generally avoid letting contractors do more than consult with you on a project--they know very well how to set things up so that it's easy for them to work on in the future, and are generally not very good at making the stuff actually fit in well with your business processes.

    --
    <xml><I><am><so><damn>Web 2.0</damn></so></am></I></xml>
  35. Re:Quit by malkavian · · Score: 4, Informative

    I've been in a few similar situations over the years. The first thing you put on the table is "This is not an acceptable situation. Your risks are .".
    If they don't cover this, then that's really not your problem. I've been coding for 32 years, and doing sysadmin stuff as well for about 20 (among other strings to the bow), and live in despair of people who really don't understand that this stuff doesn't happen by waving a magic wand, and there is more to it than making pretty buttons appear on a screen.
    At interview, if someone said they'd reverse engineered and documented a system in this environment (and yes, I interview people for dev/admin jobs from time to time), I would seriously ask them why they didn't get management a junior to cover the paperwork and cover duties, while they dealt with the heavy lifting of reverse engineering and planning. I want someone around who will grok the risks, take responsibility and come up with a resilient service (not just a few machines that may be able to fail over). Budget isn't always easy to come by, especially if there are political axes to grind.
    I'm with the AC on this, from the limited info available. Either get them to get you a second, or get out. If the business is thriving, they can afford it, and they're just being cheapskates (and in many years, I've met quite a few like that) if they don't. You don't want to work for a cheapskate.

    The time to take this kind of work on solo is if you're part of a startup, when you've got a lot invested in the success of the company. You live or fall on your wits, capability, and ability to lose every evening, weekend, and many a night too, on keeping this up and running as cheaply as possible.
    Once the 'thriving' level arrives, you'd better make sure you're not still carrying that load alone, otherwise your own lifespan (as well as that of the company) may be quite severely limited.

  36. Re:Quit by myurr · · Score: 4, Insightful

    Completely agree. Perhaps the previous guy didn't take the time to inform the management of what was required to do the job properly, or didn't know himself, or was more interested in painting himself as indispensable than doing the right thing. First things first, if this is genuinely a thriving e-commerce company then their website is their number one priority and their fulfilment systems are the number two priority, phones are number 3 with everything else taking a back seat - and they REALLY need to get a second employee. If you are ill, on holiday, or, deity forbid, something happens to you, then they need someone else who can step in. If their infrastructure is as shot as you suspect then you're going to need a second brain to sort it all out and help you implement it.

    You must make sure that backups are being taken and are robust. You need a disaster recovery plan. You need both short term and long term plans to scale the infrastructure as the business grows and reactively if there's a sudden growth spurt. You need to know where the next bottleneck in the system is and come up with a plan to fix it. Do you have an adequate handle on monitoring traffic to the site from when they first land through to placing an order? Do management have the stats required to make informed decisions about the business? Management will also need to be aware of when IT will need extra funds as mapped against their own sales growth targets.

    Once all of the above is sorted, and decent management allowing (and presuming this isn't something that is already being taken care of), you need to start suggesting to management the skillsets of people and / or contractors and / or agencies that need to be brought in to proactively grow the business. Be it SEO, PPC, UX, new features, etc. whatever it is, you have the opportunity to help the business understand it all and be instrumental in their success.

  37. if they don't care, why should you? by plurgid · · Score: 4, Funny

    You wouldn't be in this situation if your employer gave a crap. It's plain and simple: you report to someone. They know the extent of the problem and that there is only one of you. If they cared, there would be more than one of you. But there isn't. So turnabout is fair play.

    This is the true American solution to your problem: find other people to exploit and skim off the top ...

    Step 1: tell them you're going to become a telecommuter so that you can work 100% of the time
    Step 2: get on elance or some other such site: hire gobs of cheap (dubious) overseas help at $1/hour
    Step 3: instruct them all to send emails from your address and answer the phone with your name.
    Step 4: find a different job and just let your sub-contractors handle that one until the house of cards falls apart

    If your current employer calls you out on the fact that you have 15 different accents and sometimes answer the phone in a female voice, ask them why they're so racist.

    bonus if you used a pseudonym when hiring for your present job.

  38. This gig has writings on the wall all over by luis_a_espinal · · Score: 4, Interesting

    My backup was my boss who was technically competent, so there was that, but it's not like I've never worked a job as a one man show. You buckle down and do what you have to and make it so things don't break just because you're not around (yes, this requires budget, but I've been fortunate enough that anyone willing to pay me what I'm worth is also willing to invest in a solid infrastructure).

    This made you (and the situation you described) an outlier, one with a positive outcome. Your experiences cannot be applied in general. In general, this is rarely the case, for one-man-tech shops that is.

    For the most part, conditions as described by the original submitter typically have "GTFO ASAP!" written all over them. I've done IT in companies, small and large, and I can attest that what you say is true: Yes, it is possible to be the one-guy-IT-slash-programmer-shop at a small e-commerce company. But the question is why? I wouldn't do it (again) unless a good compensation package came with it (which is typically never the case), or if I'm fresh out of school with nothing on my plate to take (in which case, it is ok.)

    Good companies are never based on one-man-IT-slash-dev-shops, regardless of size (or at least they try not to be.) I know, again, I've worked with companies big and small. Conditions like that are typically good proxies for more systemic problems, and at the end of the day (whenever possible), you want a paycheck, a rewarding job and good working conditions. You rarely, if ever, see that with one-man-IT-slash-dev-shop gigs, regardless of the size of the company.

    That's just my $0.02 input from what I've seen. YMMV so readers be warned and please take this anecdotal piece with a grain of salt.