Ask Slashdot: Getting a Grip On an Inherited IT Mess?
First time accepted submitter bushx writes "A little over a month ago, I assumed the position of programmer and sole IT personnel at a thriving e-commerce company. All the documentation I have is of my own creation, as I've spent most of my time reverse-engineering the systems in place just so I can understand how everything works together. Since I've started, I've done everything from network and phone upgrades to database maintenance with Perl, and thus far it's been immensely rewarding. But as I dig deeper, I notice the alarming number of band-aids applied by my predecessor, and it seems like the entire company's infrastructure is just a few problems away from a total meltdown. The big question now is, how do I, as a single person, effectively audit the network, servers, databases, backups, and formulate a long-term plan that can be implemented by one person? Is it possible? Where do I begin?"
You work at RIM?
say goodbye to your life for the next year. hope you're getting paid to mislay it....
start drinking
Automate your servers so you can focus your time elsewhere. I use Cfengine.
http://watson-wilson.ca/2011/03/enterprise-system-administration-using-configuration-management.html
Dude, that is too easy. There are serious wiseacres on this board.
Did the last guy outsource everything to India?
Assess your most vulnerable items, whether that is a server, a network component, an application, a database, etc. Give each one a criticality score. Share that list with your boss/manager and work the list one item at a time. You can't spread yourself too thin when working a project like this, so focus on one or two items at a time until you see a light at the end of the tunnel.
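One way to keep such a scored list is a few lines of Python. This is only a sketch: the 1-to-5 scales, the impact-times-likelihood formula, and the example items are assumptions for illustration, not something the comment above prescribes.

```python
from dataclasses import dataclass


@dataclass
class Item:
    name: str
    impact: int      # assumed scale: 1 (minor annoyance) to 5 (business stops)
    likelihood: int  # assumed scale: 1 (unlikely to fail) to 5 (failing already)

    @property
    def score(self) -> int:
        # Simple criticality score: impact multiplied by likelihood.
        return self.impact * self.likelihood


def rank(items):
    """Return the items sorted from most to least critical."""
    return sorted(items, key=lambda item: item.score, reverse=True)


# Hypothetical inventory, purely for illustration.
inventory = [
    Item("order database", impact=5, likelihood=3),
    Item("office phone system", impact=2, likelihood=2),
    Item("nightly backups", impact=5, likelihood=4),
]

for item in rank(inventory):
    print(f"{item.score:>3}  {item.name}")
```

The printed list is the thing to hand to your boss; the top one or two rows are what you work on first.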
Brief your management on the situation. Explain what condition things are in and what is needed to get them into a manageable state. Give them a list of projects / tasks that you have to deal with and get them to prioritize.
blindly antisocialist = antisocial
You need to document it and get management to approve spending money.
I'll bet you $100.00 the band-aids are there because management refuses to spend money on infrastructure, and that's why it is a mess and why the guy there beforehand has left.
99% of the time a hosed IT infrastructure is because management refused to spend any money so it had to be half assed.
Do not look at laser with remaining good eye.
Facts:
1. The job has lasted for 1 month so far.
2. The e-commerce company is 'thriving', apparently.
3. All of the systems have been "reverse engineered" in that 1 month.
4. All of the documents are written in that 1 month.
5. In 1 month there have been: network and phone upgrades and database maintenance with Perl, and it all has been 'immensely rewarding'.
6. The entire infrastructure is 'a few problems away from a total meltdown'.
7. Single person IT operation to do everything.
Question: is this for real? What's the size of the company and what's the budget?
You can't handle the truth.
No!
This is actually the kind of career building stuff one should leap at. What would you rather say in an interview for your next job: ..
- I took this system that was falling apart and made it run like clockwork.. downtime and issue frequency went from "it's down again" to "been up all year"
- Yeah it was pretty good when I got there, and I maintained the status quo
My thoughts on original question:
First step is comprehension. You can’t fix what you don’t know you have/need. Identify the key components of your system. Then for each key component, break it down to its parts and dependencies. Then break each one of those out, and so on, until you have a pretty damn good idea of what you have.
Next part is assessment. For each component you’ve identified, what is its current state?
And then it’s time to do triage. Prioritize stuff by largest potential impact.
And finally carry out your well thought out pla.. ok, can't say that one with a straight face. Basically try to fix stuff when you can, between putting out the daily fires.
Just buy a few cases of your energy drink of choice and put Eye of the Tiger on repeat until you've got it all fixed.
I believe in you.
PRTG monitor.
Welcome.
Tell/email/post your opinion and observations, as detailed as you can, along with your concerns. Make sure your managers see it. Do not expect them to do anything about it. Do it for your own reference, so you may continue working normally. Do not overwork or overworry yourself, for that will bring neither you nor the failing systems any closer to resolution. Do your normal job, stay cool and speak up. You are in the driver's seat.
"I assumed the position of programmer and sole IT personnel at a thriving e-commerce company."
Wait.... a thriving e-commerce company has one IT person? Am I missing something here...? No wonder everything was band-aided together. They have one person doing everything.
You may want to consider hiring an outside firm to come in and do the audit for you. The last thing you need right now, on top of your daily workload, is to perform an audit. That, and a third party firm creates a sense of objectivity, and would eliminate the "The IT guy wants a new toy" response from the CFO.
No man is an island, But if you take a bunch of dead guys and tie them together, they make a pretty good raft.
Always start by making sure the backups are working properly.
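A first, very rough check is whether the newest backup file is recent at all. The sketch below assumes the backups land as files in a single directory and that one backup a day is expected; the directory path and the 26-hour limit are hypothetical. A recent file only shows that the job ran, so it is still worth restoring one to a scratch machine to prove the backup is usable.

```python
import time
from pathlib import Path


def newest_backup_age_hours(backup_dir, now=None):
    """Return the age in hours of the newest file in backup_dir, or None if it has no files."""
    now = time.time() if now is None else now
    mtimes = [p.stat().st_mtime for p in Path(backup_dir).iterdir() if p.is_file()]
    if not mtimes:
        return None
    return (now - max(mtimes)) / 3600


def backup_is_fresh(backup_dir, max_age_hours=26, now=None):
    """True if the directory holds at least one file newer than max_age_hours."""
    age = newest_backup_age_hours(backup_dir, now)
    return age is not None and age <= max_age_hours
```

For example, `backup_is_fresh("/srv/backups/db")` (a made-up path) would return False if last night's dump never got written.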
Sadly there's a lot of truth to this. In my experience the difference between most "good" and "bad" networks is whether the WTFs are vendor-blessed hacks or in-house hacks.
Of course, there are always those places where this is not the case but I've seen enough IT environments to believe that for a majority of companies this is sadly the state of things. If maintenance in the average factory was handled the same way IT is handled at the average company most machines would consist of approximately 30-50% duct tape, newspaper, string and glue...
Greylisting is to SMTP as NAT is to IPv4
Or bring in contractors / consultants, have them do their part, and then part ways. The biggest mistake you can make is taking everything on your shoulders; that = loss of life & health. It's a job, and work != life.
This is the only solid advice I've read so far. Band-aid solutions are indicative of two things: too shy to ask management for a bigger budget, or management's reluctance to improve their budget. Generally it is the latter.
moox. for a new generation.
Quit? Do you give up on every task before you start?
Some of us like a challenge.
Short of blowing it up and starting fresh, this is the best way. Kidding aside, I was in the same situation as you several years ago.
We happened to have SharePoint already installed (as part of SBS 2008), so we started using its wiki feature for our documentation.
We use its lists feature to keep track of license keys and firewall settings (not in the same list of course).
Just make as comprehensive a list as possible.
You're going to spend time rewriting things that currently work? That's a recipe for disaster.
Unless you can predict when something will fail (as in - the database uses 16-bit indexing, so when we hit 65,536 orders the database will crash), it's much more effective to leave things alone.
Wait until changes are needed, then straighten out only those pieces that you have to touch when implementing new functionality.
Work to a benefit. Unless you can point to some aspect which will change in a measurable way (it's crashing frequently, it will crash *less* when I'm done, it will cost less in terms of server rental, &c), leave it alone.
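The 16-bit example above is the kind of failure you can put a date on. A small sketch, assuming an unsigned 16-bit order ID and made-up figures for the current highest ID and the daily order rate:

```python
import struct

MAX_UINT16 = 2**16 - 1  # 65535, the largest value an unsigned 16-bit field can hold


def orders_remaining(current_max_id):
    """How many more IDs fit before the 16-bit field overflows."""
    return MAX_UINT16 - current_max_id


def days_until_exhaustion(current_max_id, orders_per_day):
    """Whole days of headroom left at the given order rate."""
    return orders_remaining(current_max_id) // orders_per_day


# struct refuses to pack a value that does not fit in 16 bits.
struct.pack("<H", 65535)  # fits
try:
    struct.pack("<H", 65536)
except struct.error:
    print("65536 does not fit in an unsigned 16-bit field")

# Hypothetical numbers: highest order ID so far is 60000, about 120 orders a day.
print(days_until_exhaustion(60000, 120), "days of headroom")
```

With those invented numbers the script reports 46 days, which is a measurable reason to touch the system now instead of leaving it alone.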
No offense, but if you don't have the necessary background to know what/where the tools are; who are you to say everything is band-aided? I see this a lot with new ITs, they see something different than they would have done and instantly label their predecessor a moron; later to make "their" change and break everything. Easy on the finger pointing.
The first thing you need to do is make a comprehensive assessment; don't jump in and start making changes until you have documented everything. If you can contact your predecessor and ask about design and/or documentation that may be stored in an industry standard tool that YOU are unaware of; do so. Once you know how all the pieces move, then start to plan how to improve/repair it. If you dive in and it breaks, you will be blamed; if it breaks and you fix it with minimal down time, you're the hero.
I've spent the best part of my career undertaking tasks like this (as an external consultant), with my average time on an assignment lasting somewhere between 18 months and 3 years.
My aim on every project is to make myself obsolete - in that I try to get documentation up to a point where a suitably qualified individual could come in, read the documentation, and work the rest out for themselves.
My primary objectives are to implement some form of inventory control to document the what / where / why...
Once you've got to that stage, then you're ready to get in to the real technical details. Remember that you are pitching your documentation to your successor, or to some imaginary "suitably qualified individual", so documenting what a system does and why is a higher priority than commenting every line of code.
It is possible to do with one person, depending on the size of the organisation, and it can be particularly rewarding to do on your own. In a small business you often find some of the users have a good understanding of some of the systems, or are keen to learn.
You stated in your post that you've assumed the role of programmer and sole IT personnel - which means you need to learn to think like a manager as well as a techie (which is harder than most people imagine!). Once you learn to focus on the business priorities, you'll understand where to begin with the technical detail, and what level of documentation is required.
Philosopher (n) - a wise person who is calm and rational; someone who lives a life of reason with equanimity
The situation being understaffed and underfunded but expected to keep everything working... my advice is get out while you can.
It just isn't worth it. The reason why the systems are all patch and duct tape is because they think cheap is good management - and the longer you keep it running the more it proves it to them.
And hey, their new boat they bought with their bonus for keeping expenses down is awesome!
I worked in this environment for one year so as not to tarnish my resume. I toughed out the last 4 months absolutely burned out and bitter. You cannot communicate to management that outages and issues aren't your fault; they're adopted. When you fix things, you'll inevitably miss something (I did because of the pace, not dictated by me). Get out. It's not worth the challenge to get proper budgeting to get the right tools in place, or the organization as a whole wouldn't let things get how they are in the first place. The business model I came from is failing. If you're good, there are better paying, better rewarding, less "heart and soul" companies out there. You're doing basically startup work for at-will employment pay.
I'd amend that to a big "maybe" for sticking around.
All of what you said (and the initial reaction to quit in the GP) hinges on the root cause of the mess. If it's a result of the predecessor not doing things correctly and flying by the seat of his pants, you're correct to jump at the opportunity. However, if it's caused by management screwing IT every chance they get with poor timelines, lack of funding, no foresight, and so on, run like hell.
As someone who's been fucked by a sociopath boss for trying to take on a similar project, I agree with the OP: QUIT, run far and fast. It's not worth it. Being able to say "I took a broken system and made it run like clockwork" is worthless at most job interviews; they're not hiring someone to fix their broken system, they're hiring an admin (from the impression I got). If a company knows their system is falling apart, there are contracting companies that specialize in that. Unless you're trying to get into one of them, don't bother.
It's really not fun being blamed for a critical failure caused by the idiocy of a predecessor, and then fired for it.
Agreed. As soon as I saw this was an IT department of one, I could tell the exact amount of care that management has on getting things like this corrected. These things are in place because management does not want to provide what is needed. If they only want to pay for band-aids, that is all they will have.
This isn't necessarily the case though. I have a friend who took over IT at a small business. When he walked in they were using pirated software and their IT was a complete mess. After he put in hours to get it fixed up (with personal support from the owner), they ended up offering him an executive position with a massive pay increase. Some small shops with one IT guy really just don't know what they are doing, and haven't had a person in the job to tell them what is being done wrong. Your advice is still good though. A person in that situation needs to test whether they have management support to do things better. If so, it can turn into a career making opportunity to turn things around. If you can't get the management on your side though, it very well could be time to start looking for another job with more supportive management.
Atanamis
I walked into a similar nightmare two years ago. Before I even took the job I assessed the situation and gave them a proposal for what needed to be done and a price estimate for the software and hardware. I told them I would not take the job unless they committed funds to support the function. I also warned them that there were numerous ticking time bombs and that I'd defuse them as fast as possible, but there was no magic fix, it would take some time, and they could still have a disaster.
I then convinced them to only hire me part-time and to also hire a part-time desktop support person for a few reasons including they don't want to pay me to do that and having two IT people at least gives you some continuity. Even if the desktop support guy doesn't know the high-end stuff, if I leave the desktop person can still guide the new person and save them a lot of time I never got.
My line of attack was:
Getting back to original point, a one-person IT shop is suicide. Them having a two person part-time crew is better because if one leaves, at least the other can provide some sort of continuity -- and that happened already. The fairly young guy I hired for desktop support two years ago died last month :-(
The first step is to define your goals. What do you want out of this?
1. a job
2. learning new skills
3. leadership
4. a chance to grow in the company
If you are the sole IT/programmer person, this is a company in dire need of management with a clue as to IT. You could be that change and end up being a manager of IT for this company. You have to work your butt off, fixing things, dealing with budgets and hiring staff. Can you deal with upper management to accomplish everything? That's up to you to decide.
What I won't recommend is killing yourself for a company that is unwilling to learn from its mistakes and do it right. In that case, just treat it as a good learning opportunity, but don't kill yourself. They won't always be able to hire a superhero to come in and keep things running. Or if they do, it will be a well-paid consultant and they will learn their lesson quickly how much it costs.
There is a reason this company has such poor IT systems. You could end up being the IT guy in a long line of IT idiots.
Yep. Hop into the waders and get to work. It can be a very rewarding experience turning a steaming pile back into a smooth running good looking machine.
To add to the above, document everything. Though it sounds like you're already doing that. Make sure it's documentation that works for anyone not just you. Don't take anything for granted. Automate whatever you can, including problem detection and notification. (save yourself from having to check things daily or weekly, have it shoot you an email or something if a common issue crops up again)
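As a sketch of the "have it shoot you an email" idea, here is a check for one common recurring issue, a disk filling up. The 90% threshold and the choice of disk space as the example are assumptions; running it on a schedule and delivering the output by email are left to whatever scheduler and mail setup you already have.

```python
import shutil


def disk_alerts(paths, threshold=0.90, usage=shutil.disk_usage):
    """Return one warning string for every path whose filesystem is at or above threshold."""
    alerts = []
    for path in paths:
        total, used, _free = usage(path)
        fraction = used / total
        if fraction >= threshold:
            alerts.append(f"{path} is {fraction:.0%} full")
    return alerts


if __name__ == "__main__":
    # Print nothing when all is well, so a scheduler only has output on trouble.
    for line in disk_alerts(["/"]):
        print(line)
```

The `usage` parameter exists only so the function can be tested with fake numbers; in normal use the default `shutil.disk_usage` is what you want.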
Make sure your employer fully understands the situation you and they are in, so they don't expect you to be doing improvements and striking things off their sore to-do-list that they were probably hoping you'd tackle the day you started. Get them a timeline as soon as you get something of a grip on the situation, tell them where you're going to be spending your time to start with, and the reasons why it's essential and going to delay their getting their bells and whistles and visible bang-for-the-buck of hiring you. Otherwise they may think you're just sitting on your butt because they're seeing no tangible benefits.
If you've got a LOT of things that need to be fixed, things that can be done by closer to trained-monkey level, consider getting a temp assistant to help you dig out. Someone to run around and reimage machines, fix networks, repair stations, do RMAs, etc while you pull up your sleeves and unhack the servers. But if they're not in that big of a hurry this may not be appropriate.
Good luck with it, sounds like fun actually, a challenge at the least.
I work for the Department of Redundancy Department.
It is probably a combination of the two. Because MGMT always assumes IT can do something with very little, and often the Impossible with Nothing.
We are skilled (most of us anyway) problem solvers, and they rely upon that to function. I hate to say it, but the original question should be answered this way: HIRE outside consultants to evaluate your system(s), and give you a hard copy report on their findings that you can present to MGMT.
If the situation is as I believe, it is worse than he even suspects. He needs more help than he can do by himself, to get ahead of the curve.
Agent K: A *person* is smart. People are dumb, stupid, panicky animals, and you know it.
Best advice, right there. It's a challenge for certain, but making things better is the best thing you can do - for the company (ha) and, far more importantly, for yourself.
Hang in there!
And although it may feel like the whole place is going to fall apart any moment, it hasn't yet, you're in charge, and it sounds like you're gradually making it all better. Take a deep breath, Don't Panic, it'll be okay.
Love sees no species.
I'm familiar with that kind of mess... Best advice is to piss on all the fires as quickly as they arise and, unfortunately, put in overtime putting in place alternative implementations that prevent it from happening again. During those brief intervals when something isn't burning, survey the land, determine the highest risk (read: cost and likelihood of failure), and take proactive steps to mitigate. At first you'll feel like you're in the worst kind of hell, but eventually things will start to fall into place and you'll be able to enjoy your hard-earned vacation. Further, it's quite probable that you don't have a complete knowledge of the technology being used. Get it. There's no substitute for actually knowing what you're doing. It's unfortunately far too common for people with no time and/or interest to search the web for a snippet of code, or a set of procedure steps, and hack these into place without the slightest clue what they're doing or the consequences thereof. It's usually better to go sharpen the ax before you go into the woods, even if it seems like it will take more time. Trust me, it will pay dividends later.
Two of my imaginary friends reproduced once
The number one best thing you can ever do in your situation is ask your bosses what they think the system should be doing.
Step 1: All the squirrelly business logic and the rationale behind each system you have to maintain should have a plain text description. You have to know the 'Why' before the mess of band aids that is the 'How' will ever make sense. Have your boss (or his secretary, or whoever) document it and get it to you. Do NOT do this step yourself. Repeat do NOT perform this step.
Step 2: Put out fires till someone not you finishes step 1. Start making backups of every last scrap of data you can get your grubby hands on.
Step 3: Once step 1 is done, compare it to the mess. Note where the realities that are in your boss's head diverge from what is actually happening. Your job is now to create a detailed functional spec that takes what your boss says and expands on it with what is really happening. Try to include worst case scenarios and document them as intended features.
Step 4: Have your boss and sales and marketing, and every other top level manager sign off on it. This will not happen. No two managers in your company will fully agree on what the current system is actually doing. Your goal is to figure out what sales and marketing are telling your users that your products do. Do not disregard this step or it will come back and bite you very hard.
Step 5: Once every department actually agrees on what your job really is, you will be well equipped to start the long process of fixing things. Again make lots and lots of backups. Management will sign off on step 4, then you'll fix a gaping security hole, and some customer somewhere will throw a raging fit because sales promised that they'd be able to get admin access to your databases or something ridiculous.
Step 6: Don't be an ass. When step 5 inevitably happens, explain the misstep in communication graciously, and roll back. If you pulled not being an ass off properly, you now have a great platform to explain to management why X was a bad idea, and present an idea to fix it.
I'm a grizzled vet to your situation. If someone would've told me what I just told you when I started out, there would have been a lot less headache and stress. Hang in there, it can be an intensely rewarding experience.
I think you missed the part where he said the company was thriving.
It means whatever he does is irrelevant, because they are thriving with whatever they have. I think he needs to check if they are really an e-commerce company and not a money laundering operation for some drug dealers, in which case he is set for life.
So you worked for one whole year in this industry, and that gives you insight enough to know that there is only ONE reason that things get to this state?
That's interesting because I've been in this industry for 22 years and I can list at least two possible reasons. The obvious one you're missing is that there is budget but the previous guy was an idiot.
I'd say there's a 90% chance it's the latter. Budget is easy to come by in a thriving business, but people who know what they are doing are still rare (hell, this Ask Slashdot is basically "help I don't have a clue what I'm doing").
I've been through similar situations a number of times. For the people who are telling you to get out of this job, I say: not necessarily. If you manage to fix these things, it can be a great learning experience and it can help you earn a name for yourself.
So my advice is to start out bringing these problems to the attention of management. You don't need to be pushy, but be very clear that you have found these problems, that you think they're serious problems, and that the problems may endanger the success of the company. Give them a little leeway on how to direct you. They probably won't want to throw lots of money at the problem, but if they don't seem genuinely concerned and looking for solutions, then start looking for a new job.
Second, get ready to learn about project management, because you're not fixing all of this at once. Make a list of what needs to be done. Prioritize that list. Estimate the time needed to do each task. If there's something extremely high priority that will run up against a specific deadline, then figure out what's necessary to meet that deadline. Start working on a budget.
Start setting schedules for each thing that needs to be done, but recognize that the schedule will have to be flexible. In fact, don't bother scheduling things that are low priority until you've put out some fires. Keep them on your todo list, but consider making a separate "to do eventually, but I'm not going to bother thinking about it right now" list. When you have a schedule set, get to work. Keep track of your progress, and keep management informed of your progress. Keep them informed about problems and obstacles that you encounter along the way, especially if they'll cause an increase in your budget or a delay in your schedule.
You'll want to gather some good project management tools along the way. At a bare minimum, these tools will include a calendar, a todo list, and a way to keep organized notes. Set aside time every week to review your notes, your calendar, your todo list.
You can take project management classes, but most of what they teach you comes down to this: Make sure you understand what you're trying to accomplish, and that what you're doing is actually the best way to accomplish it. Keep your stakeholders informed, and listen to their feedback on your progress.
You document what's there. You've already started that. Next you document what's deficient. Then you put together a plan that, in stages, makes things better. Then you propose that plan to your management in terms that make sense to business people (happier customers, money saved, disaster avoided, etc...). Then you execute the plan.
"I'd rather be a lightning rod than a seismometer." -Ken Kesey
Either because your predecessor 'made it work' with little or no funding (better translated as 'he made it almost work'), or because your predecessor failed to acquire sufficient funding to do it 'right'.
As a former field tech/consultant, try to avoid bringing in consultants to explain why the stuff needs to be bought. Many a manager ends up believing the consultants and disbelieving their staff. You get to either hire the consultant to justify your plan or find yourself undercut by that lack of confidence.
And of course nail some problems and show improvements as early as you can. It's wise to both solve pressing impactful problems first, and gain trust.
I always loved going into a client with lots of problems. Not just for the thrill of making things right, but knowing that if I did it right I had a referral for my next client, because the end result was most often working my way out of the gig. Either I passed the client on to another tech to maintain, or they got their staff's legs under them and could carry on. So long as there are more clients, this is good. Great fun to figure things out, isn't it?
deleting the extra space after periods so i can stay relevant, yeah.
This.
I'd be willing to bet a year's pay that the previous guy wasn't straight-up incompetent. He was probably relatively skilled, and doing the best he could with the resources at his disposal. Which were probably not actually the resources he needed.
Odds are good that there's a reason why the place is in the condition it is now.
Odds are good that there's a reason why the last guy isn't there anymore.
Odds are good that you're going to need more than one guy in IT to get it all straightened-out.
"Work is the curse of the drinking classes." -Oscar Wilde
It's not worth the challenge to get proper budgeting to get the right tools in place or the organization as a whole wouldn't let things get how they are in the first place.
I haven't been there, but it sounds like it would be very beneficial to learn to present the business case for upgrades and budgeting. Explain the difference in downtime that it would entail, and the benefits the company will get. From what I've seen our previous IT guy do, it seems that bosses are NOT opposed to spending money, as long as you can make a good case for why it's necessary. Put it in terms of dollars that it will save you and that will go a long way.
My current employer was in that state when I started 7+ years ago. I enjoyed the diagnosis and repair. If you like the mental test of fixing problems, it is great. Hang in there, and when you are done fixing everything you will have something to be very proud of. :-)
Kosh: "Understanding is a 3 edged sword, your side, their side, the Truth."
They have a guy who finds upgrading phone systems immensely satisfying! If he's sick he'll come in and fix it and who needs vacation anyway, he'll take the cash instead.
I'm betting it's a psychotic break and he IS his predecessor.
First step would be to evaluate everything as posted above.
Then build an Action Priority Matrix. It'll help you fit together an action plan and block out time for what appear to be major projects. It also allows you to get some Quickies done to show management you're the right guy to keep doing the job.
http://www.showingnaturally.com/ActionPriorityMatrix.png
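If a whiteboard grid isn't handy, the matrix is easy to sketch in code. The quadrant names below are the ones commonly used for this kind of impact/effort matrix; the 1-to-10 scales, the midpoint of 5, and the example tasks are assumptions for illustration.

```python
def quadrant(impact, effort, midpoint=5):
    """Classify a task by impact and effort, each scored on an assumed 1-10 scale."""
    high_impact = impact > midpoint
    high_effort = effort > midpoint
    if high_impact and not high_effort:
        return "quick win"
    if high_impact and high_effort:
        return "major project"
    if not high_impact and not high_effort:
        return "fill-in"
    return "thankless task"


# Hypothetical tasks as (impact, effort) pairs.
tasks = {
    "verify nightly backups": (9, 2),
    "replace the order database server": (9, 9),
    "label the patch panel": (3, 2),
    "rewrite a working report script": (2, 8),
}

for name, (impact, effort) in tasks.items():
    print(f"{quadrant(impact, effort):<15} {name}")
```

The quick wins are the "Quickies" mentioned above; the major projects are the ones to block out time for.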
You will probably be getting a large number of suggestions. I have done both support and development on mainframes and servers so here is some input:
1. Let management know at a high level the state of the machine(s) and get permission to spend part of your time documenting the system. When you get permission ask them for how often they need updates and how much detail. Keeping them in the loop seems to make them happy and feel important.
2. Document the current state and highlight areas of concern. Put down what the concerns are, the risks and the potential costs to the company if it fails.
3. Go through the document and organize it by risks. Try to figure out the size of the risk and how much work it will take to fix it and what is needed to fix the problem.
4. Automate as much of your process as possible. Any task you have to do on a regular basis (in my humble opinion if you do it more than once then automate it) should be automated. Dedicate time to document what you did.
5. Senior management is probably not wanting to see details. When you present, keep it simple and short. Point out the costs of failure and if you need software to help put that forward as an 'investment in infrastructure'.
6. If the company has an internal auditor make friends with him/her. Getting them on your side to present to management will help. Having the auditor explain to them the financial costs will help your cause a lot.
7. When you do things take the time to document what you are doing, WHY you are doing it, how you did it and where to go for the programs/scripts/data.
8. Pick the brains as much as possible of all the people there. Offering to buy coffee and donuts seems to make them more receptive to an informal session and the amount of information they have could help you.
Part of every project we do now is dedicated to documentation and the client now knows the importance of that documentation and is happy to pay for it. The current system is over 25 years old and a lot of business knowledge has been lost due to people retiring or leaving. When we find things we put them into a document. The hardest thing to find is the 'WHY', but, once you get that the rest of the information starts to make more sense. Our most popular section is the 'HOW TO DO' as this is the short cut for every other document in the system.
When you do your documentation try to keep the documents as open as possible. Try to avoid proprietary packages as much as possible. We had an old flow chart program that we didn't have the program for and it took me a week to find an open source package that could read and export the files.
Panic now, beat the rush!
When I'm given a spoon and told to storm the hill and kill everyone in the machine gun nest?
yes. I quit before I even try. After you have been in IT long enough you can spot a suicide mission a mile away.
Get a whiteboard. Put your task list on it, in priority order, with time estimates. Order should be based on a business decision - what's the financial risk of something failing. Backups and security are always pretty high on my list.
Get buy-in from management on the ordering, because when something breaks (and it will), you need to make sure that someone above you approved the risk ordering.
Once you have a priority order, then figure out how much it's going to cost to do each one. If mgmt considers something a #1 priority and is only willing to fund 10% of the price to fix it, then you have a pretty clear warning that it's time to look for a new job.
When tasks are finished, cross them off but don't erase. Make sure everyone knows that things are getting done.
Don't let anyone rearrange the task ordering without a financial justification that's approved by mgmt.
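The whiteboard ordering above can also be kept as a small script, so the ranking is reproducible when someone asks why a task sits where it does. This is only a sketch: the task names, dollar figures and probabilities are made-up examples, and "risk" here is assumed to mean estimated failure cost times a rough yearly chance of failure, which is one possible reading of "financial risk".

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    failure_cost: float    # assumed loss in dollars if this item fails
    failure_chance: float  # rough yearly probability of failure, 0.0 to 1.0
    days_to_fix: float     # time estimate for the work
    done: bool = False

    @property
    def risk(self) -> float:
        # Expected yearly loss: the number used to order the board.
        return self.failure_cost * self.failure_chance


def whiteboard(tasks):
    """Return the board as text lines: open tasks by descending risk,
    finished tasks kept at the bottom and marked rather than erased."""
    open_tasks = sorted((t for t in tasks if not t.done),
                        key=lambda t: t.risk, reverse=True)
    lines = [
        f"{rank}. {t.name} (risk ${t.risk:,.0f}/yr, ~{t.days_to_fix:g} days)"
        for rank, t in enumerate(open_tasks, start=1)
    ]
    lines += [f"[done] {t.name}" for t in tasks if t.done]
    return lines


# Hypothetical task list; every number here is an illustration.
tasks = [
    Task("Test restore from backups", 200_000, 0.30, 3),
    Task("Replace aging database server", 100_000, 0.20, 10),
    Task("Patch public web server", 150_000, 0.25, 2),
    Task("Label network cables", 2_000, 0.50, 1, done=True),
]
board = whiteboard(tasks)
print("\n".join(board))
```

The numbers should come from management, not from IT, since the point of the exercise is that someone above you approved the ordering.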
When I was hired to run the IT department of a major company my predecessor left three letters in the desk that was now mine. Each letter was clearly labeled; System Failure #1, System Failure #2, System Failure #3. A post-it note was attached to the bundle of letters.
In case of a substantial system failure open the letters in order, once per failure, and they will help you through the problem.
I put the letters back in the desk and forgot about them.
About one year later we had a cascading server failure that left our corporate intranet and several important production servers off-line. While repairing the problem I remembered the letters. Curious, I opened the first letter.
Blame me, your predecessor
The day after we got the servers back up I was called in to my boss's office to explain what happened and why we were down for so long. Taking my cue from the letter, I blamed my predecessor. My boss was satisfied with my answer and let me go.
About six months down the road we had another big failure. This time our primary database server went down and the secondary was having trouble dealing with the load. I had to put a lot of extra hours into getting them back up and we lost a few transactions due to the backup server not being able to function under the load.
Once again, I reached into that desk drawer and opened letter #2.
Blame the equipment
This time I lamented to the boss about how it wasn't my fault. It was that backup server! If we had some good equipment to run on these things just would not happen. He was satisfied with my answer and I went back to work.
Things ran smoothly for the next 18 months. Then we got hit with a virus that somehow got past our firewall and wreaked havoc on our systems.
I opened the third letter.
Write three letters
(Sorry, this was the first thing I thought of when I read the summary)
Is it just my observation, or are there way too many stupid people in the world?
The combo of Observium (network monitoring), Hobbit (monitor everything with extreme ease), and either ESXi or Proxmox VE for consolidation and ease of management/isolation/testing/etc has served me well for years to take control of large organizations quickly. At the last two businesses I was hired to fix, I set this up and then built a parallel enterprise as VMs (the right way this time) and then cut everyone over in a weekend. No one noticed the change except to say stuff didn't crash anymore and it was really fast.
Also OpenFiler and NexentaStor make for a great SAN.
If you need more: pfSense for firewall or VLAN router, Blue Iris for IP cameras, PBX in a Flash for VoIP, SOGo for Outlook compatible email, LibreOffice, etc.
[RIAA] says its concern is artists. That's true, in just the sense that a cattle rancher is concerned about its cattle.
Oh god yes do this.
If your bosses will sign off on getting a second opinion, great, stick around and fix stuff. If they don't even want to know that it's screwed up, get out as soon as you can.
Just be very careful when selecting who you'll bring in to do the audit, and be very clear that if anyone is brought on to help fix the problems, it absolutely will not be the same as the evaluators. Otherwise you're essentially handing them a blank check to say whatever they feel like is wrong, and fix it any way they want.
My suggestion is to generally avoid letting contractors do more than consult with you on a project--they know very well how to set things up so that it's easy for them to work on in the future, and are generally not very good at making the stuff actually fit in well with your business processes.
<xml><I><am><so><damn>Web 2.0</damn></so></am></I></xml>
You already know that it's a tangled mess. You need to map that tangle thoroughly before you start fixing/replacing/retiring anything. The conversation you do not want to have with your superiors is why retiring system X (which costs $5,000/month) took down system Y (which makes $100,000/month). You need to map out both the business processes (which systems they touch) and the system dependencies (trust no one, log network data and look at the traffic between boxes). Do not start pulling strings until you know what they're connected to.
You're not going to do this by yourself... at the very least you're going to need someone who knows the business side thoroughly. I've walked into a situation like this before for a very, very large company and I swear it took years off of my life but I learned a whole hell of a lot from the experience. Best of luck.
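To make the "know what the strings are connected to" idea concrete, here is a rough sketch of how observed traffic between boxes could be turned into an answer to "what breaks if I retire X?". The host names and the list of flows are hypothetical; in practice they would come from whatever network logging you set up, and a quiet period in the logs does not prove a dependency is absent.

```python
from collections import defaultdict, deque

# Hypothetical observations: (client, server) means the client was seen
# talking to the server, so the client depends on the server.
observed_flows = [
    ("storefront", "orders-db"),
    ("storefront", "legacy-fileserver"),
    ("fulfilment", "orders-db"),
    ("reports", "fulfilment"),
]


def dependents_of(system, flows):
    """Return every system that directly or indirectly depends on `system`."""
    used_by = defaultdict(set)
    for client, server in flows:
        used_by[server].add(client)

    affected = set()
    queue = deque([system])
    while queue:  # breadth-first walk up the "is used by" edges
        current = queue.popleft()
        for client in used_by[current]:
            if client not in affected:
                affected.add(client)
                queue.append(client)
    return affected


print(sorted(dependents_of("legacy-fileserver", observed_flows)))
print(sorted(dependents_of("orders-db", observed_flows)))
```

Anything in the returned set is a system whose owner you want to talk to before the retirement, not after.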
I've been in a few similar situations over the years. The first thing you put on the table is "This is not an acceptable situation. Your risks are .".
If they don't cover this, then that's really not your problem. I've been coding for 32 years, and doing sysadmin stuff as well for about 20 (among other strings to the bow), and live in despair of people who really don't understand that this stuff doesn't happen by waving a magic wand, and there is more to it than making pretty buttons appear on a screen.
At interview, if someone said they'd reverse engineered and documented a system in this environment (and yes, I interview people for dev/admin jobs from time to time), I would seriously ask them why they didn't get management a junior to cover the paperwork and cover duties, while they dealt with the heavy lifting of reverse engineering and planning. I want someone around who will grok the risks, take responsibility and come up with a resilient service (not just a few machines that may be able to fail over). Budget isn't always easy to come by, especially if there are political axes to grind.
I'm with the AC on this, from the limited info available. Either get them to get you a second, or get out. If the business is thriving, they can afford it, and they're just being cheapskates (and in many years, I've met quite a few like that) if they don't. You don't want to work for a cheapskate.
The time to take this kind of work on solo is if you're part of a startup, when you've got a lot invested in the success of the company. You live or fall on your wits, capability, and ability to lose every evening, weekend, and many a night too, on keeping this up and running as cheaply as possible.
Once the 'thriving' level arrives, you'd better make sure you're not still carrying that load alone, otherwise your own lifespan (as well as that of the company) may be quite severely limited.
Completely agree. Perhaps the previous guy didn't take the time to inform the management of what was required to do the job properly, or didn't know himself, or was more interested in painting himself as indispensable than doing the right thing. First things first, if this is genuinely a thriving e-commerce company then their website is their number one priority and their fulfilment systems are the number two priority, phones are number 3 with everything else taking a back seat - and they REALLY need to get a second employee. If you are ill, on holiday, or, deity forbid, something happens to you, then they need someone else who can step in. If their infrastructure is as shot as you suspect then you're going to need a second brain to sort it all out and help you implement it.
You must make sure that backups are being taken and are robust. You need a disaster recovery plan. You need both short term and long term plans to scale the infrastructure as the business grows and reactively if there's a sudden growth spurt. You need to know where the next bottleneck in the system is and come up with a plan to fix it. Do you have an adequate handle on monitoring traffic to the site from when they first land through to placing an order? Do management have the stats required to make informed decisions about the business? Management will also need to be aware of when IT will need extra funds as mapped against their own sales growth targets.
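As a first pass at "backups are being taken", a scheduled check along these lines can at least flag backup files that are missing, stale or empty. It is a minimal sketch under stated assumptions: the file names, the 26-hour threshold (a nightly job plus some slack) and the idea that backups land as plain files are all invented for illustration. It says nothing about whether a backup can actually be restored, which only a test restore can show.

```python
import os
import tempfile
import time
from pathlib import Path


def check_backup(path, max_age_hours=26, min_bytes=1):
    """Return a list of problems with one backup file (empty list = passed)."""
    p = Path(path)
    if not p.is_file():
        return [f"{p}: missing"]
    problems = []
    stat = p.stat()
    age_hours = (time.time() - stat.st_mtime) / 3600
    if age_hours > max_age_hours:
        problems.append(f"{p}: last written {age_hours:.0f} hours ago")
    if stat.st_size < min_bytes:
        problems.append(f"{p}: only {stat.st_size} bytes")
    return problems


# Demonstration against throwaway files rather than real backups.
with tempfile.TemporaryDirectory() as d:
    fresh = Path(d) / "orders.sql.gz"
    fresh.write_bytes(b"dump")
    stale = Path(d) / "old.sql.gz"
    stale.write_bytes(b"dump")
    two_days_ago = time.time() - 48 * 3600
    os.utime(stale, (two_days_ago, two_days_ago))  # pretend it is 2 days old

    for candidate in (fresh, stale, Path(d) / "missing.sql.gz"):
        print(candidate.name, check_backup(candidate) or "ok")
```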
Once all of the above is sorted, and decent management allowing (and presuming this isn't something that is already being taken care of), you need to start suggesting to management the skillsets of people and / or contractors and / or agencies that need to be brought in to proactively grow the business. Be it SEO, PPC, UX, new features, etc. whatever it is, you have the opportunity to help the business understand it all and be instrumental in their success.
Depends on the management. 7 months ago I inherited a dysfunctional department with morale in the crapper and a seeming inability to do anything. The organization brought in a new director (my boss) and new department manager (me). I got a lot of funding and a lot of resources to fix things. Net result is that we are now ahead of the game for the first time in 5 years, people like being here, and we're having a blast.
So if management is providing support and resources, I'd say go for it. If they're saying we like it the way it is, then leave.
I agree with what you have said, however with one minor caveat...
Based on this - "I assumed the position of programmer and sole IT personnel at a thriving e-commerce company." I am assuming...
1) He says he "assumed" the position which would imply that he worked elsewhere in the company and was made the de-facto IT person based on having an Android phone or a PS3 or whatever other metrics they decided to use. I am going to give the Poster the benefit of the doubt, and assume he is not in over his head.
2) The company is a thriving e-commerce company, which means that they make their money off of this e-commerce platform, which in turn should mean they want to protect the investment that they have already made.
Now the problem here is that there are all kinds of red flags for doing things on the cheap, which is why you are finding all of these band-aids. If the company is thriving they should have no problem hiring a second person. Regardless of your level of skill, mistakes will be made, and these can be reduced if you have a sounding board. Someone to logic check things with.
As for actually identifying and making changes the parent has that part spot on. I am just concerned that may not be effective in this organization.
$diff terrorists hippies
$
$rm -rf *terrorists *hippies
You wouldn't be in this situation if your employer gave a crap. It's plain and simple: you report to someone. They know the extent of the problem and that there is only one of you. If they cared, there would be more than one of you. But there isn't. So turnabout is fair play.
This is the true American solution to your problem: find other people to exploit and skim off the top ...
Step 1: tell them you're going to become a telecommuter so that you can work 100% of the time
Step 2: get on elance or some other such site: hire gobs of cheap (dubious) overseas help at $1/hour
Step 3: instruct them all to send emails from your address and answer the phone with your name.
Step 4: find a different job and just let your sub-contractors handle that one until the house of cards falls apart
If your current employer calls you out on the fact that you have 15 different accents and sometimes answer the phone in a female voice, ask them why they're so racist.
bonus if you used a pseudonym when hiring for your present job.
I'll go on a limb based on my own current experience.
I think just about all companies bigger than say seven people need two people split half IT and half "line functions".
Then when everything is humming, they can "just work". But when a cascade situation comes up, you do those Tier Levels. Level 1 does all the End User fallout. (Every computer needs to get that new utility installed, then all the printers quit working because of a 2 minute power outage (winter is coming), User 1 wants to know where their file is that they worked on for 3 hours. Oh look, it's in a temporary folder because it came straight from an email. etc.)
Then Tier 2 deals with all the system configs, there could be a software change coming, etc. That second pair of hands seems to be more than the sum of the hands in IT when managers want something fixed. I've done the Level 1 Helpdesk for a while now, with the second man more behind the scenes.
My first Journal Entry ever, in 8 years! http://slashdot.org/journal/365947/aphelion-scifi-fantasy-horror-poetry-webzine
Preach it. Any job where you're the guy that has to fix it... the challenge is a HUGE learning opportunity. Evil boss or no. I've worked for a glory seeking back-stabbing boss before. That in itself is a good learning experience about how to appropriately protect your flank while still being able to do your job (and what kind of boss not to be). All that prior experience helped get me the job I have now, and it just happens to be for a great boss. I'm always in a constant state of character and job skill development. When it is no longer a learning opportunity, find another job. Never burn bridges. Ever.
It sounds like it was your boss that was the problem rather than the project. If you can't communicate properly to your boss why there is a problem, what it is, what the consequences are, what you will have to do to fix it, approximately how long it will take and which problems/systems have and have not been fixed (and therefore problems are all your responsibility) then it isn't going to work out. That's a lot of work, unlikely to be a lot of fun, and takes two people: you, to give the right information, and him, to actually listen and understand and honestly report it to the rest of the organization. If, after you've communicated properly, he STILL blames you for prior inadequacies he can see (but maybe not admit) are not your fault then you're probably going to have a problem no matter what state the systems are in.
Oh, and you don't say "I took a broken system and made it run like clockwork". You say something more like, if you can, "I specified, designed and deployed a whole new x/y/z system in a successful x month project which reduced support problems/reduced downtime/increased throughput/increased capacity from x to y within existing hardware and budgetary constraints". That demonstrates more than "I took a system which already works well and managed not to break it"
As someone who has inherited a bowl of spaghetti more than once in his day, I can say definitively that it's all driven by upper management/ownership. You're given a limited set of tools and an even smaller budget to make sure everything not only runs, but runs at peak efficiency. Then, add in incompetent end-users that are allowed, nay, ENCOURAGED to build undocumented, unstructured and barely maintainable "applications", and you're in for some real fun.
I was here for the last 6 years. I came onto a team of two, and my co-conspirator was quickly promoted to my boss. There was never a replacement hired.
His basic philosophy was "if a patch can make it work, don't spend the money". Any problems we had were never passed up to people with the money to fix things.
Things weren't too bad until he left two years later and they hired a non-technical person to be my new boss (we want someone with more artistic ability. Blech). I presented a list of all the things that needed replaced, upgraded, repaired, un-duct-taped, etc -- listed by priority, severity with downtime analysis, recovery, etc. It was a significant list, and I didn't expect everything on it to be done, but anything was better than nothing.
New boss passed the list in full force up the chain, which initiated an audit, which was painful but I figured things would be better at the end. The auditors got fired before handing in their final report, management marked it down as "problem solved" and no money ever showed up.
Fast forward 4 years and the company cut pay "due to the economy", killed vacation benefits, and made working there a living hell. Due to cut pay, my ability to work overtime magically disappeared as well. I was there for six years, and I left handing over the same servers I took over when I started.
It was a mess, and I managed to stabilize systems by never changing them, but it was a patch job. Management didn't care and I held my breath every time a server rebooted. I'm happy to have gotten out before everything collapsed.
To the OP, I say best of luck. If you can get traction, then stay and make the place awesome. If everything is an uphill battle, run now. The writing is on the wall.
My backup was my boss who was technically competent, so there was that, but it's not like I've never worked a job as a one man show. You buckle down and do what you have to and make it so things don't break just because you're not around (yes, this requires budget, but I've been fortunate enough that anyone willing to pay me what I'm worth is also willing to invest in a solid infrastructure).
This made you (and the situation you described) an outlier, one with a positive outcome. Your experiences cannot be applied in general. In general, this is rarely the case, for one-man-tech shops that is.
For the most part, conditions as described by the original submitter typically have "GTFO ASAP!" written all over them. I've done IT in companies, small and large, and I can attest that what you say is true: Yes, it is possible to be the one-guy-IT-slash-programmer-shop at a small e-commerce company. But the question is why? I wouldn't do it (again) unless a good compensation package came with it (which is typically never the case), or if I'm fresh out of school with nothing on my plate to take (in which case, it is ok.)
Good companies are never based on one-man-IT-slash-dev-shops, regardless of size (or at least they try not to be.) I know, again, I've worked with companies big and small. Conditions like that are typically good proxies for more systemic problems, and at the end of the day (whenever possible), you want a paycheck, a rewarding job and good working conditions. You rarely, if ever, see that with one-man-IT-slash-dev-shop gigs, regardless of the size of the company.
That's just my $0.02 input from what I've seen. YMMV so readers be warned and please take this anecdotal piece with a grain of salt.
. . . document everything. . . [m]ake sure your employer fully understands the situation you and they are in.
This was the first thought that came to me. While there surely are employers that are the personification of Evil (I, too, have met my share), most are simply trying to do the best they can, but are hampered in their ability to help you by a lack of time, a lack of knowledge of IT subjects, or both. Because of this, they can't independently judge the quality of the advice you're giving them -- i.e., they have to trust you. Since (at least in my experience) most conversations that IT initiates with management are, in one way or another, a request to spend money, you can see the problem: It's difficult to continue to trust someone whose solution to every problem is to take money from you.
One technique I have used successfully is to never, ever refer to the IT department in particular, but always refer to the company as a whole. Also, as the parent says, document everything -- but do so in terms that will be meaningful to management. "The language of management is money," Juran said, and he was right. The infrastructure must make the company money, or it wouldn't be there -- talk to management in terms of the risks they are running to reduce revenue, increase expenses, or both.
Most business people, like investors in general, dislike surprises, and prefer to know their risks in advance. The trick is to present the situation in non-threatening terms, so that the boss feels like you're trying to make more money for him. Even after doing so, one has to be prepared for the possibility of a negative response, possibly even for a valid reason -- there's no cash available this month; they're planning to move anyway, etc. Or the boss just made a mistake. With any luck, the documentation you have on file will protect you, should blame be directed your way. If not, well, you didn't want to work there, anyway. Did you?