Ask Slashdot: Getting a Grip On an Inherited IT Mess?
First time accepted submitter bushx writes "A little over a month ago, I assumed the position of programmer and sole IT personnel at a thriving e-commerce company. All the documentation I have is of my own creation, as I've spent most of my time reverse-engineering the systems in place just so I can understand how everything works together. Since I've started, I've done everything from network and phone upgrades to database maintenance with Perl, and thus far it's been immensely rewarding. But as I dig deeper, I notice the alarming number of band-aids applied by my predecessor, and it seems like the entire company's infrastructure is just a few problems away from a total meltdown. The big question now is, how do I, as a single person, effectively audit the network, servers, databases, backups, and formulate a long-term plan that can be implemented by one person? Is it possible? Where do I begin?"
You work at RIM?
Say goodbye to your life for the next year. Hope you're getting paid to mislay it...
Nuff Said.
Start drinking.
Automate your servers so you can focus your time elsewhere. I use Cfengine.
http://watson-wilson.ca/2011/03/enterprise-system-administration-using-configuration-management.html
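The core idea behind tools like Cfengine is convergence: you describe the state a machine should be in, and the agent acts only when reality differs, so running it again and again is safe. The Python below is not Cfengine or its policy language. It is a minimal sketch of that idea for a single file, using a hypothetical helper named `ensure_file_content`.

```python
import tempfile
from pathlib import Path


def ensure_file_content(path: Path, desired: str) -> bool:
    """Converge `path` to `desired`; return True only if something had to change.

    Hypothetical helper, not part of Cfengine: it illustrates the idempotent,
    convergent style, so a second run against a correct file is a no-op.
    """
    desired_bytes = desired.encode("utf-8")
    if path.exists() and path.read_bytes() == desired_bytes:
        return False  # already in the desired state: do nothing
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(desired_bytes)  # write bytes so no newline translation occurs
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        motd = Path(tmp) / "etc" / "motd"
        print(ensure_file_content(motd, "Managed by config management\n"))  # True: file created
        print(ensure_file_content(motd, "Managed by config management\n"))  # False: nothing to do
```

Because each run only reports and fixes drift, you can schedule it from cron and spend your time elsewhere, which is the point of the comment above.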
Been there, done that. Start with a simple wiki. Document everything, including lists of things that need to be done. Put time and dollar costs on everything along with your idea of the priority. Present the list to the CEO or whoever is above you and work on prioritizing it, then get to work.
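If the to-do list lives as data rather than prose, it is easy to re-sort every time the CEO changes a priority. A minimal sketch, assuming a hypothetical `Task` record with estimated hours, estimated dollars and a priority where 1 is most urgent; the example tasks and numbers are made up.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    hours: float    # estimated labour
    dollars: float  # estimated purchase cost
    priority: int   # 1 = most urgent (your initial guess, for management to adjust)


def ordered_worklist(tasks: list[Task]) -> list[Task]:
    """Sort by priority, then cheapest first within the same priority."""
    return sorted(tasks, key=lambda t: (t.priority, t.dollars, t.hours))


def totals(tasks: list[Task]) -> tuple[float, float]:
    """Total hours and dollars, for the summary line of the proposal."""
    return sum(t.hours for t in tasks), sum(t.dollars for t in tasks)


if __name__ == "__main__":
    tasks = [
        Task("Replace failing RAID disk", 2, 300, 1),
        Task("Set up wiki", 4, 0, 2),
        Task("Offsite backups", 8, 500, 1),
    ]
    for t in ordered_worklist(tasks):
        print(f"P{t.priority}  {t.name}: {t.hours}h, ${t.dollars}")
    print("Total (hours, dollars):", totals(tasks))
```

Sorting by cost within a priority level is just one reasonable tie-breaker; sorting by hours instead would favour quick wins.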
You hire more people and do a thorough job of cleaning it up or rebuilding.
HTH
We don't need to know you're single....
Good luck at fixing other people's mistakes...
Dude, that is too easy. There are serious wiseacres on this board.
Is there any software that alerts you when something is wrong? At the minimum, will it tell you what is out there? What about the backup software reports?
I use NetBackup to back up our MS SQL servers and didn't like the built-in reporting. I wrote my own procedure to import a few tables from the msdb database, where the backup data is kept, into a central database, and I use SQL Server Reporting Services to send daily emails of the latest backup times of each server/database, along with a few alert reports of databases that were never backed up or haven't been backed up for 7 days.
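The same "never backed up / not backed up in 7 days" report can be sketched without Reporting Services. The SQL below reads `sys.databases` and `msdb.dbo.backupset` (where SQL Server records backup history; `type = 'D'` is a full backup); how you run it (for example with a driver such as pyodbc) is left out and is an assumption. The Python function, a hypothetical `backup_alerts`, only classifies the rows that come back.

```python
from datetime import datetime, timedelta
from typing import Optional

# Run on each SQL Server instance; LEFT JOIN so databases with no full backup show NULL.
LAST_FULL_BACKUP_SQL = """
SELECT @@SERVERNAME AS server_name,
       d.name AS database_name,
       MAX(b.backup_finish_date) AS last_backup
FROM sys.databases AS d
LEFT JOIN msdb.dbo.backupset AS b
       ON b.database_name = d.name AND b.type = 'D'
WHERE d.name <> 'tempdb'
GROUP BY d.name;
"""

Row = tuple[str, str, Optional[datetime]]  # (server, database, last full backup or None)


def backup_alerts(rows: list[Row], now: datetime,
                  max_age: timedelta = timedelta(days=7)) -> list[str]:
    """Return one alert line per database never backed up or backed up too long ago."""
    alerts = []
    for server, database, last_backup in sorted(rows, key=lambda r: (r[0], r[1])):
        if last_backup is None:
            alerts.append(f"{server}/{database}: NEVER backed up")
        elif now - last_backup > max_age:
            age_days = (now - last_backup).days
            alerts.append(f"{server}/{database}: last full backup {age_days} days ago")
    return alerts
```

An empty result means every database had a full backup within the window; anything else is what goes in the daily email.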
Did the last guy outsource everything to india?
Assess your most vulnerable items, whether that is a server, a network component, an application, a database, etc. Give each one a criticality score. Share that list with your boss/manager and work the list one item at a time. You can't spread yourself too thin when working a project like this, so focus on one or two items at a time until you see a light at the end of the tunnel.
Brief your management on the situation. Explain what condition things are in and what is needed to get them into a manageable state. Give them a list of projects / tasks that you have to deal with and get them to prioritize.
IMO, don't lose any sleep over it. The description covers most IT systems.
Get a firm grip on your steering wheel, and keep your car pointed away from that company.
You need to document it and get management to approve spending money.
I'll bet you $100 the band-aids are there because management refuses to spend money on infrastructure, and that's why it is a mess and why the guy there before you left.
99% of the time a hosed IT infrastructure is because management refused to spend any money, so it had to be half-assed.
2 steps need to be done: 1. Tell your boss about these terrible problems. 2. Demand more money.
Facts:
1. The job has lasted for 1 month so far.
2. The e-commerce company is 'thriving', apparently.
3. All of the systems have been "reverse engineered" in that 1 month.
4. All of the documents are written in that 1 month.
5. In 1 month there have been network and phone upgrades and database maintenance with Perl, and it all has been 'immensely rewarding'.
6. The entire infrastructure is 'a few problems away from a total meltdown'.
7. Single person IT operation to do everything.
Question: is this for real? What's the size of the company and what's the budget?
You can't handle the truth.
I refer you to Gilbert and Sullivan's Mikado.
Work planning. First, what are your goals? How do you want everything to look for you to be happy and comfortable with it? Start high level, followed by more specific details for each goal. Ensure that your design is flexible enough to allow additional features and functionality to be incorporated without a total revamp. After that's mostly done (it won't be complete at this stage), start charting expected timelines for tasks, dependencies, and costs. Go back and revise your goals and the plan to account for everything you thought of once you went through the first time. And since you've never done this before, go through it a third time and a fourth and a fifth; continue until you can make a pass without modifying anything significant. At the end of the main planning effort, you should have a timeline of bite-sized tasks that must be completed to allow other bite-sized tasks to be performed, along with their associated costs. Present the timeline, cost, and benefits to your boss. Get approval for the budget, and get to work. Document everything you do, or (if you're short on time) the bare minimum has to be documentation on the final results of each task.
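"Bite-sized tasks that must be completed to allow other bite-sized tasks" is a dependency graph, and a topological sort turns it into a valid order. A minimal sketch using the standard library's `graphlib.TopologicalSorter` (Python 3.9+); the task names, dependencies and costs are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical tasks: name -> (cost in dollars, set of prerequisite task names).
TASKS = {
    "inventory hardware": (0, set()),
    "verify backups": (0, {"inventory hardware"}),
    "buy NAS for offsite copies": (800, {"verify backups"}),
    "set up monitoring": (0, {"inventory hardware"}),
    "replace core switch": (1500, {"set up monitoring", "verify backups"}),
}


def plan(tasks):
    """Return (task, running total cost) in an order that respects every dependency."""
    sorter = TopologicalSorter({name: deps for name, (_, deps) in tasks.items()})
    schedule, running = [], 0
    for name in sorter.static_order():  # raises graphlib.CycleError on circular dependencies
        running += tasks[name][0]
        schedule.append((name, running))
    return schedule


if __name__ == "__main__":
    for name, cumulative in plan(TASKS):
        print(f"{name:30} cumulative ${cumulative}")
```

The running total gives you the cost curve for the timeline you present; a `CycleError` is a useful signal that two "prerequisites" are really one task.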
Just buy a few cases of your energy drink of choice and put Eye of the Tiger on repeat until you've got it all fixed.
I believe in you.
Tell/email/post your opinions and observations, in as much detail as you can, along with your concerns. Make sure your managers see it. Do not expect them to do anything about it. Do it for your own reference, so you may continue working normally. Do not overwork or over-worry yourself, for that will bring neither you nor the failing systems any closer to resolution. Do your normal job, stay cool and speak up. You are in the driver's seat.
above
You say that this is a "thriving ecommerce company"...I'm just wondering how it's managed to achieve and maintain "thriving" status with a single member of IT personnel?
"I assumed the position of programmer and sole IT personnel at a thriving e-commerce company."
Wait.... a thriving e-commerce company has one IT person? Am I missing something here...? No wonder everything was band-aided together. They have one person doing everything.
You may want to consider hiring an outside firm to come in and do the audit for you. The last thing you need right now, on top of your daily workload, is to perform an audit. That, and a third party firm creates a sense of objectivity, and would eliminate the "The IT guy wants a new toy" response from the CFO.
Always start by making sure the backups are working properly.
Pull out a shotgun in the middle of a meeting with management and splatter your brains on the wall behind you to the shock of all your coworkers and management.
Report the problems to your supervisors immediately. Let them know what you plan to do to address the problems and if you need any additional resources (like extra staff, overtime or an adjustment to your schedule so you can make major changes when no one is using the system).
It's important to keep them in the loop so they can make decisions based on their budget. They need to be aware of their dependence on the system, its fragility and the need to invest to ensure its continued survival.
At some point, decisions about where to invest need to be made, and keeping those decision-makers informed is important.
We've been running ever since...
By bottom I mean the servers that the least number of systems depend on.
Get them humming, with an eye toward your migration path.
Then, be methodical. Back up everything often. Try not to do anything that can't be easily reversed, and let the folks know the true state of their systems.
This actually sounds like fun.
All I can say is that it could be worse- at least the guy isn't still there defending his band-aids.
Advise your boss that an audit is in order due to these anomalies that you have found. Recommend to the financial controls / legal team that they bring in an outside firm to look over the infrastructure and, where needed, upgrade your code / configurations / infrastructure to industry best practice.
Relax. It has survived so far, so it will likely continue to, as long as you don't make huge changes without a back-out plan.
Get security scanner software or pay someone to audit the external-facing servers; that will either build your confidence or scare you silly. Most things found in that kind of audit are fairly easy and low-risk to fix (patching software, disabling unneeded services and such).
Second, get everything you can into some form of revision control: source, images, web pages.
Back up everything into tarballs or zips so that if you make a change you can undo it quick if it goes poorly.
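A minimal sketch of the "tarball before you touch it" habit using Python's standard `tarfile` module. The helper names `snapshot` and `restore` are hypothetical; archive names are timestamped to the second, so two snapshots of the same target within one second would collide.

```python
import tarfile
import time
from pathlib import Path


def snapshot(target: Path, backup_dir: Path) -> Path:
    """Write target (file or directory) to a timestamped .tar.gz and return its path."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = backup_dir / f"{target.name}-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(target, arcname=target.name)
    return archive


def restore(archive: Path, dest_dir: Path) -> None:
    """Unpack a snapshot into dest_dir. Only use this on archives you created yourself."""
    with tarfile.open(archive, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest_dir, filter="data")  # safer extraction where supported
        else:
            tar.extractall(dest_dir)
```

Take the snapshot immediately before each change; if the change goes badly, restore to a scratch directory, compare, and copy back what you need.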
When it comes time to make changes, do a small one first, like updating the copyright at the bottom of the page, then re-deploy everything and test as best you can. This is a confidence-building measure; then advance to larger changes...
Make sure your boss understands the current state, but say it positively, since after all they hired the person before you...
Make sure you know what you've got. That means a db and an automated inventory tool. Installed & managed by above management engine.
Then decide what services are missing.
Then automate the rest with the management engine.
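A real management engine gathers inventory for you, but the core of it is "collect facts on every host, store them centrally." A minimal stand-in sketch using only the Python standard library; the fact names are hypothetical and shipping the JSON to a central database is left out.

```python
import json
import platform
import socket
from datetime import datetime, timezone


def collect_facts() -> dict:
    """Gather basic facts about the machine this script runs on."""
    return {
        "hostname": socket.gethostname(),
        "os": platform.system(),
        "os_release": platform.release(),
        "arch": platform.machine(),
        "python": platform.python_version(),
        "collected_at": datetime.now(timezone.utc).isoformat(timespec="seconds"),
    }


if __name__ == "__main__":
    # Print JSON so a scheduled job could send it to a central inventory database.
    print(json.dumps(collect_facts(), indent=2))
```

Once every host reports in, "decide what services are missing" becomes a query against that database instead of a guess.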
The very first thing I would do is back everything up. Image every drive.
Next, do a Risk Assessment.
If this system goes TU, how bad will it be?
What steps can we take to help ensure that doesn't happen with this system?
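One common way to answer "how bad will it be?" consistently is a risk matrix: score each system's likelihood of failure and its impact, and rank by their product. This is a general convention, not necessarily the commenter's method; the systems and 1-5 ratings below are hypothetical.

```python
# Hypothetical ratings on a 1-5 scale; replace them with your own assessment.
SYSTEMS = {
    # name: (likelihood of failure, impact if it goes down)
    "order database server": (4, 5),
    "office file share": (3, 2),
    "phone system": (2, 4),
    "marketing mail server": (3, 3),
}


def risk_ranking(systems: dict[str, tuple[int, int]]) -> list[tuple[str, int]]:
    """Rank systems by risk = likelihood x impact, highest first (ties broken by name)."""
    scored = [(name, likelihood * impact) for name, (likelihood, impact) in systems.items()]
    return sorted(scored, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    for name, score in risk_ranking(SYSTEMS):
        print(f"{score:>3}  {name}")
```

The top of this list is where the "what steps can we take" question gets asked first.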
Now come up with a plan of action.
Here are the show-stoppers I found
They generally fall into these categories.
This category can be solved this way, making for a more reliable system
This category can be solved this way, giving us better capabilities.
This category can be solved this way, saving this over time.
Here are all the bandaids I found.
They generally fall into these categories.
This category can be solved this way, making for a more reliable system
This category can be solved this way, giving us better capabilities.
This category can be solved this way, saving this over time.
You're going to spend time rewriting things that currently work? That's a recipe for disaster.
Unless you can predict when something will fail (as in - the database uses 16-bit indexing, so when we hit 65,536 orders the database will crash), it's much more effective to leave things alone.
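That kind of prediction is simple arithmetic: remaining ID space divided by how fast you consume it. A minimal sketch, assuming unsigned 16-bit IDs (largest value 65,535) and a hypothetical helper `days_until_exhausted`; the 35 orders a day in the example is an assumed rate.

```python
import math

UINT16_MAX = 2**16 - 1  # 65,535: the largest value a 16-bit unsigned column can hold


def days_until_exhausted(current_max_id: int, ids_per_day: float,
                         max_id: int = UINT16_MAX) -> int:
    """Whole days of headroom left before the next ID would exceed max_id."""
    if ids_per_day <= 0:
        raise ValueError("ids_per_day must be positive")
    remaining = max_id - current_max_id
    return max(0, math.floor(remaining / ids_per_day))


if __name__ == "__main__":
    # Hypothetical: highest order ID so far is 58,000 and about 35 orders arrive per day.
    print(days_until_exhausted(58_000, 35))
```

Here the answer is 215 days, which is the kind of concrete, dated failure that justifies touching working code.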
Wait until changes are needed, then straighten out only those pieces that you have to touch when implementing new functionality.
Work to a benefit. Unless you can point to some aspect which will change in a measurable way (it's crashing frequently, it will crash *less* when I'm done, it will cost less in terms of server rental, &c), leave it alone.
No offense, but if you don't have the necessary background to know what/where the tools are; who are you to say everything is band-aided? I see this a lot with new ITs, they see something different than they would have done and instantly label their predecessor a moron; later to make "their" change and break everything. Easy on the finger pointing.
The first thing you need to do is make a comprehensive assessment; don't jump in and start making changes until you have documented everything. If you can contact your predecessor and ask about design and/or documentation that may be stored in an industry standard tool that YOU are unaware of; do so. Once you know how all the pieces move, then start to plan how to improve/repair it. If you dive in and it breaks, you will be blamed; if it breaks and you fix it with minimal down time, you're the hero.
The fact that you don't know the answer to any of these questions shows that you're really no more qualified to be the sole IT personnel than the last guy was.
Deming.
Are things working? What you call "bandaids" may be actually decent solutions you don't yet understand.
Stop worrying about a "meltdown" and just get to work.
1. Back up everything. Don't start messing with things unless you're sure you can revert them back to their working condition.
2. Document things as you understand them.
3. Pick just one network element at a time and see if you can simplify it. Clean it up. Remove unnecessary junk. Check to make sure it operates as your documentation indicates.
4. Test. Test. Test. Remove the element from the network. Does the network fail as expected? If not, figure out why.
5. After you feel good about the network element, move on to the next one.
6. Lather, rinse, repeat.
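Step 4 ("Does the network fail as expected?") is easier to answer with a before/after probe. A minimal sketch using plain TCP connects from the standard `socket` module; the helper names are hypothetical, and a TCP connect only proves a port answers, not that the service behind it is healthy.

```python
import socket


def reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_all(services: dict[str, tuple[str, int]]) -> dict[str, bool]:
    """Probe every named service; run this before and after pulling an element."""
    return {name: reachable(host, port) for name, (host, port) in services.items()}


def surprises(before: dict[str, bool], after: dict[str, bool],
              expected_down: set[str]) -> tuple[list[str], list[str]]:
    """Compare two probes against what you predicted would break."""
    went_down = {n for n in before if before[n] and not after.get(n, False)}
    unexpected = sorted(went_down - expected_down)  # broke something you didn't predict
    still_up = sorted(expected_down - went_down)    # predicted an outage that didn't happen
    return unexpected, still_up
```

Run `check_all` with your real hosts and ports before removing an element and again afterwards, then pass both results to `surprises`; anything in either returned list is a gap in your documentation.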
Just move carefully and methodically. You'll eventually have things cleared up.
Welcome to the real world. The fact that you are the only one who even appears to be concerned about the situation should be your first and only clue as to what your employer considers to be critical, essential, important, and last but not least profitable. Are you sure that they are profitable?
I've spent the best part of my career undertaking tasks like this (as an external consultant), with my average time on an assignment lasting somewhere between 18 months and 3 years.
My aim on every project is to make myself obsolete - in that I try to get documentation up to a point where a suitably qualified individual could come in, read the documentation, and work the rest out for themselves.
My primary objectives are to implement some form of inventory control to document the what / where / why...
Once you've got to that stage, then you're ready to get in to the real technical details. Remember that you are pitching your documentation to your successor, or to some imaginary "suitably qualified individual", so documenting what a system does and why is a higher priority than commenting every line of code.
It is possible to do with one person, depending on the size of the organisation. It can be particularly rewarding to do on your own: in a small business you often find some of the users have a good understanding of some of the systems, or are keen to learn.
You stated in your post that you've assumed the role of programmer and sole IT personnel - which means you need to learn to think like a manager as well as a techie (which is harder than most people imagine!). Once you learn to focus on the business priorities, you'll understand where to begin with the technical detail, and what level of documentation is required.
Start at the basics of the network.
Switches / routers / firewalls in good condition? How are the logs?
Then move on to the server health. Event viewer good? Whatever server management hardware they have is happy? Warranties, all that stuff?
Just work your way down the path.
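For the "how are the logs?" pass, a quick count of trouble lines per device tells you where to look first. A minimal sketch, assuming the switches, routers and Linux servers send classic BSD-style syslog text ("Mar 15 04:12:01 host program: message"); the keyword list is an assumption to tune, and Windows Event Viewer logs would need a different tool.

```python
import re
from collections import Counter

# Classic BSD-style syslog line: "Mar 15 04:12:01 hostname program: message".
SYSLOG_LINE = re.compile(r"^\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2} (?P<host>\S+) (?P<msg>.*)$")
# Assumed keywords; adjust to what your equipment actually logs.
TROUBLE_WORDS = re.compile(r"\b(error|fail(ed|ure)?|critical|duplex mismatch|down)\b", re.IGNORECASE)


def trouble_by_host(lines) -> Counter:
    """Count lines that mention a trouble keyword, per reporting host."""
    counts = Counter()
    for line in lines:
        m = SYSLOG_LINE.match(line)
        if m and TROUBLE_WORDS.search(m.group("msg")):
            counts[m.group("host")] += 1
    return counts


if __name__ == "__main__":
    import sys
    for host, n in trouble_by_host(sys.stdin).most_common():
        print(f"{n:6}  {host}")
```

Pipe a day's syslog into it and start with the device at the top of the list.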
The situation being understaffed and underfunded but expected to keep everything working... my advice is get out while you can.
It just isn't worth it. The reason why the systems are all patch and duct tape is because they think cheap is good management - and the longer you keep it running the more it proves it to them.
And hey, their new boat they bought with their bonus for keeping expenses down is awesome!
First thing I'd do would be to document the issues that you've found. Talk with your immediate supervisor, explain the issues and the plan that you've come up with to address it. Without upper management buy-in, you're doomed from the start. Look for free/low-cost management utilities out there. Prioritize the issues you've found and start tackling them one at a time. If you make your supervisor aware of the issues and provide an overview of how you plan to deal with them, they'll be a lot more understanding if something does break in the meantime than if it comes as a total surprise.
Explain to management that the total fruit cake in the job before built the system in suspended failure and one wrong move or random move will bring everything to a halt.
Once you have their attention, just start fresh: rebuild the most broken part of the network on a new system and slowly rebuild and re-factor from there. It will be a lot of work, but if done carefully it will save you from the current mess you're stuck with.
First thing I would recommend is planning out a restructure and rebuild of the setup. List the critical needs and why they need to be covered. What most companies do not understand about IT is that, in the process of cost cutting, the critical structures your previous guy was forced to skimp on will cost the company in the long run, as you try to maintain as much as you can. Underline the need for the restructure to avoid a meltdown. Upper management needs to understand that while you can try to support things with minimal costs, the catastrophic repercussions come when you have no fallback because of cost cutting, plus the number of days it takes to get new hardware or rebuild a system to its previous level of functionality: a minimum of 2 days to get systems back to functionality, and a dead stop to any other support while the critical systems are being rebuilt. As someone else mentioned, hiring additional support will help, but it will not help in a production-level dead stop caused by critical systems having no redundancy or planned upgradability. Lastly, underline the necessity of not cutting back on maintenance. Running at the bare minimum with no maintenance support will cause long-term cost overruns, and being the only person who has to maintain it all will cause long-term burnout and higher turnover of IT workers, which will cost them again in the long run.
The truth is if it's that fragile, then recovery or repair are not options because you never know when you'll be done. Your best strategy is to rebuild. Organize the rebuild jobs from smallest (simplest, or least-complex) to biggest, and start from the smaller ones.
Importantly, you need to understand what your infrastructure does and why (which you claim you're already trying to do). However, the most critical point is that your superiors understand what you're up against and the risks they bite into if they choose to not go forward with the rebuild(s).
Once you understand what it is you need to rebuild, then you can do it properly: document the strategy to be followed (and incredibly important is that you document the key reasoning points behind the decision process), and plan out the implementation. If your superiors find that it consumes too much of your time, try to talk them into hiring (one? two?) more folks to help you hold the fort while the rebuilds are in progress so the day-to-day isn't left in the lurch. I had to go through this type of a situation recently and the end result of the rebuilds was that the previously inevitable downtime went away almost completely (only ISP outages were an issue). Deployment of new servers was cut down by 95%, and tons and tons of other benefits. Biggest of all: by the time I was done, everything essentially ran itself and even on the end-user support things were almost automated (granted, 99% of my audience were tech-savvy so they didn't need much help anyway). 95%+ of my time was spent just scouring logs and servers to ensure everything was running smoothly (which it was).
Then again, the key point was selling my upper management on the fact that my predecessors had done such a lousy job of setting everything up that trying to fix it was more expensive than a from-scratch rebuild, and that they were one fly's fart away from a catastrophe. You don't need to scare them shitless, just point out where they are and what they're up against if a rebuild isn't even done (even rebuild of only SOME of the systems can make a huge difference). Make sure it's clearly stated in writing (a "big" e-mail explaining the situation clearly to get the ball rolling usually takes care of that).
Key thing: DO NOT try to fix or recover the old stuff - if it's really as messed up as you suggest, you will consume comparable amounts of time to a rebuild, with none of the benefits and the added risk that you didn't fix all the problems because you couldn't spot some of them.
One other thing that served me well in terms of plotting my strategy: take the approach that I'm building something and going to be fired the day I'm done, and whatever I build needs to be inheritable and clearly understandable by my potential successors. This angle will encourage you to keep it simple, stupid, well documented, and easy to maintain/audit. In the end, this is why your predecessors sucked: they didn't think they'd eventually (be) move(d) on - but in IT, that's the one constant: staff rotation.
I have done this for a lot of companies here that have been sold, gone out on their own, or been taken over, and that have a ton of IT stuff one person needs to figure out. Get a list going of every network device, info on it, and what it is running. Once you accumulate that, you can get each item's version and age, take it to the relevant groups and determine how critical it is to the company. Sometimes things change, and a server that was once critical, and that you would think needs to be upgraded or replaced, can be decommissioned or moved to another cluster or server. Once you break your equipment up into groups by critical risk, you can plan upgrades with the capital you have each year, and possibly support contracts.
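Once the device list exists, grouping ageing equipment by criticality gives you a draft of each year's upgrade budget. A minimal sketch, assuming a hypothetical `Device` record and a five-year replacement age; both the threshold and the example devices are assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class Device:
    name: str
    kind: str          # "server", "switch", "firewall", ...
    running: str       # OS / firmware / main service
    purchased: date
    criticality: str   # "high", "medium" or "low", agreed with the business owner


def upgrade_plan(devices: list[Device], today: date,
                 max_age_years: int = 5) -> dict[str, list[str]]:
    """Group devices at or past max_age_years by criticality, oldest first in each group."""
    plan = defaultdict(list)
    for dev in sorted(devices, key=lambda dev: dev.purchased):
        age_years = (today - dev.purchased).days / 365.25
        if age_years >= max_age_years:
            plan[dev.criticality].append(dev.name)
    return dict(plan)
```

Items in the "low" group are the ones worth questioning before buying anything: some may be candidates for decommissioning or consolidation instead.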
This is a delicate balancing act, because by seeking out and acknowledging the problems you are essentially taking ownership of them.
The first thing you should do is let your supervisor know what you are planning to do and get a commitment from them for dedicated time to fix the problems. This is essential.
If you don't get this time commitment, you need to dial back your eager-beaverness. Keep letting them know that the audit needs to be done and give a few examples of issues that need to be corrected. When you are working on other tasks, mention that this would be a lot easier if X was fixed, but that it needs to be fixed from the ground up, and again, push for time.
Otherwise what is going to happen is you are going to basically be building yourself a pile of work that is now deemed critical (especially in the event that something horrible does break), that your boss considers you responsible for, but no spare time to fix it. "Oh, just fix things as you see them!" they will say, which when some major infrastructure component and supporting services needs to be rebuilt, is completely impossible. Now instead of being the hero that helps them recover from the jerk that was there before you, you are a scoundrel who did not save them in time.
Get the time commitment for the full scope of the work that needs to be done. I'm telling you, this is important, because when it comes time for someone to take the blame, it isn't going to be your boss.
As has been mentioned, begin with making sure that you have backups of EVERYTHING. Backup, perform test restores, fix any backup issues, rinse, repeat.
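A test restore only counts if you check what came back. A minimal sketch that compares SHA-256 digests of the original tree and the restored copy, using only the standard library; the helper names are hypothetical. It is only meaningful for files that did not change between backup and check, so live databases need a proper restore plus the database's own consistency check instead.

```python
import hashlib
from pathlib import Path


def file_digests(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    digests = {}
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        h = hashlib.sha256()
        with path.open("rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):  # 1 MiB at a time
                h.update(chunk)
        digests[path.relative_to(root).as_posix()] = h.hexdigest()
    return digests


def verify_restore(original: Path, restored: Path) -> list[str]:
    """List every file that is missing, extra, or different in the restored copy."""
    a, b = file_digests(original), file_digests(restored)
    problems = [f"missing: {p}" for p in sorted(a.keys() - b.keys())]
    problems += [f"extra: {p}" for p in sorted(b.keys() - a.keys())]
    problems += [f"changed: {p}" for p in sorted(a.keys() & b.keys()) if a[p] != b[p]]
    return problems
```

An empty list means the restore reproduced the source exactly; anything else is a backup issue to fix before you rely on it.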
1. Backups: Backup, perform test restores to VMs, fix any backup issues, rinse, repeat. Make sure to examine backup logs every day for the first month or so, and at least once a week thereafter.
2. Monitoring: Implement basic monitoring, including your backup system.
3. Infrastructure: Use the monitoring to fix any infrastructure issues such as overloaded servers (high CPU, memory), overloaded network uplinks or slowdowns (high bandwidth usage, incorrect speed and duplex settings), etc.
4. Applications: Use the monitoring to find application issues. Some may go away as a result of fixing infrastructure issues. Others will require support calls to vendors.
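"Basic monitoring" boils down to comparing collected metrics against limits and alerting on anything over. A minimal sketch of just that comparison; collecting the numbers (SNMP, an agent, or a monitoring package such as Nagios or Zabbix) is out of scope, and the metric names and thresholds are hypothetical.

```python
# Hypothetical thresholds; pick values that match your hardware and workload.
THRESHOLDS = {
    "cpu_percent": 90.0,
    "memory_percent": 90.0,
    "disk_percent": 85.0,
    "uplink_percent": 80.0,
    "hours_since_last_backup": 26.0,  # step 1: the backup system is monitored too
}


def threshold_alerts(host: str, metrics: dict[str, float],
                     thresholds: dict[str, float] = THRESHOLDS) -> list[str]:
    """Return an alert string for every reported metric at or above its threshold."""
    return [
        f"{host}: {name} = {metrics[name]:g} (limit {limit:g})"
        for name, limit in thresholds.items()
        if name in metrics and metrics[name] >= limit
    ]
```

Metrics a host does not report are skipped rather than treated as healthy or failed; a real setup should also alert when a host stops reporting at all.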
I walked into a similar nightmare two years ago. Before I even took the job, I assessed the situation and gave them a proposal for what needed to be done and a price estimate for the software and hardware. I told them I would not take the job unless they committed funds to support the function. I also warned them that there were numerous ticking time bombs, that I'd defuse them as fast as possible, but that there was no magic fix, it would take some time, and they could still have a disaster.
I then convinced them to only hire me part-time and to also hire a part-time desktop support person for a few reasons including they don't want to pay me to do that and having two IT people at least gives you some continuity. Even if the desktop support guy doesn't know the high-end stuff, if I leave the desktop person can still guide the new person and save them a lot of time I never got.
My line of attack was:
Getting back to original point, a one-person IT shop is suicide. Them having a two person part-time crew is better because if one leaves, at least the other can provide some sort of continuity -- and that happened already. The fairly young guy I hired for desktop support two years ago died last month :-(
It's simple: you can't. You can, however, make a resounding case to your employers that you will need more help. Learn more about the business and how it works, and relate your infrastructure to their bottom line in order to secure extra funding and more hands. Management is tasked with ensuring that you as an engineer have everything you need to do the job, and if your job scope has grown then so too must your resources.
Do not try to handle the entirety of the infrastructure on your own: help desk, development, and sysadmin. I pulled that plate-balancing act for the first three years of my career and it amounted to a thankless, low-paying job, a long commute, and an unrelenting amount of stress. If there are band-aids placed everywhere, it's because the last guy couldn't communicate the things he really needed (servers, cooling, switches, a real lunch break with actual hot food).
Talk to your boss ASAP and highlight where the issues are and explain to him in monetary terms what will happen if the system screws up / how much time will be lost.
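Putting an outage "in monetary terms" can be a one-line estimate: lost sales plus wages for people who can't work. A minimal sketch with a hypothetical `outage_cost` helper; it assumes sales are spread evenly across all hours of the year, which ignores lost repeat customers and reputation, so treat the result as a floor.

```python
def outage_cost(hours_down: float, annual_revenue: float, staff_idle: int = 0,
                hourly_staff_cost: float = 0.0,
                selling_hours_per_year: float = 24 * 365) -> float:
    """Rough cost of an outage: lost sales plus wages paid to people who can't work."""
    lost_sales = annual_revenue / selling_hours_per_year * hours_down
    idle_wages = staff_idle * hourly_staff_cost * hours_down
    return lost_sales + idle_wages


if __name__ == "__main__":
    # Hypothetical: $1,000,000 a year in sales, a 2-day outage, 5 idle staff at $25/hour.
    print(f"${outage_cost(48, 1_000_000, staff_idle=5, hourly_staff_cost=25.0):,.2f}")
```

With those assumed figures the estimate comes to roughly $11,479, a number a boss can weigh directly against the price of a spare server or a second hire.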
Push to see if you can afford to get someone else hired - even if it's a junior network engineer. You need to share the pain before it consumes you.
And make sure not to hire an outside firm that consults on outsourcing IT support. Security firms are pretty good at general IT auditing in addition to strictly security related analysis.
"Science flies us to the moon. Religion flies us into buildings." - Victor Stenger
The first step is to define your goals. What do you want out of this?
1. a job
2. learning new skills
3. leadership
4. a chance to grow in the company
If you are the sole IT/programmer person, this is a company in dire need of management with a clue as to IT. You could be that change and end up being a manager of IT for this company. You'll have to work your butt off, fixing things, dealing with budgets and hiring staff. Can you deal with upper management to accomplish everything? That's up to you to decide.
What I won't recommend is killing yourself for a company that is unwilling to learn from its mistakes and do it right. In that case, just treat it as a good learning opportunity, but don't kill yourself. They won't always be able to hire a superhero to come in and keep things running. Or if they do, it will be a well-paid consultant and they will learn their lesson quickly how much it costs.
There is a reason this company has such poor IT systems. You could end up being the IT guy in a long line of IT idiots.
It sounds like you're already fucked.
You're the sole programmer and IT guy for a "thriving" company? That's not thriving. That's life support. Find another company and let nature take its course with these guys.
99% of the time a hosed IT infrastructure is because management refused to spend any money, so it had to be half-assed.
It is certainly true that a great many companies are penny-wise and pound-foolish when it comes to IT, but it is VERY premature to jump to that conclusion here. I've seen almost as many cases where companies overspent on IT for things they didn't really need. My current company has a piece of accounting software that is seriously overkill for our relatively pedestrian needs. It cost our company $80,000 when $3,000 on QuickBooks Enterprise would have done the job fine. (It was bought by the previous owners, who were all engineers without a lick of business savvy.)
In any case it is much more likely that any "half assed" solutions were due to a lack of competence rather than a lack of money. It sounds like this guy has done a lot to improve things without throwing big bucks at the problem so I'm inclined to suspect his predecessor was not especially gifted.
Throwing money at a problem should always be the solution of last resort. While it is certainly possible this company isn't spending enough, you don't spend money on anything without a reasonable expected ROI. Spending money as a first impulse usually means you haven't really thought about the problem sufficiently and are just assuming that a more expensive product will solve all your problems. If I hired an IT guy and the first thing out of his mouth was that I wasn't spending enough, I'd be seriously worried.
Agreed. And talk to the outside auditor to make sure they strongly recommend hiring at least one other IT person.
What happens when you're out sick?
What happens when you want to take a vacation?
What happens when the servers die at 4am?
What happens when Hotmail refuses to accept connections from your company, and then Google Analytics explodes, and then your merchant account service stops processing your transactions, and then the marketing DB goes down, and then the phone systems stop working?
You want AT LEAST one other person helping with your job.
Slowly and carefully... You'll probably never get done and in the end you'll be applying similar band-aids to get things running/keep things going as your predecessor did. Then you'll leave and someone else will come in and the cycle will start all over.
Good luck and Godspeed, sir.
We are utterly puny and irrelevant. About a million in annual sales. The upside is our sales are all add on sales, so we are almost all profit, except for IT and fulfillment costs. It takes 5 of us to keep this thing running, that is our entire staff. This guy is smaller than that. It is hard for me to imagine a company smaller than our operation. Like the only thing smaller than us is the couple that buys stuff at garage sales and puts it on ebay.
If this guy is for real, I suggest taking all of the servers and crap and throwing them away, and getting a nice e-commerce storefront from a reputable dealer. For something this small, if the LAN is something more than... a wireless AP router, with a cheap wifi card slapped into each of the computers, he is spending too much time and effort on it.
Ditch all that Perl crap, and go buy yourself a few copies of Excel. That is all the database you need at this point.
This unit is too small for even a key system for phones. He should consider just getting everyone a set of headphones and having everybody do this on Skype. Or just a couple of POTS multi-line phones.
But wtf??? He shouldn't be doing anything with Perl and databases until they are generating a few thousand records a day. I mean, in our company, with about 30-40 orders a day, we fit very nicely in Excel...
Hire me to come in and help you out!
3 years ago I was dropped into a very similar position at a small state agency.
I had some documentation: a list of server names, passwords, and license lists. I first addressed the issues with desktops, and shored up the servers. What probably saved my sanity was that I was tasked with moving most of our servers to a VM environment, and could set things up in a documented manner.
It is better to be the hammer than the anvil.
The number one best thing you can ever do in your situation is ask your bosses what they think the system should be doing.
Step 1: All the squirrelly business logic and the rationale behind each system you have to maintain should have a plain-text description. You have to know the 'Why' before the mess of band-aids that is the 'How' will ever make sense. Have your boss (or his secretary, or whoever) document it and get it to you. Do NOT do this step yourself. Repeat: do NOT perform this step.
Step 2: Put out fires until someone other than you finishes step 1. Start making backups of every last scrap of data you can get your grubby hands on.
Step 3: Once step 1 is done, compare it to the mess. Note where the reality in your boss's head diverges from what is actually happening. Your job now is to create a detailed functional spec that takes what your boss says and expands on it with what is really happening. Try to include worst-case scenarios and document them as intended features.
Step 4: Have your boss, sales and marketing, and every other top-level manager sign off on it. This will not happen. No two managers in your company will fully agree on what the current system is actually doing. Your goal is to figure out what sales and marketing are telling your users that your products do. Do not disregard this step or it will come back and bite you very hard.
Step 5: Once every department actually agrees on what your job really is, you will be well equipped to start the long process of fixing things. Again, make lots and lots of backups. Management will sign off on step 4, then you'll fix a gaping security hole, and some customer somewhere will throw a raging fit because sales promised that they'd be able to get admin access to your databases or something ridiculous.
Step 6: Don't be an ass. When step 5 inevitably happens, graciously explain the misstep in communication, and roll back. If you pulled off not being an ass properly, you now have a great platform to explain to management why X was a bad idea, and to present an idea to fix it.
I'm a grizzled veteran of your situation. If someone had told me what I just told you when I started out, there would have been a lot less headache and stress. Hang in there; it can be an intensely rewarding experience.
Obviously they hired the wrong replacement if you are asking these questions. This is IT 101.
Set a goal for how everything should work, and evolve each area toward that goal, one by one. Be sure the old sections keep working with the new ones, keeping most of the old code until its turn comes. It will be an iterative process, since the goal architecture will probably evolve, both to meet future needs and to avoid extra work accommodating old things that are good enough.
I think you missed the part where he said the company was thriving.
It means whatever he does is irrelevant, because they are thriving with whatever they have. I think he needs to check if they are really an e-commerce company and not a money laundering operation for some drug dealers, in which case he is set for life.
You can't handle the truth.
I've been through similar situations a number of times. For the people who are telling you to get out of this job, I say: not necessarily. If you manage to fix these things, it can be a great learning experience and it can help you earn a name for yourself.
So my advice is to start out bringing these problems to the attention of management. You don't need to be pushy, but be very clear that you have found these problems, that you think they're serious problems, and that the problems may endanger the success of the company. Give them a little leeway on how to direct you. They probably won't want to throw lots of money at the problem, but if they don't seem genuinely concerned and looking for solutions, then start looking for a new job.
Second, get ready to learn about project management, because you're not fixing all of this at once. Make a list of what needs to be done. Prioritize that list. Estimate the time needed to do each task. If there's something extremely high priority that will run up against a specific deadline, then figure out what's necessary to meet that deadline. Start working on a budget.
Start setting schedules for each thing that needs to be done, but recognize that the schedule will have to be flexible. In fact, don't bother scheduling things that are low priority until you've put out some fires. Keep them on your todo list, but consider making a separate "to do eventually, but I'm not going to bother thinking about it right now" list. When you have a schedule set, get to work. Keep track of your progress, and keep management informed of your progress. Keep them informed about problems and obstacles that you encounter along the way, especially if they'll cause an increase in your budget or a delay in your schedule.
You'll want to gather some good project management tools along the way. At a bare minimum, these tools will include a calendar, a todo list, and a way to keep organized notes. Set aside time every week to review your notes, your calendar, and your todo list.
You can take project management classes, but most of what they teach you comes down to this: Make sure you understand what you're trying to accomplish, and that what you're doing is actually the best way to accomplish it. Keep your stakeholders informed, and listen to their feedback on your progress.
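The list, prioritize, estimate, and schedule routine described in this comment can be sketched in a few lines of Python. This is a minimal illustration under assumed inputs, not a project-management tool: the `Task` fields, the 20 hours per week of project time, and the sample task names are all hypothetical. It lays tasks end to end in priority order and flags any that would miss a hard deadline, which is the signal to go back to management about scope or budget.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Task:
    name: str
    priority: int                        # 1 = most urgent
    estimate_hours: float                # padded guess, not a promise
    deadline_week: Optional[int] = None  # hard deadline (week number), if any

def schedule(tasks: List[Task], hours_per_week: float = 20.0) -> List[Tuple[str, int, bool]]:
    """Work through tasks in priority order, one after another.

    Returns (name, week the task finishes, misses_deadline) for each task.
    hours_per_week is the time left for project work after firefighting.
    """
    plan = []
    elapsed = 0.0
    for task in sorted(tasks, key=lambda t: t.priority):
        elapsed += task.estimate_hours
        finish_week = math.ceil(elapsed / hours_per_week)
        late = task.deadline_week is not None and finish_week > task.deadline_week
        plan.append((task.name, finish_week, late))
    return plan

tasks = [
    Task("Document network", 3, 24),
    Task("Verify backups", 1, 8),
    Task("Replace failing RAID disk", 2, 16, deadline_week=1),
]
for name, week, late in schedule(tasks):
    print(f"week {week}: {name}" + ("  <-- misses deadline" if late else ""))
```

With these made-up numbers the RAID replacement finishes in week 2 against a week-1 deadline, so either it jumps ahead of the backup check or the plan needs more hours per week.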
And make sure not to hire an outside firm that consults on outsourcing IT support. Security firms are pretty good at general IT auditing in addition to strictly security related analysis.
Right, and be careful using a vendor to run this audit for you as well. It might be tempting, because they will give you a really good deal, but you are essentially paying them to generate a report that says you need to buy all their gear. I'm not saying this can't work; in fact, it could be a good option if you are on a shoestring budget (presumably not, given your use of the word "thriving"). Just be careful.
No man is an island, But if you take a bunch of dead guys and tie them together, they make a pretty good raft.
Thriving is relative. If they are a 1/2 million dollar company and their sales are up by 100%, then they are thriving.
And yes, you can survive on a single IT staff if your web presence is your business. Several years ago I consulted for several companies like this.
In this day and age you can just virtualize it all and throw the crap away.
Posted to undo a faulty mod, thanks to a browser quirk or Slashdot AJAX or something.
You document what's there. You've already started that. Next you document what's deficient. Then you put together a plan that, in stages, makes things better. Then you propose that plan to your management in terms that make sense to business people (happier customers, money saved, disaster avoided, etc...). Then you execute the plan.
"I'd rather be a lightning rod than a seismometer." -Ken Kesey
Your general problem isn't unique, and neither is the solution. I wouldn't want to address a problem this large using the contents of a reply on /. Find a proven method for fixing this sort of situation that you can read quickly and follow. Rather than writing out the various steps, grab a copy of the Visible Ops e-book (or a similar one), read it in a few hours (it's short), and start addressing your environment chapter by chapter. With a documented plan of attack that has been used by many, getting management support should be trivial, and you can hit the ground running.
http://www.itpi.org/?page=Visible_Ops
Either because your predecessor 'made it work' with little or no funding (better translated as 'he made it almost work'), or because your predecessor failed to acquire sufficient funding to do it 'right'.
Speaking as a former field tech/consultant: try to avoid bringing in consultants to explain why the stuff needs to be bought. Many a manager ends up believing the consultants and disbelieving their staff. You then either have to hire the consultant to justify your plan, or find yourself undercut by that lack of confidence.
And of course nail some problems and show improvements as early as you can. It's wise to both solve pressing impactful problems first, and gain trust.
I always loved going into a client with lots of problems. Not just for the thrill of making things right, but knowing that if I did it right I had a referral for my next client, because the end result was most often working my way out of the gig. Either I passed the client on to another tech to maintain, or they got their staff's legs under them and could carry on. So long as there are more clients, this is good. Great fun to figure things out, isn't it?
deleting the extra space after periods so i can stay relevant, yeah.
I was hired almost 3 years ago to replace my predecessor who died. There was little to no documentation. What little there was was wrong or very out of date. As she was sick for the year before she died she wasn't in the office much.
I started with the server farm. I documented the hardware and software on each server. I exported the Active Directory and did a permissions audit. I retrieved the Windows license keys from the servers using Magical Jelly Bean Keyfinder, which also found the keys for a few other programs we were running. I then did a Windows backup of each server AND verified the backup. Next I updated all of the software, starting with anti-virus and then MS patches. I had to call the other software vendors, explain that my predecessor had died, and ask for help finding out what we owned, keys for the software, maintenance records, etc. Most vendors were happy to help. I updated what I could. I filed the rest away along with quotes for the current versions of the software that didn't have a maintenance contract.
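The commenter verified each backup but doesn't say how. One common way is a test restore into a scratch location followed by a checksum comparison against the live data. The sketch below assumes your backup tool can restore to an arbitrary directory; `verify_restore` and the directory layout are hypothetical, and files that are open or changing (databases in particular) will legitimately differ unless you compare against a quiesced copy.

```python
import hashlib
from pathlib import Path
from typing import List

def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large files never have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_restore(source_dir: Path, restored_dir: Path) -> List[str]:
    """Compare every file under source_dir with its counterpart in a test restore.

    Returns "missing: ..." / "differs: ..." entries; an empty list means the
    restore matched the source byte for byte.
    """
    problems = []
    for src in sorted(p for p in source_dir.rglob("*") if p.is_file()):
        rel = src.relative_to(source_dir)
        restored = restored_dir / rel
        if not restored.is_file():
            problems.append(f"missing: {rel.as_posix()}")
        elif sha256_of(src) != sha256_of(restored):
            problems.append(f"differs: {rel.as_posix()}")
    return problems
```

A backup you have never restored is only a hope; running something like this after each test restore turns "verified the backup" into a repeatable check.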
I then went to the network. I traced each cable back to where it was plugged in and used Dia (http://live.gnome.org/Dia) to document the network. I even took pictures and labeled a few cables. This was a real mess. Each VoIP phone was plugged into the main network and each computer was plugged into the phone. That was the reason their computers and phones weren't working properly. I fired up a Wireshark session on my BackTrack PC and logged the traffic. The network was in horrible shape. All of the print servers were running IPX, DLC, AppleTalk, etc., even though it was strictly a Windows shop. I also found a few computers trying to send out email when no one was here. Can you say virus? A reinstall of the OS and Office fixed that problem. I had to use the recovery method on each piece of networking equipment, as the passwords I was given did not work. I did firmware upgrades on everything and then took a can of compressed air and shot it through the vents, with a vacuum cleaner on the other end sucking up the dust that came out. Do the cleaning when the equipment is turned off.
I used Spiceworks to document the PCs. None of the computers had the Windows Firewall turned on, so scanning them was fast and easy. I found a lot of software that had no business being here. I set up a WSUS server to patch the clients.
I had a meeting with management about 6 months after I started and explained everything in detail. I made a priority list of items to address. They agreed and started to move to address the issues. The anti-virus contract was first as it was the cheapest. The MS agreement was almost last as it was expensive but is well worth the cost.
I did break a few things while I was fixing the servers and network. I had to write an ASP frontend to an Access database so I could get rid of a dying server. You are going to find things no one else knows about, or that everyone thought were removed long ago. It took me 5 months, off and on, to find a switch that Spiceworks said existed but I couldn't find. I found it above a ceiling tile when I was running a new Cat 5 cable.
Just document everything, backup everything, and work after hours. It has taken almost 2 years to address everything but everything is running much better and the people are much happier now that things work.
Good Luck
Inheriting someone else's work is never easy, but you're doing well enough. You have to resist the urge to re-implement everything wholesale and simply take the time to learn how the hell everything works.
In your case I would suggest documenting the discovered issues and noting exactly what your concerns are. Also throw in some time estimates (with a healthy error margin) describing how long it would take you to fix such a problem if it manifested, and estimates describing how much time and effort it will take to replace and re-implement the systems that would cause the problems. Then send this stuff out as an e-mail no one will ever read.
Then when shit breaks, you at least have the paper trail and you can give the 'I tried to warn you' speech to preempt your bosses from having you eat the big bowl of dogshit that will result.
END COMMUNICATION
"programmer and sole IT personnel at a thriving e-commerce company."
It's not a thriving e-commerce company if they have a sole IT person who is also a programmer. They are either progressing steadily toward failure or are, at best, a "successful start-up currently experiencing growing pains". If they have that many systems managed by a single person, with that many band-aids, then without the support of upper management all you can do is try to make better band-aids. Be sure to voice your concerns so that when the flaming pile of hardware hits the fan, it isn't your fault.
There is a fine line between awesome learning opportunity and being taken advantage of.
Maybe it's just me, but isn't a question like this coming from a person in over his or her head?
How is it that a thriving e-commerce company only has one IT employee? Did the rest quit? Did the marketing drones and PHBs bleed the IT budget into their bonus pool? I suspect deeper issues than the physical infrastructure you describe. Band-aids are what someone applies when there's no budget to do it right, and it sounds like the management culture doesn't want to invest in doing it right.
Brief your management on the situation. Explain what condition things are in and what is needed to get them into a manageable state. Give them a list of projects / tasks that you have to deal with and get them to prioritize.
And if you feel at all uncomfortable or hesitant to do this, examine alternative career paths beforehand.
Once you have briefed them, if management appears reluctant, begin pursuing those alternative career paths.
You could plan a move to cloud services, if that's an option, so that you can scale the system very easily. That is going to be the future.
Make an assessment, then get an audit done and submit the report to management.
There are lots of things you could and should do, but automation is the only thing that will allow you to do more over time. Start there and build on it.
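As a small example of automating a routine chore, here is a disk-space check of the kind you might otherwise do by hand every morning. It's a sketch: the mount list and the 90% threshold are assumptions, and in practice you would call `main()` from a short entry script scheduled with cron or Task Scheduler, letting the non-zero return code feed whatever alerting you use.

```python
import shutil
from typing import Iterable, List

# Assumed values; adjust for your own servers.
MOUNTS = ["/"]
THRESHOLD_PERCENT = 90.0

def percent_used(path: str) -> float:
    """Share of the filesystem holding `path` that is in use, as a percentage."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total

def check_disks(mounts: Iterable[str], threshold: float) -> List[str]:
    """Return one warning per mount point at or above the threshold."""
    warnings = []
    for mount in mounts:
        pct = percent_used(mount)
        if pct >= threshold:
            warnings.append(f"{mount} is {pct:.0f}% full")
    return warnings

def main() -> int:
    problems = check_disks(MOUNTS, THRESHOLD_PERCENT)
    for line in problems:
        print(line)
    return 1 if problems else 0  # non-zero signals trouble to the scheduler
```

Each check like this is small, but once a dozen of them run unattended, the morning walk-through of every server goes away.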
Do that in broad lines, avoiding too much detail. Start on the largest scale, with short descriptions of all the major subsystems, and then work down to the lower levels. In the same way, document your own changes, but don't skimp on describing your rationale.
It may take a while, but this approach has a number of advantages. First, you will develop a clearer picture of what your predecessor has done. Second, you will better understand your own handiwork when you try to figure out why you did what you did months or years from now. Third, unlike your predecessor, when the day comes you will be able to hand your position over to your own successor with a clear conscience.
To many, this advice probably sounds like a good way to make a tough job even harder. Most of us hate writing documentation (just ask your predecessor), but system administration is definitely a lot more complex these days than it was in the 90s, and even back then I learned the hard way that, in order to remain in control of systems that will likely be used for many years, documentation is essential beyond a certain level of complexity.
"A little over a month ago, I assumed the position of programmer and sole IT personnel at a thriving e-commerce company."
I'm sorry, repeat that again?
"A little over a month ago, I assumed the position of programmer and sole IT personnel at a thriving e-commerce company"
No offense, but you should leave NOW! A "thriving e-commerce company" with only one person in IT? Really? You are working for a Mickey Mouse operation. It's most likely run by sales, and sales doesn't give a damn what you do or how you do it or the problems caused for you. Those band aids? Sales requested each and every one of them! Leave now. Leave fast!
I8-D
I'm going to say this, as someone who has been in similar situations. You need to research the issues, come up with a plan that can be prioritized and/or implemented in stages of increasing cost/time/difficulty.
At the top of this list should be the easy, obvious fixes that yield immediate benefits. Things like simple backups, and configuration tweaks to improve resource utilization. Things that are free to implement (besides your time).
Below those are cheap things that will help in the short term, or address a current need. This means independent services that don't require lots of planning or testing to implement. Mail server running out of space? Upgrade it, or even move your mail to the cloud if it makes sense for this company.
Then at the end of the list, you have the long-term stuff and the "nice to have" stuff when the company can afford it.
The idea is to present it all in a way that follows the growth of the company. Boil that frog slowly! You need to gradually show the benefits of proper infrastructure, in a way that won't break the bank nor shock the manager or owner who invariably has been skating by in blissful ignorance. Those early successes will build your reputation and trust, so that when the time comes to make a big purchase, or bring in some contractors, the value will be self-evident to your superiors and you will have a much easier time getting that PO approved.
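The three-tier list described in this comment (free fixes, cheap short-term fixes, then long-term or nice-to-have items) is easy to keep honest with a tiny script. The dollar and planning-time cutoffs below are hypothetical placeholders; pick ones that match what your manager can approve without a meeting.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Item:
    name: str
    cost_dollars: float    # purchase cost only; your own time isn't counted
    planning_weeks: float  # design and testing needed before rollout
    nice_to_have: bool = False

# Hypothetical cutoffs: what counts as "cheap" and "quick" at this company.
CHEAP_LIMIT = 500.0
QUICK_LIMIT_WEEKS = 1.0

def tier(item: Item) -> int:
    """0 = free quick fixes, 1 = cheap short-term fixes, 2 = long-term or nice-to-have."""
    if item.nice_to_have:
        return 2
    quick = item.planning_weeks <= QUICK_LIMIT_WEEKS
    if quick and item.cost_dollars == 0:
        return 0
    if quick and item.cost_dollars <= CHEAP_LIMIT:
        return 1
    return 2

def roadmap(items: List[Item]) -> Dict[int, List[str]]:
    """Group item names by tier, keeping the input order within each tier."""
    tiers: Dict[int, List[str]] = {0: [], 1: [], 2: []}
    for item in items:
        tiers[tier(item)].append(item.name)
    return tiers
```

Tier 0 is where the early, visible wins come from; tier 2 is the list you bring out once those wins have earned some trust.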
that no one has even mentioned virtualisation yet.
Go get Spiceworks, and it will do 99% of your requests for you.
You're welcome.
I8-D
The time that you've been given to familiarize yourself with their environment is gone. You have found a tonne of inefficiencies and know exactly what you need to do to clean, automate, improve etc...
BUT
Now that you have to start doing the work that 'the business' (the part of the company that makes money) needs done, you'll find yourself taking short-cuts and applying the same types of band-aids. It is the curse of all IT professionals!
"scholars never agree and fools seldom differ"
They have a guy who finds upgrading phone systems immensely satisfying! If he's sick he'll come in and fix it and who needs vacation anyway, he'll take the cash instead.
I'm betting it's a psychotic break and he IS his predecessor.
I've been in your situation a couple times before.
First things first, Organizational support.
You say you are just one guy. You need buy-in from your boss/organization, either to get more resources or at least their backing. One thing I learned: if you don't have the support of the organization, they'll NOT recognize a single thing you fix... but will be quick to blame you if something even remotely goes wrong during a fix.
If you don't have organizational support, then it's time to find a new organization.
Prioritize. This seems simple, but when things look insurmountable, you'll be surprised how easy it is to get sidetracked. Top of the list are the issues with the highest likelihood of affecting production uptime. Regardless of how bad an issue is, if it doesn't affect production uptime and is NOT a 5-minute fix... it goes to the bottom of the list.
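One way to encode this rule, assuming you attach a rough outage-likelihood guess and a fix-time estimate to each issue. The `Issue` fields and sample issues are hypothetical, and placing non-production five-minute fixes in a middle band is an interpretation: the comment only says they escape the bottom of the list.

```python
from typing import List, NamedTuple, Tuple

class Issue(NamedTuple):
    name: str
    affects_production: bool
    outage_likelihood: float  # rough 0..1 guess that this takes production down
    fix_minutes: int          # estimated time to fix

def priority_key(issue: Issue) -> Tuple[int, float]:
    if issue.affects_production:
        band = 0   # threats to production uptime come first
    elif issue.fix_minutes <= 5:
        band = 1   # trivial fixes: just knock them out
    else:
        band = 2   # everything else waits at the bottom
    # Within a band, the likeliest outage sorts first.
    return (band, -issue.outage_likelihood)

def prioritize(issues: List[Issue]) -> List[str]:
    return [issue.name for issue in sorted(issues, key=priority_key)]
```

The point of writing the rule down is the same as the comment's: an ugly but harmless issue cannot quietly jump ahead of something that threatens uptime.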
Consultants. This can be a double-edged sword. They're good if the area is outside your expertise (e.g., you're the Unix/network admin and there are glaring issues with WebLogic). If you do go the consultant route, define a CLEAR STATEMENT OF WORK, and ensure it includes DOCUMENTATION. :-)
I think everyone has inherited this same mess at some point. The answer is always the same as "How does one eat an elephant? One bite at a time". Identify the biggest and worst fire - correct it, then move on to the second worst fire. Eventually you end up in IT nirvana.
You will probably be getting a large number of suggestions. I have done both support and development on mainframes and servers so here is some input:
1. Let management know at a high level the state of the machine(s) and get permission to spend part of your time documenting the system. When you get permission ask them for how often they need updates and how much detail. Keeping them in the loop seems to make them happy and feel important.
2. Document the current state and highlight areas of concern. Put down what the concerns are, the risks and the potential costs to the company if it fails.
3. Go through the document and organize it by risks. Try to figure out the size of the risk and how much work it will take to fix it and what is needed to fix the problem.
4. Automate as much of your process as possible. Any task you have to do on a regular basis (in my humble opinion if you do it more than once then automate it) should be automated. Dedicate time to document what you did.
5. Senior management probably doesn't want to see details. When you present, keep it simple and short. Point out the costs of failure, and if you need software to help, put that forward as an 'investment in infrastructure'.
6. If the company has an internal auditor make friends with him/her. Getting them on your side to present to management will help. Having the auditor explain to them the financial costs will help your cause a lot.
7. When you do things take the time to document what you are doing, WHY you are doing it, how you did it and where to go for the programs/scripts/data.
8. Pick the brains of all the people there as much as possible. Offering to buy coffee and donuts seems to make them more receptive to an informal session, and the amount of information they have could help you.
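Items 2 and 3 amount to a risk register. A common way to make "size of the risk" comparable across systems is expected annual loss (chance of failure times cost of failure), divided by the effort needed to fix it. That formula is a standard heuristic swapped in here, not something the commenter specified, and every number in the example is a placeholder guess.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Risk:
    system: str
    annual_failure_chance: float  # rough guess, 0..1
    cost_if_fails: float          # lost sales, overtime, penalties (dollars)
    hours_to_fix: float           # effort needed to remove the risk

    @property
    def expected_loss(self) -> float:
        """Expected yearly cost of leaving this risk in place."""
        return self.annual_failure_chance * self.cost_if_fails

    @property
    def loss_avoided_per_hour(self) -> float:
        return self.expected_loss / self.hours_to_fix

def rank(risks: List[Risk]) -> List[Risk]:
    """Most expected loss avoided per hour of work first."""
    return sorted(risks, key=lambda r: r.loss_avoided_per_hour, reverse=True)

risks = [
    Risk("Untested backups", 0.2, 100_000, 16),
    Risk("Single web server", 0.1, 50_000, 80),
    Risk("Expiring SSL certificate", 0.9, 5_000, 1),
]
```

In this made-up example the expiring certificate ranks first despite the smallest dollar figure, because it is almost certain to bite and takes an hour to fix; the expected-loss column is also the "cost of failure" figure management tends to respond to.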
Part of every project we do now is dedicated to documentation and the client now knows the importance of that documentation and is happy to pay for it. The current system is over 25 years old and a lot of business knowledge has been lost due to people retiring or leaving. When we find things we put them into a document. The hardest thing to find is the 'WHY', but, once you get that the rest of the information starts to make more sense. Our most popular section is the 'HOW TO DO' as this is the short cut for every other document in the system.
When you do your documentation try to keep the documents as open as possible. Try to avoid proprietary packages as much as possible. We had an old flow chart program that we didn't have the program for and it took me a week to find an open source package that could read and export the files.
Panic now, beat the rush!
Word for word, I thought your submission was something I wrote. I've been there, done that, so it is possible. The mess I inherited took me 6 MONTHS to fully comprehend and formulate all plans of attack. Meanwhile, I was trying to fix and maintain what I could.
To tackle the whole problem, where did I start? Everywhere, frankly. There was no one starting point. Everything was affecting everything else. I did my best to get written down exactly what was running on each server, including running daemons, system crontab, and all user crontabs. Basically, build a list of what exactly you can determine. Then, try to get into the mindset of your predecessor and try to understand why (as best you can) to see if that sheds some light on what else may be lurking out there.
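Building the per-server list of scheduled jobs is tedious by hand. The sketch below only parses crontab text; collecting it (for example reading `/etc/crontab` and `/etc/cron.d/*`, and running `crontab -l -u USER` as root for each user) is left out because paths and tools vary by distribution. It assumes standard Vixie/cronie syntax, where system crontabs carry a user column and per-user crontabs do not; `parse_crontab` is a hypothetical helper name.

```python
import re
from typing import List, Optional, Tuple

# Lines such as MAILTO=... or PATH=... set environment variables, not jobs.
ENV_LINE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*\s*=")

def parse_crontab(text: str, system: bool = False) -> List[Tuple[str, Optional[str], str]]:
    """Turn crontab text into (schedule, user, command) tuples.

    Use system=True for /etc/crontab and /etc/cron.d/* files, which have a
    user column; per-user crontabs (crontab -l) do not, so user is None.
    """
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or ENV_LINE.match(line):
            continue
        if line.startswith("@"):
            # @reboot, @daily, ... replace the five time fields
            parts = line.split(None, 2 if system else 1)
            schedule, rest = parts[0], parts[1:]
        else:
            parts = line.split(None, 6 if system else 5)
            schedule, rest = " ".join(parts[:5]), parts[5:]
        if len(rest) < (2 if system else 1):
            continue  # malformed line: worth a manual look, but skipped here
        user = rest[0] if system else None
        command = rest[-1]
        entries.append((schedule, user, command))
    return entries
```

Dumping every server's entries into one table makes the forgotten nightly Perl job much harder to miss.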
I didn't get bogged down in formal documentation because I knew I would have to rebuild everything. I took copious notes and drew pictures of what I thought were the processes of the system, but this documentation was just for my own benefit. Once I could identify unneeded processes, I shut them down or side-stepped them and hoped I didn't break anything; if I did, that was just more information at my disposal. Other processes I simplified and consolidated where I could. Eventually, the system got more and more comprehensible.
Once I "got" everything that was going on, I built a fresh new setup that focused on simplicity and efficiency. This is when I focused on documentation. I kept the legacy system dormant but available, just in case.
Now, we have a darn good system in place. I used virtualization to segregate different services and to enhance security. Maintenance is now preventative. Management is happy because they can now grow the business without everything imploding. The focus of my work today is building new products and services. I hope my successor doesn't complain.
Post all your passwords, server names and addresses. The Slashdot community will swoop in and document the whole system for you in a few hours.
In answer to your question: slowly, with extensive notes, and with discussions with the people whom the systems you are looking into may affect. I've been the primary IT person for a large printing firm for about 16 years, and I now manage a staff of two, with various personnel changes over the years. Applications that look like a mess may have developed over time, as the person you replaced developed their skills and learned how to do things better. There have been several recent articles on tech blogs about discovering the hard way why something was done a specific way; learning to document everything before you make even simple changes is the safest course of action, and it will get you back out of trouble. Don't be afraid to talk to the people who use specific systems or functions; sometimes you may find it is "because that is how we have always done it", but sometimes you will find that a specific practice avoids a common type of error or provides a specific check on a process. Remember that what you have come into may look like a tangle, but it is also a working whole, even if some parts seem to be of the baling wire and bubble gum variety. Look at the whole when you replace a component: see how the information and process flow both into and out of the portion you are improving, and be sure your new solutions integrate as well. Trust that you will make mistakes, and make backups, copies, and notes that enable you to back out carefully; then, when you have made successful changes, document the process you have completed and keep that as the core of your new documentation. Make it one goal to never put the person who follows you in the same bewildering place. There is never time to go back and document, especially if you are on your own.
Do it at the time you make the changes, and you will have it when you are finished, as well as have the path to back up and recognize where your change may have caused something to go wrong, as some things inevitably will. Above all, prioritize, document, research and change things slowly. Your primary responsibility is not to improve the systems, it is to support the people that rely on them. Keep that foremost in mind as you make any changes and you'll do well.
Get a whiteboard. Put your task list on it, in priority order, with time estimates. Order should be based on a business decision - what's the financial risk of something failing. Backups and security are always pretty high on my list.
Get buy-in from management on the ordering, because when something breaks (and it will), you need to make sure that someone above you approved the risk ordering.
Once you have a priority order, figure out how much it's going to cost to do each one. If mgmt considers something a #1 priority and is only willing to fund 10% of the price to fix it, then you have a pretty clear warning that it's time to look for a new job.
When tasks are finished, cross them off but don't erase. Make sure everyone knows that things are getting done.
Don't let anyone rearrange the task ordering without a financial justification that's approved by mgmt.
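If the whiteboard ever moves to a file, the same rules carry over. In this hypothetical sketch, the ordering is derived only from the estimated financial risk (so nobody can reorder it without changing that number), finished tasks stay on the list marked as crossed off, and the funding check mirrors the "willing to fund 10% of the price" warning. The 50% threshold and all field names are assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class BoardItem:
    task: str
    financial_risk: float  # estimated dollars at stake if this fails
    estimate_hours: float
    cost: float = 0.0      # price of the fix
    funded: float = 0.0    # amount management approved
    done: bool = False

def render(board: List[BoardItem]) -> List[str]:
    """Lines in descending financial-risk order; finished tasks stay, marked [x]."""
    lines = []
    for item in sorted(board, key=lambda i: i.financial_risk, reverse=True):
        mark = "[x]" if item.done else "[ ]"
        lines.append(f"{mark} {item.task} ({item.estimate_hours:g} h)")
    return lines

def funding_warning(board: List[BoardItem], min_ratio: float = 0.5) -> Optional[str]:
    """Warn when the highest-risk task is funded below min_ratio of its cost."""
    top = max(board, key=lambda i: i.financial_risk)
    if top.cost > 0 and top.funded / top.cost < min_ratio:
        return f"Top priority '{top.task}' is funded at {top.funded / top.cost:.0%} of cost"
    return None
```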
...is why your company is in this mess right now. Was your predecessor incompetent, bad at documenting, or so busy putting out fires that he never had a chance to do things properly? You also need to know if the reason he was putting out fires all the time was because he never had the resources to run a proper IT department. If that's the case, you have to prepare yourself for the real possibility that management either doesn't understand the need to properly fund IT or the fact that they do understand and are just too cheap to do it. A lack of understanding is fixable; being a bunch of cheapskates often isn't. And if they're too cheap, especially considering that you work for an e-commerce company, then you're in for a rough ride. Believe me, it's no fun trying to do your job when you can't get the tools you need to actually do it. If you find yourself in that situation, then you can only do what you can do. After that, you're going to be frustrated as hell. Perhaps that's why the last guy left.
And one word of warning. You're going to have to make an assessment of how likely it is that you can keep this ship afloat. If you're unable to get things in order, and the best you can do is stay one step ahead of a disaster, then you need to get the hell out before the inevitable disaster does happen because, when it does, it's entirely possible that you're going to get blamed, especially if management is too cheap to get you the resources you need. Fix the leaky boat as best you can, do whatever improvements you can manage, then polish your resume and start looking.
I did this sort of thing for a year and a half. Management was on board, the staff were on board, everything was going swimmingly. Then at some point I realized that the goal of getting things "cleaned up" moved every time I made progress. It turned out management was clueless: they were definitely on board, just not on the right train. I was doing the job of a sysadmin/developer/DBA/manager, but not getting paid for any of those roles. I got angry and burned out, so burned out that I considered quitting IT entirely.
Was it an incredible learning experience? You bet.
Was it career building? Not really, because I wasn't working with the latest tech due to budget constraints and constant firefighting.
I now work for a similar sized organization with a small team. There is still lots of variety and a pile of work, but the pile is manageable. I am infinitely happier.
My advice repeats what was stated above: Figure out why the place is a mess. If it's management's fault, run and don't look back.
Gordian knot. Alexandrian solution.
Help stamp out iliturcy.
When I was hired to run the IT department of a major company, my predecessor left three letters in the desk that was now mine. Each letter was clearly labeled: System Failure #1, System Failure #2, System Failure #3. A Post-it note was attached to the bundle of letters.
In case of a substantial system failure open the letters in order, once per failure, and they will help you through the problem.
I put the letters back in the desk and forgot about them.
About one year later we had a cascading server failure that left our corporate intranet and several important production servers off-line. While repairing the problem I remembered the letters. Curious, I opened the first letter.
Blame me, your predecessor
The day after we got the servers back up, I was called into my boss's office to explain what happened and why we were down for so long. Taking my cue from the letter, I blamed my predecessor. My boss was satisfied with my answer and let me go.
About six months down the road we had another big failure. This time our primary database server went down and the secondary was having trouble dealing with the load. I had to put a lot of extra hours into getting them back up and we lost a few transactions due to the backup server not being able to function under the load.
Once again, I reached into that desk drawer and opened letter #2.
Blame the equipment
This time I lamented to the boss about how it wasn't my fault. It was that backup server! If we had some good equipment to run on, these things just would not happen. He was satisfied with my answer and I went back to work.
Things ran smoothly for the next 18 months. Then we got hit with a virus that somehow got past our firewall and wreaked havoc on our systems.
I opened the third letter.
Write three letters
(Sorry, this was the first thing I thought of when I read the summary)
Is it just my observation, or are there way too many stupid people in the world?
This was meant to be posted in response to something else above.
I don't know how it got pushed down in the comment chain. Sorry for any confusion.
The combo of Observium (network monitoring), Hobbit (monitors everything with extreme ease), and either ESXi or Proxmox VE for consolidation and ease of management/isolation/testing/etc. has served me well for years to take control of large organizations quickly. At the last two businesses I was hired to fix, I set this up, built a parallel enterprise as VMs (the right way this time), and then cut everyone over in a weekend. No one noticed the change except to say stuff didn't crash anymore and it was really fast.
Also OpenFiler and NexentaStor make for a great SAN.
If you need more: pfSense for firewall or VLAN router, Blue Iris for IP cameras, PBX in a Flash for VoIP, SOGo for Outlook-compatible email, LibreOffice, etc.
[RIAA] says its concern is artists. That's true, in just the sense that a cattle rancher is concerned about its cattle.
I'm looking forward to reading the full story on The Daily WTF in a few months.
I started as the lone IT administrator for a large commercial insurance company. With almost no documentation, I had to use RMM tools (Kaseya), other audit software like Belarc Advisor, and network sniffers to figure out everything going on in the company. There were so many terrible band-aids that I had to bring on a few helping hands, and it still took a year. They had 64-bit/32-bit compatibility issues, and Windows 7 machines trying to run a DOS/Novell shared resource. It was extremely difficult and frustrating, and the pay wasn't as good as in easier positions I've held. However, it was an amazing day when it was finally done.
You already know that it's a tangled mess. You need to map that tangle thoroughly before you start fixing, replacing, or retiring anything. The conversation you do not want to have with your superiors is about why retiring system X (which costs $5,000/month) took down system Y (which makes $100,000/month). You need to map out both the business processes (which systems they touch) and the system dependencies (trust no one: log network data and look at the traffic between boxes). Do not start pulling strings until you know what they're connected to.
You're not going to do this by yourself... at the very least, you're going to need someone who knows the business side thoroughly. I've walked into a situation like this before at a very, very large company, and I swear it took years off my life, but I learned a whole hell of a lot from the experience. Best of luck.
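A minimal sketch of the "log network data and look at the traffic between boxes" step, assuming you have already exported observed connections (for example from packet captures or flow logs) into (client host, server host, server port) tuples. The host names are hypothetical. With that map in hand, it answers the question that matters before retiring anything: which boxes, directly or indirectly, talk to this one?

```python
from collections import defaultdict

# Hypothetical observed connections: (client host, server host, server port).
SAMPLE_FLOWS = [
    ("web01", "db01", 5432),
    ("billing", "db01", 5432),
    ("cron01", "web01", 443),
    ("legacy-x", "fileserver", 445),
]


def build_dependency_map(flows):
    """Map each client host to the set of (server, port) pairs it was seen using."""
    deps = defaultdict(set)
    for src, dst, port in flows:
        if src != dst:
            deps[src].add((dst, port))
    return deps


def who_depends_on(deps, target):
    """Return every host that talks to `target`, directly or through other hosts."""
    # Invert the edges so we can walk from a server back to its clients.
    clients = defaultdict(set)
    for src, servers in deps.items():
        for dst, _port in servers:
            clients[dst].add(src)
    seen, stack = set(), [target]
    while stack:
        node = stack.pop()
        for client in clients.get(node, ()):
            if client not in seen:
                seen.add(client)
                stack.append(client)
    seen.discard(target)  # ignore cycles that lead back to the target itself
    return seen


if __name__ == "__main__":
    deps = build_dependency_map(SAMPLE_FLOWS)
    print(sorted(who_depends_on(deps, "db01")))
```

Taking db01 down in this made-up example would break billing and web01 directly and cron01 indirectly, while legacy-x never shows up. Traffic you never captured (a job that only runs monthly, say) won't appear either, so log for long enough to cover a full business cycle.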
I have just done something similar, but I did have some documentation, even if some of it was outdated, invalid, or just plain wrong...
I started about 13 months ago and I'm still not finished, but it's moving forward.
How I did it:
1. Change the admin passwords on all systems and give access back to users on an as-needed basis. (Tell them that if they want the admin password, they will have to maintain the machine; that scares most people off.)
2. Inventory the services currently needed (ignore the current systems at this point).
3. Inventory which machines can be scrapped and repurposed.
4. Inventory the services running on the different systems.
5. Draw up a plan of how the network looks and what features are needed.
6. Look into what will be needed in the next 3 years.
Getting to this point took me about 4 months, but the old setup was horrific: plain gray boxes, no mirroring of data, no recovery procedure except reinstallation, and Netgear/Belkin/Linksys home routers used as firewalls. Yes, multiple ones. Basically, someone had walked into the nearest store and bought what they would have bought for home.
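For step 1, a sketch of generating a unique, random admin password per system with Python's standard `secrets` module. The host names and the 20-character default are assumptions for illustration; store the results in a proper password manager rather than a text file, and never reuse one password across boxes.

```python
import secrets
import string

# Letters, digits and a few punctuation characters that rarely need escaping.
ALPHABET = string.ascii_letters + string.digits + "-_.!@#%"


def new_admin_password(length=20):
    """Return a password drawn from a cryptographically secure random source."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def rotate_all(hosts, length=20):
    """Return a fresh, independently generated password for every host."""
    return {host: new_admin_password(length) for host in hosts}


if __name__ == "__main__":
    # Demo only: in practice, write these straight into your password manager.
    for host, password in rotate_all(["fw01", "db01", "web01"]).items():
        print(host, password)
```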
Next steps...
Design how the environment will have to look to support new functions for the next 3 years.
To limit the amount of work here, try to reuse existing systems if they are well maintained and in good working order. If migrating a specific badly configured system is a big job, move it into the new environment as-is, provided it can be kept separate and no other systems rely on it. Keep it running during the big job and take care of it later...
Remember to make backups of everything before you touch anything. Preferably, restore them to a new system to verify that the backup works.
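One way to do the "restore it and verify" step, sketched under two assumptions: your backup tool has already restored the files into a separate directory, and the source hasn't changed since the backup ran. The sketch compares SHA-256 checksums of every file in both trees; the demo fakes a restore with `shutil.copytree` and then damages it on purpose.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large files don't need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def manifest(root):
    """Map each file's path (relative to root, with '/' separators) to its checksum."""
    root = Path(root)
    return {p.relative_to(root).as_posix(): sha256_of(p)
            for p in sorted(root.rglob("*")) if p.is_file()}


def verify_restore(original_root, restored_root):
    """Return (missing, changed, extra) file lists, each sorted."""
    src, dst = manifest(original_root), manifest(restored_root)
    missing = sorted(set(src) - set(dst))
    extra = sorted(set(dst) - set(src))
    changed = sorted(k for k in set(src) & set(dst) if src[k] != dst[k])
    return missing, changed, extra


def demo():
    """Fake a restore with copytree, damage it on purpose, and check it."""
    with tempfile.TemporaryDirectory() as tmp:
        live = Path(tmp, "live")
        (live / "sub").mkdir(parents=True)
        (live / "a.txt").write_text("alpha")
        (live / "sub" / "b.txt").write_text("beta")
        (live / "c.txt").write_text("gamma")
        restored = Path(tmp, "restored")
        shutil.copytree(live, restored)                  # stand-in for the real restore
        (restored / "sub" / "b.txt").write_text("BETA")  # simulate a corrupted file
        (restored / "c.txt").unlink()                    # simulate a file the backup missed
        (restored / "d.txt").write_text("stray")         # simulate an unexpected extra file
        return verify_restore(live, restored)


if __name__ == "__main__":
    print(demo())
```

On a live system, files legitimately change between the backup and the check, so it is usually better to record the manifest at backup time and compare the restore against that.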
And here comes the hard work: create a migration schedule and start migrating.
I would strongly recommend hiring someone to do an audit once you have some documentation of the systems, to help you come up with a good system design that can be maintained in the future too. Keep things as automated as possible and try to use one type of OS to reduce complexity, though exceptions are OK.
And... Document systems that you install!!! Do NOT move to the next system before the current one has been documented completely!!
to not be so cheap and hire a bit more help. If they are "thriving" they can afford it.
Been there before. Do this:
1) Audit first! e.g. www.fastslm.com
2) Identify non-conformities
3) Fix them
4) Track everything
5) Define IT rules
6) Enforce them
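Steps 2, 5 and 6 can be made mechanical once the audit has produced an inventory. A minimal sketch, assuming the inventory is a list of records and each IT rule is written as a check over one record; the field names, thresholds and hosts are hypothetical, chosen to echo problems mentioned elsewhere in this thread.

```python
# Hypothetical inventory records produced by the audit step.
INVENTORY = [
    {"name": "web01", "days_since_backup": 1, "shared_root_password": False, "os_supported": True},
    {"name": "fs01", "days_since_backup": 30, "shared_root_password": True, "os_supported": False},
]

# Each IT rule: a human-readable description plus a check that returns True when a host conforms.
RULES = [
    ("backup ran within the last 7 days", lambda h: h["days_since_backup"] <= 7),
    ("root password is not shared", lambda h: not h["shared_root_password"]),
    ("OS still receives security updates", lambda h: h["os_supported"]),
]


def non_conformities(inventory, rules=RULES):
    """Return (host, rule description) for every rule each host fails."""
    return [
        (host["name"], description)
        for host in inventory
        for description, conforms in rules
        if not conforms(host)
    ]


if __name__ == "__main__":
    for host, problem in non_conformities(INVENTORY):
        print(f"{host}: {problem}")
```

Rerunning the same check after each fix covers step 4 (tracking) as well: the list should only get shorter.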
I just celebrated my 6th anniversary at a non-profit as the sole IT guy. I still have a long way to go in terms of fixing or replacing the trail of crap left by my predecessors, but things are going pretty well. I manage around 85 devices (servers both physical and virtual, desktops, laptops, security cameras, thin clients) using Spiceworks, document like a fiend, back up my backups (we have about 60TB of data), etc. My next big "project" is to migrate our customer service database from a 15+-year-old Access .mdb to SQL and whip up a web UI for it. I'm aiming to start that next summer.
Good luck!
I have been in IT for over 25 years and have worked dozens of IT jobs. The average time at a company for an IT professional is 3 years. That being said, the problem this guy is experiencing is pretty much the norm at every job I have ever worked. When I am hired, it is to clean up the mess that is IT and get things documented. IT guys are notoriously horrible at documenting anything. I come in, document everything, reverse engineer it all, and usually redo the entire organization from the ground up to get it set up right. Once I am done and it is all stable and documented, it is time to move on to the next disaster. I think a lot of IT pros are not as good as they pretend to be. To be in IT, you need to know more than how to fix a server, plug in an Ethernet cable, etc. You need to know how to document and keep documentation up to date. If you are the sole IT guy in a company, the problem comes from management. I can't tell you how many times I have seen an "IT Manager" or "VP of IT" with no technical experience trying to run the IT department. He ends up hiring some young kid who is an "IT superstar" but who actually turns out to be a kid who just messes with Linux in his spare time, not an experienced IT pro.
But as I dig deeper, I notice the alarming number of band-aids applied by my predecessor, and it seems like the entire company's infrastructure is just a few problems away from a total meltdown.
The day I walk into an office and this isn't the case is the day I'll be worried about my job as a break-fix IT contractor.
for the operation.
Has he been given the authority to make it right?
Or is he just the next scapegoat to take the fall for the PHB?
Write up a report documenting the faults you have fixed, and show how they led you to notice the other faults, bottlenecks, band-aids, and single points of failure.
Figure out a rough estimate of the equipment and man hours necessary, include a time frame to complete.
Include a department allowance to bring in a right-hand man to cover day-to-day crises, allowing you to focus on the upgrade.
And submit this up the food chain.
I am sure that these issues were not disclosed during your hiring process (if there was any awareness). These issues seem very critical and if you go trying to fix them without documenting the severity and notifying the higher ups of the hazards, you may wind up getting blamed for any crisis arising from said defects.
Rick B.
1. Work out which bits you want to work on and which you don't. (I would be most interested in their database, back-end payment processing and accounting systems, and customer-facing website; YMMV.)
2. Log everything that you work on so you have evidence of where the problems are and how long you spend on each one.
3. Get quotes from separate firms to manage all the bits you don't want to do (printers? phones? LAN? server management? email?).
4. Find out what the directors' plans for the business are for the next 6 months, 1 year, 2 years, and 5 years.
5. Work up a systems development plan to meet the directors' business objectives (increasing the customer base? new products and services? increasing transaction volumes?).
6. Present the package of quotes and your development plan in an organised meeting with senior management. Argue for the extra money to cover the routine services and the extra staff you will need to support their business development (as others have said, doing it alone won't work long-term; it will kill both you and the business).
If they are a serious business, they will recognise the wisdom in your approach and be more than willing to invest in what is ultimately the basis of their whole business. Do this quickly, while you are still an unknown quantity full of magical skills. Once you become the guy who fixes the printers, they won't be able to hear your message so clearly.
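Point 2 is easy to automate. A sketch, assuming you keep the log as CSV with `date,system,minutes,note` columns (a made-up format; any spreadsheet export would do): totalling minutes per system gives you the evidence to bring to the meeting in point 6. The sample rows are invented.

```python
import csv
import io
from collections import Counter

# Hypothetical work log; in practice, read this from the file you keep.
SAMPLE_LOG = """date,system,minutes,note
2012-10-01,printers,45,jammed tray
2012-10-01,email,120,mailbox quota
2012-10-02,printers,30,driver reinstall
2012-10-03,database,90,slow query
"""


def time_by_system(log_text):
    """Total minutes per system, largest first."""
    totals = Counter()
    for row in csv.DictReader(io.StringIO(log_text)):
        totals[row["system"]] += int(row["minutes"])
    return totals.most_common()


if __name__ == "__main__":
    for system, minutes in time_by_system(SAMPLE_LOG):
        print(f"{system:10} {minutes / 60:5.1f} h")
```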
Korma: Good
I would also say that you should ensure that everyone who can negatively affect your career should be well aware of the issues you face and what you're doing (or going to do) to fix them. You don't want something you haven't fixed yet to fail and have that blame fall on you. Cover your ass, well and repeatedly.
You hire an expert to do it.
The Dunning-Kruger effect explains most posts on
"a thriving e-commerce company" with one IT person.
Step one: Fire all non-IT personnel and replace with smart IT people.
No further action required.
You wouldn't be in this situation if your employer gave a crap. It's plain and simple: you report to someone. They know the extent of the problem and that there is only one of you. If they cared, there would be more than one of you. But there isn't. So turnabout is fair play.
This is the true American solution to your problem: find other people to exploit and skim off the top ...
Step 1: tell them you're going to become a telecommuter so that you can work 100% of the time
Step 2: get on elance or some other such site: hire gobs of cheap (dubious) overseas help at $1/hour
Step 3: instruct them all to send emails from your address and answer the phone with your name.
Step 4: find a different job and just let your sub-contractors handle that one until the house of cards falls apart
If your current employer calls you out on the fact that you have 15 different accents and sometimes answer the phone in a female voice, ask them why they're so racist.
Bonus points if you used a pseudonym when you were hired for your present job.
Start building a list of problems or ToDos and keep everyone in the loop every time a new item is added.
Ask management to prioritize the list every time a new item is added.
I've had cases in the past where management actively refused to deal with the problem list.
They forbade me to keep a list, as "documenting problems took time away from solving problems".
If you find yourself in this situation, submit your resignation to all management (including executives) along with the problem list. Make it clear that you are resigning because management refuses to acknowledge and deal with the existing problems endemic in the IT infrastructure.
Perhaps middle management was trying to keep upper management from finding out about the problems. Now everyone knows.
Here's what they will do:
1. They might call you back at a better rate. You win.
2. They might call the guy who caused the problems in the first place, at a better rate. (They're idiots.)
3. They might call in a new guy at the same rate they hired you for. In this case, they need a slave. The first and second slaves quit, so they're looking for a third slave. When they get to the fifth slave, it will dawn on them that they need to start looking for (and paying for) talent, not slaves.
Just remember, if something goes wrong, blame the guy who can't speak English...
Ah, Tibor, how many times have you saved my butt?
Don't blame me, I voted for Cthulhu.
Sounds to me like the OP has no way to tell the wheat from the chaff: which boxes do what, what's essential, what's critical, and what's actually dead weight.
Inheriting an environment places a certain requirement on the incoming admin if documentation isn't there or has rotted.
Firewall config backups? Cool - where are they, and how do I get to them? Which databases are live? Web servers? App servers?
Full-scale discovery and inventory is usually where I start, then I work my way back toward a tool chosen based on that information (and the actual scale of the operating environments).
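A bare-bones version of that discovery step: try a TCP connection to a handful of common service ports on every address in a range and record what answers. The port list and the example subnet are assumptions; only scan networks you are responsible for, and expect a purpose-built scanner such as nmap to be faster and far more thorough.

```python
import socket
from concurrent.futures import ThreadPoolExecutor
from ipaddress import ip_network

# A few well-known service ports; extend to suit your environment.
COMMON_PORTS = {22: "ssh", 25: "smtp", 80: "http", 443: "https", 445: "smb",
                1433: "mssql", 3306: "mysql", 3389: "rdp", 5432: "postgresql"}


def port_open(host, port, timeout=0.5):
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def hosts_in(cidr):
    """All usable host addresses in a network such as '192.168.1.0/24'."""
    return [str(ip) for ip in ip_network(cidr).hosts()]


def sweep(hosts, ports=COMMON_PORTS, workers=64):
    """Return {host: [open ports]} for every host that answered on at least one port."""
    targets = [(host, port) for host in hosts for port in ports]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda t: (t, port_open(*t)), targets))
    found = {}
    for (host, port), is_open in results:
        if is_open:
            found.setdefault(host, []).append(port)
    return found


if __name__ == "__main__":
    # Hypothetical office subnet; replace with your own.
    for host, ports in sorted(sweep(hosts_in("192.168.1.0/24")).items()):
        print(host, [f"{p}/{COMMON_PORTS.get(p, '?')}" for p in ports])
```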
I'll go out on a limb here, based on my own current experience.
I think just about all companies bigger than, say, seven people need two people, each split half IT and half "line functions".
Then, when everything is humming, they can "just work". But when a cascade situation comes up, you split into tiers. Level 1 handles all the end-user fallout: every computer needs that new utility installed, all the printers quit working because of a 2-minute power outage (winter is coming), User 1 wants to know where the file they worked on for 3 hours went (oh look, it's in a temporary folder because it was opened straight from an email), etc.
Then Tier 2 deals with all the system configs, any upcoming software change, etc. That second pair of hands seems to count for more than the sum of the hands in IT when managers want something fixed. I've been doing the Level 1 helpdesk for a while now, with the second man more behind the scenes.
My first Journal Entry ever, in 8 years! http://slashdot.org/journal/365947/aphelion-scifi-fantasy-horror-poetry-webzine
One IT person is a problem for you?
I work in a hospital with 300 beds, 100 doctors, 230 nurses, and 150 other staff. The institution has about 150 workstations, 15 servers (HIS, RIS, DB servers, etc.), and two PACS, connected by 8-9 routers and 20+ managed switches, plus a firewall/gateway and a dozen dedicated diagnostic systems.
The IT staff comes down to one IT guy and one guy who "can repair things".
I've seen a lot of folks suggesting that you focus on documenting but I take the opposite view: you should actually be working to make documentation unnecessary. Documentation, by its very nature, tends to become obsolete very quickly - and any good IT guy learns to look at what's actually going on in the system rather than relying solely on the documentation. So you want to make your system as self-documenting as possible:
I've done this three or four times and it's worked every time.
You can't just let the fires burn, so you need to start by putting them out. Just do that with the long term in mind: solve the underlying issue rather than patching the symptom. The next step is to implement monitoring, using something like Nagios or Cacti. This requires you to inventory your systems and services and document them, which is a productive way to start peeking into the setup. It also helps you notice fires before others do, so you can put them out before anyone has to ask you about them. "Fires" often exist long before anyone tells you about them; people just live with an issue until it prevents them from getting something urgent done, and then it becomes a fire. With all that in place, you will get ahead of the fires and better understand the systems under your care. Now you can start to plan and rebuild "in your own image".
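Nagios checks are small programs that print one status line and report their result through the exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN), so a custom check for a box you have just inventoried is cheap to write. A sketch for disk usage follows; the 80%/90% thresholds are arbitrary assumptions, and the text after `|` is optional performance data that Nagios can hand on to graphing tools.

```python
import shutil
import sys

# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_disk(path="/", warn_pct=80.0, crit_pct=90.0):
    """Return (state, status line) for the filesystem holding `path`."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"DISK UNKNOWN - {path}: {exc}"
    used_pct = 100.0 * usage.used / usage.total
    perfdata = f"used_pct={used_pct:.1f}%;{warn_pct};{crit_pct}"
    if used_pct >= crit_pct:
        state, label = CRITICAL, "CRITICAL"
    elif used_pct >= warn_pct:
        state, label = WARNING, "WARNING"
    else:
        state, label = OK, "OK"
    return state, f"DISK {label} - {path} {used_pct:.1f}% used | {perfdata}"


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "/"
    state, message = check_disk(target)
    print(message)
    sys.exit(state)
```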
Where does the white go when snow melts?
I was the predecessor at a small corp doing about $5M annually, and I trained my replacements. Yep, more than one, although only one of them was really "good enough" to understand it in its entirety. I even provided them with a "here's the steps to dig yourself out" plan. Six months after I quit, they had all quit. I heard from one of them afterwards that he was still "digging deeper" into tech debt.
No clue what's going on now. Although I am morbidly curious and would like to meet anyone good enough to "escape the tech pit".
For the unwary, there were a variety of programming and IT hacks that could cause things to shut down, usually partially and in weird ways.
Some of the worst...
- servers directly connected via crossover cable because we only had a 100-meg LAN and it was too slow for the database-to-app-server connection.
- Remote NFS mounts tunneled through SSH via special scripts.
- cronjobs remotely executed through SSH fixed keys (on different user names)
- remote database replication partially done through the above
- that would feed into sales and trouble ticket application tables that did not integrate with our native database
- absolutely bizarre firewall settings
- shared root passwords
- remote root login with a weak password on the production network (management's direct request; better believe I saved that email)
- multiple web servers running different versions of Apache and PHP
- IIS running in the DMZ with weird applications
- production webhosting system pointing to files on the development wiki
- absolutely bizarre backup practices
- Developer databases running off of a DROBO. VERY SLOWLY.
- A near total absence of unit tests
- Documentation months or years out of date, and none of the auto documentation served anywhere
- A bizarre chain of RPC dependencies that crossed multiple network boundaries with caching stuck in strange, unexpected places. In some instances, caches were "forever" -- or at least until "service restart". The aforementioned SSH cronjobs would sometimes kick these instead of a local cronjob, because the local clocks were unreliable and the system packages were too old to get reliable NTP clients... I could've compiled them myself, but that *might* have taken two or three hours to resolve dependencies--and I *knew* I could monkeypunch it together with SSH in 10 to 15 minutes.
And that's before discussing the things I did in the source code. The really disgusting things I'm not proud of. Sorry about that incident with the two donkeys and a parakeet in Juarez mom.
What I can say is... I am not incompetent as an admin or a developer. I am, however, someone who was not able to sell the necessary fixes to management. That's ultimately why I resigned. Some people might say that made me a poor deve-- I'd say it means management didn't do their job and accurately assess and manage... well... me. The certainty of an impending nervous breakdown as the complexity tidal wave started to overcome me and... time to quit. After I left, I heard they took my network sequence diagram and had it printed on 8-foot-wide laminated, markerable paper to try to help the new staff...
"Database" running slow? "I can fix it for a week, but I need..."
That week short-term became two years of a cron job pulling certain records to deal with a known bug involving cache performance...
VPN Running slow? "I can fix it by changing a setting in the client, but it's really because we don't have enough uplink... however, compression will work fine for most of your document editing since you copy it anyway..."
Became forever. Although the CRM client application would crash because things were so slow. We fixed that by virtualizing some desktops. Which resulted in contention among users. Not enough disk space to host those even with thin provisioning which ran out and locked one desktop down.... attach another DROBO to the SAN. Actual drives are too expe
I did something like this, although I had been the informal "back-up" admin for a while before I took over. Even so, the take-over was very sudden, and I quickly discovered there were whole facets of the system I didn't even know existed.
My solution was to pick one major sub-unit at a time, and migrate it to a system that I understood -- you probably have to do this anyways, since you have to do upgrades on your software, use that as an opportunity to get a grip on the system in question. In my case, the first choice was easy, the primary SAN host blew up about a month into the project. My users had a really lousy day that day, but 24 hours later, I had an object lesson in the importance of back-ups, and by God I knew how the (new) system worked.
2*3*3*3*3*11*251
1. Virtualize everything so that it can be back up and running quickly in case of failure.
2. Hire one or more additional programmers, start a consistent rebuild in a single language, and make sure it gets done right this time.
3. Spend most of your time keeping the old junk going and documenting what you have and assist in creating the new system.
When the rewrite is done then you will have a reasonable starting point to base future development on. Unless there is new development to be done at this time they will probably lay you off and keep the new programmer since he will have the most knowledge of the new system. You will have done what is best for the company though if that is any consolation.
You ask:
The big question now is, how do I, as a single person, effectively audit the network, servers, databases, backups, and formulate a long-term plan that can be implemented by one person?
I answer:
The same way you eat an elephant - one bite at a time.
Make a list of things to do.
Prioritize them to where you think they belong.
Update the priorities as things happen and you uncover more dire events.
Keep management, directors and executives updated very regularly. (They're likely going to need to spend money on updates at some point. Be honest and up front about it. If they're in the loop, there aren't surprises.)
Lather, rinse, repeat.
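The list-and-prioritize loop needs nothing fancier than a shared spreadsheet, but as a sketch of the idea: each item carries a priority number (1 = most urgent, on a scale you agree with management) and an optional cost estimate, and the list is re-sorted whenever a priority changes. The task names and estimates are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(order=True)
class Task:
    priority: int                                     # 1 = most urgent; the only compared field
    title: str = field(compare=False)
    cost_estimate: str = field(default="", compare=False)


class TodoList:
    def __init__(self):
        self.tasks = []

    def add(self, title, priority, cost_estimate=""):
        self.tasks.append(Task(priority, title, cost_estimate))

    def reprioritize(self, title, priority):
        """Change a task's priority when new information turns up."""
        for task in self.tasks:
            if task.title == title:
                task.priority = priority
                return
        raise KeyError(title)

    def ordered(self):
        """Titles from most to least urgent; ties keep the order they were added in."""
        return [task.title for task in sorted(self.tasks)]


if __name__ == "__main__":
    todo = TodoList()
    todo.add("Fix backups", 1, "2 days")
    todo.add("Replace home routers used as firewalls", 2, "1 day plus hardware")
    todo.add("Document the order database", 3)
    todo.reprioritize("Document the order database", 1)  # e.g. after a scare
    print(todo.ordered())
```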
Awk! Pieces of eight. Pieces of eight. Pieces of seven... ERROR: General Protection Fault. [Paroty Error.]
As an ex-programmer, I started as a lone network admin at an art restoration firm.
I can share these experiences:
- Back up everything. Virtualize your servers in a test lab.
- Don't fix anything that isn't broken; monitor and understand your systems instead.
- *If you have to*, then don't be afraid to take risks (upgrade servers, rewire switches, ...)
- A sys-admin is a fire fighter:
--- People don't care how you fix it, so don't expect gratitude and don't overcomplicate your work.
--- "No" is an answer too, or ask for extra money: never, ever touch personal equipment (personal laptops, tablets, ...).
--- Prioritise your work: Time management is your friend.
After six years and almost a heart attack, I have learned a lot and was able to do some cool projects, but I'm not going to do this forever.
Basic advice: act like the Bastard Operator From Hell.
1- There is no "thriving enterprise" with a one man IT team.
2- There is no IT solution to a company's perception that, in these times, they can stay in business (forget about "thriving") by neglecting the technology part of their business model.
Begin at the beginning, continue through the middle, and end at the end.
The time and effort needed to understand arbitrarily bad code far, far exceeds the time and effort to understand what the system is supposed to do and then implement that functionality.
I know people hate this answer (cue the outcry) but it's the simple truth. Crap code is a money / time pit with an unbounded downside. There is nothing to insure you against the code as-is being literally unextendable in mandatory directions sans an exponential explosion of needed changes.
Not writing code like this is what professional software engineering is all about. What it's NOT about is being a janitor named Sisyphus...
That's Sys-e-F.U.S. -> *UCKED UP *HIT
My backup was my boss who was technically competent, so there was that, but it's not like I've never worked a job as a one man show. You buckle down and do what you have to and make it so things don't break just because you're not around (yes, this requires budget, but I've been fortunate enough that anyone willing to pay me what I'm worth is also willing to invest in a solid infrastructure).
This made you (and the situation you described) an outlier, one with a positive outcome. Your experiences cannot be applied in general. In general, this is rarely the case, for one-man-tech shops that is.
For the most part, conditions as described by the original submitter typically have "GTFO ASAP!" written all over them. I've done IT in companies, small and large, and I can attest that what you say is true: yes, it is possible to be the one-guy-IT-slash-programmer shop at a small e-commerce company. But the question is why? I wouldn't do it (again) unless a good compensation package came with it (which is typically never the case), or if I were fresh out of school with nothing else on my plate (in which case, it is OK).
Good companies are never based on one-man-IT-slash-dev-shops, regardless of size (or at least they try not to.) I know, again, I've worked with companies big and small. Conditions like that are typically good proxies for more systemic problems, and at the end of the day (whenever possible), you want a paycheck, a rewarding job and good working conditions. Rarely you see that with one-man-IT-slash-dev-shop gigs, rarely if ever, regardless of the size of the company.
That's just my $0.02 input from what I've seen. YMMV so readers be warned and please take this anecdotal piece with a grain of salt.
So you worked for one whole year in this industry, and that gives you insight enough to know that there is only ONE reason that things get to this state?
That is not what he said. He said this:
I worked in this environment for one year ...
As in "I worked in such type of environment as described by the original submitter". The rest of this person's post does not provide a context with which to infer that such a year was the only year in this industry. In fact, the post context hints very strongly that such is not the case.
IT has grip on YOU!
This sig is not paradoxical or ironic.
Seriously, continue documenting the existing systems until you have them all documented. Then start building a layout of how they are all integrated so you can see how everything relates. Once there, start with the core components and the components with the fewest connections to other components, and update or replace them until you have it all replaced and upgraded to something more to your liking.
However, you'll probably leave before that is all done - thereby making it a greater mess for the next guy.
Just saying...
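The "fewest connections first" part of that advice can be computed from the integration layout once it exists. A sketch, assuming the layout is recorded as pairs of components that talk to each other (the component names are hypothetical); deciding which components count as "core" is still a judgment call the code does not make.

```python
from collections import defaultdict

# Hypothetical integration layout: each pair of components talks to each other.
LINKS = [
    ("web", "db"),
    ("crm", "db"),
    ("backup", "db"),
    ("web", "mail"),
    ("web", "cdn"),
]


def replacement_order(links, components=()):
    """Order components by number of connections, fewest first.

    `components` lets you include systems that have no known links at all.
    """
    degree = defaultdict(int)
    for c in components:
        degree.setdefault(c, 0)
    for a, b in links:
        degree[a] += 1
        degree[b] += 1
    # Ties are broken by name so the order is reproducible.
    return sorted(degree, key=lambda c: (degree[c], c))


if __name__ == "__main__":
    print(replacement_order(LINKS))
```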
Truth is like the sun. You can shut it out for a time, but it ain't goin' away. - Elvis Presley (source: imdb.com)
You recognize that it's a company problem and that you're not enough manpower to get it done.
You get management to recognize that by explaining to them that it needs more manpower and it needs to be done.
If they balk, you're stuck dealing with it on your own, and it will be slow, and there may be times you just can't change something without changing several other things and you can't change them all in a short enough time to prevent the company from grinding to a halt, so you'll just have to leave them.
Fast. Do not look back, lest you turn into a pillar of salt!
Seriously, if you're over 30, quit now. Otherwise, if you are young and inexperienced, stay if this is a WELL-PAYING opportunity or you REALLY enjoy and trust yourself or the company.
If you stay, start MIGRATING the pieces into something you understand and can document. I recommend migrating to mainstream, well-supported open source projects UNLESS the proprietary alternative is vastly easier to deploy (this is generally not the case). Test each step and have a backout plan ready. Back up whatever you can.
Realize that you might succeed and prove yourself to the company. Also realize that it might all come crashing down on you, with all the blame being assigned to you for any and all things that go wrong. The pitfalls are many. Use the source. Good luck!
So everyone else has already commented, but here's my $0.02:
You're probably getting a million emails, texts, IMs, and other alarms per day.
Make them stop.
Don't disable the alarms, but pick something that seems important and noisy and figure out why it keeps wanting to pester you. Fix the root cause. That's one less thing you'll have to deal with tomorrow.
In short, be throughput driven and not interrupt driven. I have coworkers who have to deal with 100 small fires a day - and that's not an exaggeration. When I'm in their office discussing something, we're constantly interrupted by the "new mail deal with me right this moment!" sound. Don't do that! I probably get a Nagios warning once a month or so, typically telling me that the VPN to my house is down because it's raining and my DSL sucks, and that's about all I want to hear from the network.
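To pick which noisy alarm to attack first, count alerts by host and check. A sketch, assuming your alerts arrive as emails whose subjects follow the shape of Nagios's stock service-notification template (`** PROBLEM Service Alert: host/service is STATE **`); adjust the pattern to whatever your monitoring actually sends. The sample subjects are made up.

```python
import re
from collections import Counter

# Assumed subject format, modelled on Nagios's default service notification emails.
SUBJECT = re.compile(r"\*\* (\w+) Service Alert: (.+?)/(.+?) is (\w+) \*\*")

SAMPLE_SUBJECTS = [
    "** PROBLEM Service Alert: vpn-home/PING is CRITICAL **",
    "** RECOVERY Service Alert: vpn-home/PING is OK **",
    "** PROBLEM Service Alert: vpn-home/PING is CRITICAL **",
    "** PROBLEM Service Alert: db01/Disk space is WARNING **",
    "** PROBLEM Service Alert: web01/HTTP is CRITICAL **",
    "** PROBLEM Service Alert: db01/Disk space is WARNING **",
    "** PROBLEM Service Alert: vpn-home/PING is CRITICAL **",
]


def noisiest(subjects, top=3):
    """Count PROBLEM notifications per (host, service), most frequent first."""
    counts = Counter()
    for subject in subjects:
        match = SUBJECT.search(subject)
        if match and match.group(1) == "PROBLEM":
            counts[(match.group(2), match.group(3))] += 1
    return counts.most_common(top)


if __name__ == "__main__":
    for (host, service), n in noisiest(SAMPLE_SUBJECTS):
        print(f"{n:4}  {host}/{service}")
```

Fix the root cause of whatever tops the list, rerun it next week, and repeat.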
Dewey, what part of this looks like authorities should be involved?
No matter what - focus on restore procedures.
That presumes backups, so start there. If backups are shoddy, FIX THAT FIRST. If there's one thing you can almost always get budget for, it's disaster recovery.
But *always*, *always* backup with the focus on how to *restore*. Backing up is easy, restoring is the hard part.
By doing this, you will identify dependencies, settings, installation procedures, etc. You'll also identify which systems are less critical than others.
Subsequently, you will know how long it'll take you to bring a system back up.
Lastly, you'll know how to save your ass if you break something.
Start your restore process by the simple edict of following the money. Work from the financial transaction outward.
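"Follow the money" translates naturally into a breadth-first walk: start at the system that takes payments and fan out through what it depends on, so the systems closest to revenue get their restore procedures written and tested first. A sketch with a hypothetical dependency map; this is an order for planning and testing restores, and the actual bring-up still has to start dependencies before the things that need them.

```python
from collections import deque

# Hypothetical map: system -> systems it needs in order to work.
DEPENDS_ON = {
    "checkout": ["payment-gateway", "orders-db"],
    "payment-gateway": ["firewall"],
    "orders-db": ["storage"],
    "newsletter": ["mail"],
}


def restore_order(depends_on, start):
    """Breadth-first list of systems reachable from `start`, nearest first."""
    order, seen, queue = [], {start}, deque([start])
    while queue:
        system = queue.popleft()
        order.append(system)
        for dep in depends_on.get(system, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return order


if __name__ == "__main__":
    print(restore_order(DEPENDS_ON, "checkout"))
```

Anything the walk from checkout never reaches (the newsletter here) is a candidate for the "less critical" pile mentioned above.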
/me sips his coffee and ponders a new sig...
...that your predecessor didn't comment his code?
This is why code comments and documentation are so important. I'll _never_ understand why so-called "experts" think comments just "get in the way." I would never hire someone with such an attitude, nor would I pay for freelance uncommented code: If you can't explain it, I'll assume you don't understand it.
What part of klatu verata nicto didn't you understand!?
I inherited this crap from the previous guy. Clearly he didn't know what he was doing. This is gonna take at least a year to fix.
* Backups plus tested restores
* SAN - start here for the greatest flexibility
* Configuration management
* Change management
* Migrate as much as possible from desktops to servers
* Deploy FLOSS when it makes sense
** cups for print ($0)
** alfresco for CIFS ($0)
** Zimbra for an Exchange replacement ($0)
** OpenLDAP unless you have a large-scale setup. Then Red Hat makes a directory server that scales to many millions of accounts.
** OpenVPN for remote access
* Deploy commercial tools when there isn't a good FLOSS solution
* Dump as much Microsoft infrastructure as possible. Get off that drug. OpenOffice can easily replace most MS-Office instances.
* Virtualize all servers if possible. Avoid being screwed by VMware. KVM (full virtualization) and LXC (containers) are solid, viable options.
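The configuration-management item boils down to describing the desired state and letting a tool converge on it, so that running it a second time changes nothing. Here is a minimal sketch of that idempotent idea in Python. It is not how any particular tool is implemented, and `ensure_line` and the example path are hypothetical.

```python
import os


def ensure_line(path, line):
    """Make sure `line` appears in the file at `path`; return True if the file changed.

    Running it again with the same arguments is a no-op (idempotent).
    """
    try:
        with open(path) as fh:
            existing = fh.read().splitlines()
    except FileNotFoundError:
        existing = []
    if line in existing:
        return False  # already in the desired state, touch nothing
    tmp = path + ".tmp"
    with open(tmp, "w") as fh:
        fh.write("\n".join(existing + [line]) + "\n")
    # Rename over the original; on POSIX (same filesystem) this swap is atomic,
    # so readers see either the old file or the new one, never half of it.
    os.replace(tmp, path)
    return True
```

Because each call reports whether it changed anything, a script built from calls like this doubles as a drift report: any `True` on a routine run means someone edited the box by hand.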
Gee, I guess that describes what I've done here and at client locations.
Enterprises can't fire Microsoft, but small companies can. The savings can easily add up for a 20-person company. The system stability will improve too. Basically, my Linux systems don't go down unless I'm patching the kernel.
I agree. Do it. But under the following conditions.
1) Only if you can not take the job too seriously. If you're the type who gets stressed out about work, who doesn't know when it's time to walk away from an issue and start over tomorrow, this type of job can be heck.
But if you can remember that it's only a job, that if things don't work out you can get another one, and that it's only bits you're moving around, not brain surgery, then this sounds like a potentially rewarding opportunity. That leads to...
2) Make sure you are getting rewarded. If you're an old hand and getting paid well, take the money and run. It's job security (at least until you get things straightened out) and interesting work. You'll be facing a new problem every day, which sounds rough until you consider the boredom of facing the same problem every day.
Or, if you're a young gun, even not well paid, think of this as your education. What you learn on this job will aid you for the rest of your career. You'll get to work with many more systems and different technologies than you would at most IT jobs. And you'll get a chance to build up your troubleshooting muscles. Nothing obliterates my respect for someone who looks like a greybeard faster than a lack of troubleshooting sense.
Troubleshooting--good troubleshooting--is an art. Listen to Car Talk. That kind of crazy you only get from experience. There's a feel for how modules and systems interact and how something over here throws an error when something over there isn't right.
Think of this job as grad school. 2 or 3 years of dirty work, but do it for the education. Then get out.
. . . document everything. . . [m]ake sure your employer fully understands the situation you and they are in.
This was the first thought that came to me. While there surely are employers that are the personification of Evil (I, too, have met my share), most are simply trying to do the best they can, but are hampered in their ability to help you by a lack of time, a lack of knowledge of IT subjects, or both. Because of this, they can't independently judge the quality of the advice you're giving them -- i.e., they have to trust you. Since (at least in my experience) most conversations IT initiates with management are, in one way or another, a request to spend money, you can see the problem: It's difficult to continue to trust someone whose solution to every problem is to take money from you.
One technique I have used successfully is to never, ever refer to the IT department in particular, but always refer to the company as a whole. Also, as the parent says, document everything -- but do so in terms that will be meaningful to management. "The language of management is money," Juran said, and he was right. The infrastructure must make the company money, or it wouldn't be there -- talk to management in terms of the risks they are running to reduce revenue, increase expenses, or both.
Most business people, like investors in general, dislike surprises, and prefer to know their risks in advance. The trick is to present the situation in non-threatening terms, so that the boss feels like you're trying to make more money for him. Even after doing so, one has to be prepared for the possibility of a negative response, possibly even for a valid reason -- there's no cash available this month; they're planning to move anyway, etc. Or the boss just made a mistake. With any luck, the documentation you have on file will protect you, should blame be directed your way. If not, well, you didn't want to work there, anyway. Did you?
I thought dealing with inherited messes was the normal state of affairs in IT, or do I just have crummy luck?
I had this on my office wall at a former employer where I was in a similar situation.
The best approach is to define your systems, break them down into modules, and replace them in a systematic fashion. Eventually you'll be working in an infrastructure of your own design while avoiding disruptions.
If management let things get like this you can bet they won't fund fixing it either.
That both they and you assume a 'thriving e-commerce company' can be run with only a single technical staffer beggars belief. But that is one more than at an ISP where I was once briefly employed. Maybe they don't want anyone hanging round that long. Cut your losses and start looking for your next job now. Otherwise you're looking at a ten-month burnout before they find your replacement, similar to the feller you replaced.
From your tone, I can tell you are closed-minded and overrate yourself. You are a control freak who wants everything done your way, and you will fail if you have to think outside the box. You have to think beyond merely "how do I as a single person..." You have to assume it will fail, and have a plan in place. Your predecessor may have had no choice but to use duct-tape-like solutions, and if they fail, you might need some backup duct tape. You also arrogantly assume that he/she had the resources available that you do - he/she likely didn't. Finally, the fact that you would ask /. to do your job for you is the final testament that you don't know what you're doing, and you are very lucky to have a job at all in the current economic climate. If you were smart (and I don't think you are, not in real terms, just drudge terms), you would assume that whatever solution you come up with, someone even more regimental and incompetent than you will see it as duct tape; you need to envision the problems a future techie will have when you're gone. But you won't do that, as you're too into your own ego. You will likely (if anything) make life very difficult for the person who follows you so you can be contacted and made to look good - the hero, as it were. Plenty of your type around!
Every network is band-aids on band-aids. I'm literally shocked every morning when I wake up and it hasn't all collapsed.
Your best bet is to start making a comfortable nest of processes (Version Control, Deployment, etc) and start documenting how things work. It will be extremely helpful to you if you have a method by which you can revert any changes you make to the system. Once you have an architectural overview in place, you can start to make changes to the system comfortably.
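Version control is the real answer for reverting changes. Until configs live in a repository, though, a stopgap is to copy a file aside before touching it. This is a minimal sketch, assuming plain files on local disk; `snapshot`, `revert` and the snapshot directory layout are hypothetical.

```python
import os
import shutil
import time


def snapshot(path, snapshot_dir):
    """Copy `path` into `snapshot_dir` under a timestamped name; return the copy's path."""
    os.makedirs(snapshot_dir, exist_ok=True)
    base = os.path.join(snapshot_dir, f"{os.path.basename(path)}.{time.strftime('%Y%m%d-%H%M%S')}")
    dest, n = base, 0
    while os.path.exists(dest):  # two snapshots in the same second get a counter suffix
        n += 1
        dest = f"{base}.{n}"
    shutil.copy2(path, dest)  # copy2 also preserves mtime and permission bits
    return dest


def revert(path, snapshot_path):
    """Put a snapshotted copy back over the live file."""
    shutil.copy2(snapshot_path, path)
```

Once the habit sticks, replacing these two functions with a commit before and after each change is a small step, and it gives you a change log for free.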
Since you're filling several roles there, whatever they're paying you probably isn't enough. If you saw all these things and quoted them an hourly rate that was higher than you thought they could afford and they still made you an offer, good for you!
I can totally believe you on this point, especially in RE and wholesale banking where you deal in small numbers of transactions (relatively speaking), but ones with a lot of zeros and commas in them.
I am a 'one man show' + occasional contractor support consultant catering to the same industry, and my client base is about 450 users spread across a dozen small and medium-sized wholesale mortgage banks (and a handful of non-banking companies somewhat reluctantly taken on as referrals). Collectively they are probably worth in the neighborhood of $1.5-2 billion, but as their business processes are very amenable to standardization and automation, they can have a very compact IT footprint once you develop a scalable 'pre fab' IT platform that suits them all.
This would not be the same situation at all if they were a group of retail store chains or distributors or medical offices or anything else with a high personnel to market cap ratio.
I've definitely cost 6 in-house IT guys their jobs, and possibly more. I guess I am part of the problem in this industry, driving salaries down and creating the impression that anyone is replaceable. In my capacity as a consultant I am more replaceable than any full-time person, and this makes me acutely aware of how much of a commodity IT is becoming.
Take the weekly pay.
Apply more band aids
Spam around your newly Man Up'ed resume and prepare to bail ASAP.
Suck it up, that's what you do. You don't think every other IT worker who has come into a job with a large company hasn't gone through the exact same thing? If you can't handle your own work responsibilities you need to get a job you _can_ handle, plain and simple.
It's always the same. The new person comes in. Doesn't understand the systems. Doesn't know the history that has transpired and how a product is developed. The new person has trouble understanding what they see, as we all do, fears it, and complains.
Welcome to the real world of programming.
Replace everything. This is a slow process, and since you lack manpower, you will end up with the same number of band-aids as you had in the first place. But these will be your own band-aids, not the ones you inherited, so you will have a chance to master the thing.
Rob, is that you?
Document, then go to whoever hands out the money. Get some for yourself.
What if you got really sick?
I call this the 'Hit By A Bus' scenario. If you're hit by a bus in the next five minutes can the business carry on without you?
That's just lazy! If it takes more than 5 minutes to get out of the bus' way you deserve to be hit! :p~~~~
"A little over a month ago, I assumed the position of programmer and sole IT personnel at a thriving e-commerce company"
Hmmm, programmer and sole IT person at a thriving e-commerce company? Are you sure???
If it was thriving, the company wouldn't have made the stupid decision of hiring the same person to take on both an operational and development role. Ignoring the logistics of it, it is also a SOX violation.
DANCE ON YOUR TOES! You may think you're drowning, but if you just dance a bit, you can get your nose above water enough to breathe.
Trust me, real geeks would give one of their balls to find a gig like this. You get to start EVERYTHING over! You're their new $deity! :-)
"Tongue tied and twisted, just an Earth bound misfit
Outsource it.
Simple: Forget it. If you got hit by a truck the company will face bankruptcy in weeks.
You'd be surprised how small some of those shops tend to be. And they focus their resources on the programmers who make changes to the site, order fulfillment, and billing. IT is still considered overhead, and something to be minimized, unfortunate as that is.
Been there:
Risk analysis is key: decide what to Accept, Transfer (insurance), Reduce (make little changes), or Avoid (completely replace).
I recommend a wiki also. I have used XWiki (free) or SharePoint Foundation (bundled on MS SBS 2010). I created a site collection based on the 10 domains taught in the CISSP course. The benefit is that you can find anything this way via simple searches. Ubuntu also has Zim Desktop Wiki, which can run as a web server and be searched too (free).
Document everything you do in a wiki, prioritize based on risk, and as time goes on it will get easier.
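Prioritizing by risk can be made numeric with the annualized loss expectancy formula taught in CISSP material: ALE = SLE × ARO (single loss expectancy times annualized rate of occurrence). Here is a sketch in Python. The three risks and every dollar figure are invented for illustration, and ranking by ALE minus the yearly cost of the fix is one simple heuristic, not the only way to prioritize.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    single_loss: float      # SLE: dollars lost each time it happens
    per_year: float         # ARO: expected occurrences per year
    mitigation_cost: float  # yearly cost of the proposed fix

    @property
    def ale(self) -> float:
        """Annualized loss expectancy, ALE = SLE x ARO."""
        return self.single_loss * self.per_year


def rank(risks):
    """Order risks by expected yearly loss minus yearly cost of fixing them, biggest first."""
    return sorted(risks, key=lambda r: r.ale - r.mitigation_cost, reverse=True)


# Invented example figures, purely for illustration.
EXAMPLE_RISKS = [
    Risk("flaky office VPN", single_loss=300, per_year=12, mitigation_cost=4000),
    Risk("untested backups", single_loss=50_000, per_year=0.5, mitigation_cost=2000),
    Risk("single aging switch", single_loss=8000, per_year=0.25, mitigation_cost=1500),
]

for risk in rank(EXAMPLE_RISKS):
    print(f"{risk.name}: ALE ${risk.ale:,.0f}, net ${risk.ale - risk.mitigation_cost:,.0f}")
```

A risk whose fix costs more than its expected yearly loss (a negative net figure, like the VPN here) is a candidate to Accept or Transfer rather than Reduce or Avoid. Dollar figures like these are also the language management responds to.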
A lot of posts here have pointed out that it's a horrible idea to run a company with just one IT person.
None of you live in the real world. I work for a real company that has 7 full time employees, we do several million a year in business. I maintain servers in rented racks in two different datacenters (one on each coast). I don't even work in the office where the other 6 people work.
I do *everything* - SQL DBA, network admin, server admin, mail admin, some programming (back end only, not graphical stuff - I suck at that), backups, any automation, any apps that run on servers that I have to make/get to work, DNS, web servers - you name it.
All our clients are large companies.
Is it perfect? No (incidentally, like one of the above posts, my boss is technical enough with lots of support docs from me to handle emergencies if I'm not around).
Could we even remotely afford to have 2 IT staffers? No. Could we have just two part-timers backing each other up? Sure. But is that really better - to have two employees who likely won't stick around forever (I've been here 10 years) and who you need to replace from time to time? No - we do that for our production staff, and replacing those people who don't stay (and none of them do, because it doesn't pay well enough) is always a pain.
Small companies need really dependable employees...it's unreasonable to think that in such a small company where things change all the time (procedures/tech/what have you) and you are also always trying to save a buck...to make the jobs be simple drag and drop positions - everyone here is important, everyone wears multiple hats and whenever anyone leaves everyone else has to cover until a suitable replacement can be brought up to speed (which can take months).
People are going to say - well your business is not well enough run. That's BS - it's a small business built off of the mortgage of one guy's house to start with - this is the way things work, if you don't get bought out, don't IPO, aren't a "startup" it's very hard to transition from 5-10 employees to dozens - it takes a whole new level of business to get there because you will end up having way more overhead per employee.
The trick is to do everything simply and well enough, with enough backups that failures are minimized. Do I maintain five 9s of uptime? Of course not, but I get close. I don't track it exactly (because we don't provide SLAs, it's not that important), but we definitely hit somewhere near four 9s - and if a customer needed, and was willing to pay for, five 9s service, it would be easy to achieve (albeit more expensive, but we'd cover that by increasing the cost of services to the company requesting it).
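To put four nines versus five nines in perspective, the yearly downtime budget is (1 - availability) × minutes per year. Here is a quick calculation, assuming a 365-day year.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; ignores leap years


def downtime_per_year(nines):
    """Allowed downtime in minutes per year at an availability of `nines` nines."""
    unavailability = 10.0 ** -nines  # e.g. 4 nines -> 99.99% up -> 0.0001 down
    return unavailability * MINUTES_PER_YEAR


for n in (3, 4, 5):
    print(f"{n} nines: {downtime_per_year(n):.1f} minutes/year")
```

Four nines leaves about 52.6 minutes of downtime a year. Five nines leaves roughly 5.3 minutes, which is about the time one unplanned reboot takes and is why that last nine gets so expensive.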
I'm well aware that I'm a jack of all trades and master of none...but I can generally make anything work given Google and a couple of hours of research, plus help from friends who are masters, but not jacks.
Sorry for the AC post, but my handle would be too easy to tie to me in RL on the off chance someone I actually knew read this.
To get to the OP's question - there's no good way to acquire running knowledge of a 1 man shop where you don't have access to the 1 man. Shit's going to break, you'll learn when you fix it - just make copies/backups of everything so you don't lose anything irreplaceable for now.
In fact it is easy!
1. VMware ESXi (if it's a small shop, just use the free version and not vSphere)
2. StorageCraft ShadowProtect: $900 per Microsoft server.
3. Linux servers? use scripts.
Congratulations, you're done, including bare-metal DR.
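For "Linux servers? use scripts", a minimal nightly job creates a dated compressed archive and prunes old ones. This is a sketch in Python using only `tarfile`. The paths, the `keep=7` default and `make_backup` itself are assumptions. It covers file-level copies only and does not replace bare-metal imaging.

```python
import os
import tarfile
import time


def make_backup(source_dir, dest_dir, keep=7, now=None):
    """Write a dated .tar.gz of `source_dir` into `dest_dir`; keep only the newest `keep`."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    os.makedirs(dest_dir, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S", time.localtime(now))  # now=None -> current time
    base = os.path.basename(os.path.normpath(source_dir))
    archive = os.path.join(dest_dir, f"{base}-{stamp}.tar.gz")
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(source_dir, arcname=".")  # member paths relative to source_dir
    # Timestamped names sort chronologically, so the oldest archives come first.
    ours = sorted(
        name for name in os.listdir(dest_dir)
        if name.startswith(base + "-") and name.endswith(".tar.gz")
    )
    for name in ours[:-keep]:
        os.remove(os.path.join(dest_dir, name))
    return archive
```

Point `dest_dir` somewhere outside `source_dir` (ideally on another machine or disk). Otherwise the archive will try to include itself, and losing the server means losing the backups too.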
I'm currently going through the same thing with a small holding company in Ohio that runs several parts-manufacturing factories. My suggestions for you are these.
1) You're going to have to work some weekends to get the major things upgraded, just accept this and get things done.
2) Start with the networking equipment. Get everything communicating across the network properly as all services rely on a stable network to work correctly.
3) Fix the major service issues that will make the most people happy. In my situation I had some very bad Netgear routers with unstable VPNs. Once I fixed that and did some basic maintenance on our terminal servers, everyone thought I was a genius, which bought me a lot of capital and time to get other projects pushed through.
4) Trickle your way down into smaller problems. Once I got done with the network equipment I connected some domain controllers to get them sharing user data which gave everyone one username and password. Again, the owners were very happy with this and I bought myself a lot of time and capital for my work.
5) Go through various network resources and get them properly managed and deployed so you can bring down the build time for new computers, servers, or other networking equipment. This could include writing documentation, Perl scripts to manage Samba servers if you're on Linux/Unix, Perl/batch/PowerShell/GPO setups for managing things on MS networks, etc., depending on the services you use.
6) Once you're at this point you can work on people's smaller problems and get the little things running.
The biggest problem you'll run into is people who have little problems that you have to put off while you fix the major ones, especially if you have whiners or those forgetful managers who can't remember anything you tell them. You'll need to communicate timelines for problem resolution effectively to keep everyone happy, and do it over email so you can reference it later. When you're trying to get the owners to buy new equipment, remember to say things like "I know you can't put a price on the productivity you get from your IT equipment, but if you spend X dollars on Y equipment you can make Z things work more efficiently." Also refer to expensive equipment purchases that are hard to justify as "investments in your employees' productivity". Business people will respond to those kinds of methods.