Slashdot Mirror


Ask Slashdot: Herding Cats, Aging Systems?

An anonymous reader writes: I've recently started a job at a medium-sized enterprise in the UK. They claimed to be an advocate of open-source. The job was advertised as a Linux sys-admin. I've been in the role a short while and the systems right across the business are end-of-life: lots of XP and 2003 servers, a handful of LAMP web servers, and a large IT department with almost no skills in the technologies on site. Most boxes have the default password still. As a senior techie, I've been tasked with helping bring the skillset of the rest of the staff up. Where would you start, given that most of the kit is EoL?

17 of 158 comments

  1. Don't train them in the current systems by Chris Mattern · · Score: 4, Insightful

    That's the most obvious thing. Bring in supported systems and train them in those systems as you deploy them.

    1. Re:Don't train them in the current systems by Archangel Michael · · Score: 4, Insightful

      Before you bring in supported systems, you have to have a budget. Without a budget delineated, the rest of the decision making process is pure insanity.

      My first response is, estimate what the "golden" cost will be, and quadruple it. They will cut it in half, and it will cost you twice what you think it will, and you'll end up with an excellent system that is designed well and built right.
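The rule of thumb above is plain arithmetic, and a minimal sketch shows why it balances out. The function name and the 50,000 figure are made up for illustration; the multipliers are the ones quoted in the comment.

```python
def budget_rule(golden_estimate):
    """Apply the rule of thumb: ask for 4x, expect a 50% cut and a 2x overrun."""
    requested = golden_estimate * 4    # quadruple the "golden" estimate
    approved = requested / 2           # management cuts the request in half
    actual_cost = golden_estimate * 2  # the work costs twice what you thought
    return {
        "requested": requested,
        "approved": approved,
        "actual_cost": actual_cost,
        # 0 means the approved budget exactly covers the real cost
        "shortfall": actual_cost - approved,
    }


if __name__ == "__main__":
    # Hypothetical figure: a "golden" estimate of 50,000.
    print(budget_rule(50_000))
```

Under these assumptions the approved amount and the real cost land on the same number, which is the whole point of asking for four times the estimate.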

      If you need "enterprise" grade systems, make sure that you are identifying the vendors in the space and calculate budget accordingly. And remember, vendors lie.

      --
      Agent K: A *person* is smart. People are dumb, stupid, panicky animals, and you know it.
  2. Go Virtual by BDMcGrew · · Score: 5, Insightful

    Well, your question leaves out a lot of details but from what you've said so far, look at getting some new hardware in there and start virtualizing some of the EoL systems. This will provide you an upgrade path for existing systems and a snapshot'd point of restore in the event of a failure.

    1. Re:Go Virtual by ShanghaiBill · · Score: 5, Insightful

      Well, your question leaves out a lot of details

      The most important left out details are about politics, not technology. Do you have the support of top management? How powerful are the people that are opposed to your project? There are people that will actively work to sabotage your efforts, and use you as a scapegoat for everything that goes wrong. How are you planning to deal with that?

      Since you are the "new guy" trying to change things that you don't understand, you didn't even mention end-user applications, and you seem to be more interested in OSS-evangelism than supporting your users and helping them get their job done, my prediction is that you are going to be out of a job in less than six months.

    2. Re: Go Virtual by rwa2 · · Score: 5, Funny

      Yep, Virtualize all the things was the mantra ten years ago, and still applies well today. Get everyone smart on using vagrant and VirtualBox (better yet VMware or even libvirt-kvm if you can get them to run Linux on the bare metal), and start imaging all of those legacy servers in your sandbox VMs. Build a cluster of VM servers to migrate to. Set up load balancers and test failover and rollback deploys. Set up Jenkins or Rundeck to do and log all of the actual work, and a peer review system for checkins from Github. Implement change management on a ticketing system such as Redmine or get them to pay for Jira. Set up a kanban board in Trello or Jira and coordinate everyone via HipChat or Hangouts or Skype, preferably all three. Plus the Lync people, you'll need a separate Jabberd deployment to tie those people in. Set up a monitoring system like Icinga2 and write alert plugins to HipChat and PagerDuty. That will help with backend alerts, but you'll want frontend user flow testing too so sign up for AlertSite and train your UAT people to code up their flows in the Firefox plugin. The tests will put a lot of load on your systems, though, so invest in some application performance monitoring on your toolchain like NewRelic or AppDynamics to help identify where your performance bottlenecks lie. This is a good time to migrate everything to OpsCode Chef so you can automate all of your unit testing and integration testing to prevent regressions. There are still some gaps in what Chef can accomplish with some expediency, though, so better also set up Ansible to take care of doing the actual work while the test-kitchens are running through the Continuous Integration / Continuous Delivery pipeline. Spend a good bit of time automating your CMDB tool too so you can report on all of the discrepancies that get by both Chef and Ansible. At this point Splunk is getting kinda expensive, so have a team build up an ELK stack and deploy to a dozen instances on AWS. 
Oh, you need a dev environment for that too, since that one time that innocuous checkin broke everything, so make that 2 dozen instances. Graphite would be very useful too, if you had someone dedicated to making dashboards for it. But someone else threw up a Dasher page over a weekend and that displayed enough of a high-level view on the workplace monitor to make the execs happy without troubling them with the actual details of things that were broken. That person got promoted and then left the company, but the dashboard page still looks good and green, so we'll leave it running for now. Except at some point a RabbitMQ feeding the ELK stack used by the Dasher page somewhere choked on something being fed to the log pipeline by carrotd, so you better go digging for that somewhere, since the execs have a demo coming up this week and they'd really like to show that display to depict what an up-to-the-minute decision-making capability they have, but they don't want to show the Icinga2 monitor because there's too much red and amber junk on it from transient test systems that can't use the test Icinga2 instance for some weird networking issue. That could be addressed by migrating your dev environments to docker containers so everything can run within the same VM host, then figure out whether you want to orchestrate them using CoreOS or Kubernetes or swarm or fleet along with the appropriate OpenFlow network definitions, but this isn't authorized to deploy the same way to production yet, so just hang tight for now, OK? Around this time, you should be ready to tackle the migration of your services to systemd.

  3. Show them the risks by Tool Man · · Score: 4, Interesting

    I don't know your organization's level of risk tolerance, but getting them to pay for one of the following would be an eye-opener:
    - A vulnerability assessment will show a sea of red for the unsupported platforms. Maybe that'll be sufficient to convince them that it's time to upgrade (and train up on new stuff).
    - A penetration test will take those same vulnerabilities and actually try to exploit them to see what an attacker could get. The difference is in trying to use those issues and turning them into "oh SHIT" screen shots in the report. It's the difference between "someone could theoretically do X" and "someone just did X, and documented it all for your edification."

    On the latter engagements, especially with the dreadfully old stuff, it is quite enlightening to include those screen shots that show how I've added new users, logged in with them, and used them to poke yet more systems I couldn't reach from the starting point. The under-educated staff would only help things if social engineering was in scope too.
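Before paying for either engagement, a rough first pass at the "sea of red" can come from the asset inventory alone. The sketch below is not a vulnerability assessment; it only flags hosts whose operating system is past its end-of-support date. The hostnames and the inventory format are hypothetical, and only the two platforms named in the question are listed.

```python
from datetime import date

# End of extended support for the two platforms named in the question.
END_OF_SUPPORT = {
    "Windows XP": date(2014, 4, 8),
    "Windows Server 2003": date(2015, 7, 14),
}


def unsupported_hosts(inventory, today):
    """Return (hostname, os_name) pairs whose OS was out of support on `today`.

    An OS missing from END_OF_SUPPORT is NOT flagged, so an empty result
    does not mean the estate is healthy.
    """
    flagged = []
    for hostname, os_name in inventory:
        eol = END_OF_SUPPORT.get(os_name)
        if eol is not None and today > eol:
            flagged.append((hostname, os_name))
    return flagged


if __name__ == "__main__":
    # Hypothetical inventory; the hostnames are made up.
    inventory = [
        ("desk-014", "Windows XP"),
        ("files-01", "Windows Server 2003"),
        ("web-02", "Debian"),
    ]
    for host, os_name in unsupported_hosts(inventory, date.today()):
        print(f"{host}: {os_name} is past end of support")
```

A list like this is cheap to produce and gives management a count to react to, but it says nothing about what an attacker could actually do with those boxes; that is what the paid engagements are for.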

  4. Running? by gstoddart · · Score: 5, Insightful

    As a senior techie, I've been tasked with helping bring the skillset of the rest of the staff up. Where would you start, given that most of the kit is EoL?

    Well, you have 3 main choices:

    1) Try to fix it and succeed
    2) Try to fix it and fail
    3) Run like hell

    You won't be able to force the rest of the staff to bring up their skillset. Management has clearly left it to rot on the vine for a very long time. And, by the sounds of it, they don't know what they've even got.

    A large IT department with no skills with the technologies on site? What exactly is that large IT department doing for this company? If you have a bunch of people with no skillsets with the technology they have ... then what skillsets do they have, and how is it helping you?

    Without more detail, I'm hearing "Hi, I've just joined a company with a terrible IT department, how do I fix that?" Who let it get into such a bad state? Because if they're still around, no way in hell you'll ever fix it.

    --
    Lost at C:>. Found at C.
    1. Re:Running? by TWX · · Score: 5, Insightful

      Yep. If you're not in charge and able to make the tough calls (i.e., figuring out who's actually supporting important stuff, who's not, and making the decisions about who gets a chance to migrate to something new and who needs to take their skillset elsewhere) then you're probably not going to make the difference that you want to make or that your superiors somehow expect.

      What I can say, from experience, is that you need to actually learn how things are working now before you start making changes. I've had bosses brought in from the outside who thought they were God's gift to the IT world and decided to try to remake the organization in their own image, only to be fired less than a year later because they pissed off all of the existing IT staff such that the boss got no results, and pissed off the users by failing to maintain existing workflow such that the users' jobs became much harder or required lots of direct assistance.

      Learn what's there, why it's there, and understand that most decisions were made as a reaction to something prompting it to be necessary. Change what can be changed in a sane way, but don't take personal offense to anything as it is now as there are probably good reasons why it is the way it is. If you come in with the attitude that you can rip out everything without a care, you'll find suddenly that no staff will bother to warn you of the pitfalls in front of you that they're all well aware of, and you, not them, will be the one with egg on your face when it breaks because it was your decision to change it.

      --
      Do not look into laser with remaining eye.
    2. Re:Running? by TWX · · Score: 4, Insightful

      The article submitter made it clear that he's new. He very well may not understand the workflow and who actually knows how to take care of what. He needs to learn that before he can start making changes, or he, not the existing staff, will be the one blamed when everything goes wrong.

      IT attracts a fair amount of introverts. It's likely that a lot of his staff are playing their cards close to their chest because that's what they're simply used to doing. It's also possible that they themselves wanted to make changes but were not given the budget needed to do so, so legacy systems continue to be used. It could also be that a few incompetent people in key positions have gummed-up the whole works.

      Do you think that anyone wants to be stuck with ancient garbage if there's something newer that actually demonstrably works better? Most of the time the decisions that hold back the IT department are made either by IT management or by those outside of the IT department.

      --
      Do not look into laser with remaining eye.
  5. Not enough info brah by Iamthecheese · · Score: 4, Interesting

    It depends on how much actual authority you have, how conservative the corporate culture is, and whether there are any entrenched ways of doing things. This isn't a technical question but a political one. If you actually (as opposed to officially) have authority to tell them how to do things, you need to first find out how the system is working now. Maybe they didn't set up passwords because multiple departments need to connect to the same server and there's no secure password control in place. Maybe they're disorganized. Maybe they're inexperienced. These all require different activities to repair the problem.

    You mentioned EOL hardware, but you didn't say whether a migration is planned or whether the money is available for one. Obviously new hardware is a great opportunity for user training, but again there are too many unknowns here. How much extra time do the engineers have to train? How much of the existing system setup is invisibly a part of how the users interact with it?

    It sounds to me like you're standing on a powder keg. The right way to deal with it is to gather information. Make benchmarks. Understand system inter-operations and use. Learn who is doing what and why. Only a fool would start declaring X and Y need to be done without taking a look around first.

    --
    If video games influenced behavior the Pac Man generation would be eating pills and running away from their problems.
  6. Low Hanging Fruit by AdelieMan · · Score: 5, Insightful

    I would audit everything. Make a matrix of things that need to be addressed easy to hard, least significant to most, and start chipping away at it. It will take time to turn that ship around, but it will be worth it, and you will keep your sanity.
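One way to read the matrix above is as two scores per audit finding and a sort order. This is a minimal sketch under that assumption; the 1-to-5 scales, the field names, and the sample findings are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Issue:
    name: str
    effort: int        # 1 = easy ... 5 = hard (assumed scale)
    significance: int  # 1 = least significant ... 5 = most (assumed scale)


def low_hanging_first(issues):
    """Order issues easiest first; among equally easy ones, most significant first."""
    return sorted(issues, key=lambda i: (i.effort, -i.significance))


if __name__ == "__main__":
    # Hypothetical audit findings with made-up scores.
    findings = [
        Issue("Migrate 2003 servers", effort=5, significance=5),
        Issue("Change default passwords", effort=1, significance=5),
        Issue("Replace XP desktops", effort=4, significance=4),
        Issue("Document the LAMP servers", effort=2, significance=2),
    ]
    for issue in low_hanging_first(findings):
        print(issue.effort, issue.significance, issue.name)
```

Sorting on effort first matches the "low hanging fruit" framing; swapping the key to lead with significance would be the choice if the biggest risks have to go first regardless of cost.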

    1. Re:Low Hanging Fruit by bluefoxlucid · · Score: 4, Insightful

      Hear hear. I would suggest not being shy of technology; I've been interested in Microsoft Project 365 integration with Sharepoint for a while, and you should definitely look at your options for project management whether they come from Microsoft, Oracle, or some no-name company that provides a fantastic and little-known product as an open-source support-contracted service. What you have there is a long program, and I suggest you get RMCProject's CAPM Exam Prep and the PMBOK if you haven't got project management skills, and spend the 3 months getting a basic grasp of all that right out of the gate.

      The primary tools you're going to want are risk management and hierarchical decomposition; however, on the scale you're talking about, full project management knowledge is going to be an outright requirement if you want to do anything resembling a competent job. You *won't* want to use the full suite of project management practices--you never want to use the full set of tools outright, but rather the ones you want, for any purpose in any field--but if that place is as big a rat hole as you say, you're going to need some accounting of what's going on.

      As the parent poster here says, you definitely need to start here:

      Make a matrix of things that need to be addressed easy to hard, least significant to most, and start chipping away at it.

      Get a list of discrete, finite, deliverable projects. Things you can put into boxes and say, "This is one thing I want to produce; it's of a nature that I can tell you what work is required, how much time it will take, and what it will cost." You'll start by examining the array of systems, breaking them down into departments and components (what do they support? What do they do for each department?), and deciding what you're replacing. Are you upgrading Windows XP with stitched-together software to Windows Server 2008, or are you transitioning to a new set of systems to solve the same problem in a different way? Get that list down.

      Each thing you want to address will be something small, finite, limited, and understood. You're replacing the groupware services--Exchange, for example; the thing that provides e-mail, calendar, and such--with an upgraded, better-implemented, or new product (Exchange to Zimbra, Zimbra to Exchange, migration to a SaaS such as Google for Business or Office 365, etc.). Some things break out into phases or multiple projects, e.g.: migrating Exchange to Office 365 may involve a phase 1 of upgrading Exchange to the latest version, a phase 2 of enabling some kind of synchronization and backup that you don't have now, and a phase 3 of migrating to the service; while you may find that your Zimbra installation has no back-ups because you need an enterprise backup solution, and so you can't get back-ups in until you get Bacula set up.

      Once you have your list, you can start breaking them out by hierarchical decomposition. You'll want to decompose the work: each deliverable (e.g. your project, Bacula backup infrastructure, delivers a working Bacula backup infrastructure as its product) breaks out into a complete set of deliverables (e.g. project management, support services, back-up strategy design, servers, client deployment with Puppet or SCCM or Ansible, etc.), which themselves each break down further. Once your work is broken down, you hit the bottom with sets of work packages--each a deliverable--that you can understand completely; you can turn those into lists of activities and tasks to produce the deliverable.
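The decomposition described above can be sketched as a tree whose leaves are the work packages. This is only an illustration: the nested-dict layout is an assumption, and the two entries under "Servers" are hypothetical sub-deliverables that do not come from the comment.

```python
def work_packages(node, path=()):
    """Yield the full path to every leaf (work package) of a nested-dict breakdown."""
    for name, children in node.items():
        here = path + (name,)
        if children:
            # Non-empty dict: this deliverable still breaks down further.
            yield from work_packages(children, here)
        else:
            # Empty dict: nothing left to decompose, so this is a work package.
            yield here


if __name__ == "__main__":
    wbs = {
        "Bacula backup infrastructure": {
            "Project management": {},
            "Back-up strategy design": {},
            "Servers": {
                "Director host": {},  # hypothetical sub-deliverable
                "Storage host": {},   # hypothetical sub-deliverable
            },
            "Client deployment": {},
        }
    }
    for package in work_packages(wbs):
        print(" > ".join(package))
```

Each printed path is a candidate for its own list of activities and tasks; anything that still feels too big to estimate gets another level of children.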

      The same goes for risks. You want to identify everything your experience says can go wrong, and use your experience to do qualitative risk analysis--what risks are important? Then you use a procedure of assessing probability vs severity to do quantitative risk analysis. You work out how to avoid (100%), mitigate (any%), accept (0%), or transfer (buy insurance) the negative risks (threats), and how to exploit (100%), en
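The probability-versus-severity step can be sketched as a simple exposure score. This assumes exposure is probability multiplied by severity, which is one common convention rather than anything the comment specifies; the risks and numbers are invented.

```python
def rank_risks(risks):
    """Sort (name, probability, severity) tuples by exposure, highest first.

    Exposure is taken to be probability * severity, with probability in 0..1
    and severity on an assumed 1..10 scale.
    """
    scored = [(name, probability * severity) for name, probability, severity in risks]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical threats with made-up estimates.
    threats = [
        ("Old server fails to power back on", 0.3, 9),
        ("Migration breaks a user workflow", 0.5, 6),
        ("Budget is cut mid-project", 0.2, 8),
    ]
    for name, exposure in rank_risks(threats):
        print(f"{exposure:.1f}  {name}")
```

The ranking only tells you where to look first; deciding whether to avoid, mitigate, accept, or transfer each threat is still a judgment call.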

  7. Re:Yes, buy lots of new things, money is no object by TWX · · Score: 4, Funny

    I think that bot from a few articles down is trying to weigh-in...

    --
    Do not look into laser with remaining eye.
  8. Re:Out with the old, in with the new by bobbied · · Score: 5, Insightful

    Buy a new system. Power down every system in turn and try to power it up again. If it will not start, replace it.

    NEVER power down old hardware on purpose unless you have a backup plan for the system... Old hardware has a habit of not coming back when you power off and if it dies, you created an emergency for yourself...

    There are going to be enough unforced errors in the process, you needn't go out and look to create them.

    --
    "File to fit, pound to insert, paint to match" - Aircraft Maintenance 101
  9. Lead, Mentor, Grow by mtippett · · Score: 4, Insightful

    You've been dropped in an environment that is legacy and probably has production problems. Use that to your advantage.

    You've been also dropped in a leadership role (not management, leadership).

    Your #1 target should be to make yourself redundant (which ironically is likely to get you promoted, it's called succession :).

    So look at doing something like identifying the #1 problem (Pareto charts help). Ask for volunteers (or volunteer some people), give them the problem to solve, use whiteboards, etc. to help them discover the solution. You may facilitate and provide hints to get things done. Empower and guide the people you are helping.
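The numbers behind a Pareto chart are just category counts sorted from most to least frequent, with a running cumulative percentage. A minimal sketch, assuming the input is one category label per ticket or incident; the labels are made up.

```python
from collections import Counter


def pareto(categories):
    """Return (category, count, cumulative_percent) rows, most frequent first."""
    counts = Counter(categories)
    total = sum(counts.values())
    rows = []
    running = 0
    for category, count in counts.most_common():
        running += count
        rows.append((category, count, round(100 * running / total, 1)))
    return rows


if __name__ == "__main__":
    # Hypothetical ticket log: one label per incident.
    tickets = ["password"] * 5 + ["disk"] * 3 + ["printer"] * 2
    for category, count, cumulative in pareto(tickets):
        print(f"{category:10} {count:3} {cumulative:6.1f}%")
```

The first row is the candidate for the #1 problem; the cumulative column shows how few categories account for most of the pain.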

    Read up on https://en.wikipedia.org/wiki/..., you are likely in a #2 or #3 combination. You can help lead people to move to a #3 with leadership, with the idea to get to #1 over time (with their help).

    Of course there might be some issues that you might need to solve like EOL systems and any budget that may be needed. If the OS is old, then probably the HW is old as well. Budget for that is probably going to be your biggest issue.

  10. Wanted: by Drewdad · · Score: 4, Insightful

    Wanted: IT Director
    Pay-scale: Entry level.

  11. Cheapskates by scsirob · · Score: 4, Insightful

    They are not open source advocates, they are cheapskates who like the prospect of 'free' anything. No supported equipment, no updates, no training for their staff, they simply don't appreciate the value of their IT.

    Let me guess, no decent backups either? No DR plan? Nothing of the sort? If you want to stay there, demand a decent budget ( = commitment) and build greenfield. If you don't get a decent budget, run.

    --
    To Terminate, or not to Terminate, that's the question - SCSIROB