Slashdot Mirror


Ask Slashdot: Herding Cats, Aging Systems?

An anonymous reader writes: I've recently started a job at a medium-sized enterprise in the UK. They claimed to be an advocate of open-source. The job was advertised as a Linux sys-admin. I've been in the role a short while and the systems right across the business are end-of-life: lots of XP and 2003 servers, a handful of LAMP web servers, and a large IT department with almost no skills in the technologies on site. Most boxes have the default password still. As a senior techie, I've been tasked with helping bring the skillset of the rest of the staff up. Where would you start, given that most of the kit is EoL?

7 of 158 comments

  1. Go Virtual by BDMcGrew · · Score: 5, Insightful

    Well, your question leaves out a lot of details, but from what you've said so far, look at getting some new hardware in there and start virtualizing some of the EoL systems. This will give you an upgrade path for existing systems and a snapshot'd point of restore in the event of a failure.

    1. Re:Go Virtual by ShanghaiBill · · Score: 5, Insightful

      Well, your question leaves out a lot of details

      The most important left out details are about politics, not technology. Do you have the support of top management? How powerful are the people that are opposed to your project? There are people that will actively work to sabotage your efforts, and use you as a scapegoat for everything that goes wrong. How are you planning to deal with that?

      Since you are the "new guy" trying to change things that you don't understand, you didn't even mention end-user applications, and you seem to be more interested in OSS-evangelism than supporting your users and helping them get their job done, my prediction is that you are going to be out of a job in less than six months.

    2. Re: Go Virtual by rwa2 · · Score: 5, Funny

      Yep, Virtualize all the things was the mantra ten years ago, and still applies well today. Get everyone smart on using vagrant and VirtualBox (better yet VMware or even libvirt-kvm if you can get them to run Linux on the bare metal), and start imaging all of those legacy servers in your sandbox VMs. Build a cluster of VM servers to migrate to. Set up load balancers and test failover and rollback deploys. Set up Jenkins or Rundeck to do and log all of the actual work, and a peer review system for checkins from Github. Implement change management on a ticketing system such as Redmine or get them to pay for Jira. Set up a kanban board in Trello or Jira and coordinate everyone via HipChat or Hangouts or Skype, preferably all three. Plus the Lync people, you'll need a separate Jabberd deployment to tie those people in. Set up a monitoring system like Icinga2 and write alert plugins to HipChat and PagerDuty. That will help with backend alerts, but you'll want frontend user flow testing too so sign up for AlertSite and train your UAT people to code up their flows in the Firefox plugin. The tests will put a lot of load on your systems, though, so invest in some application performance monitoring on your toolchain like NewRelic or AppDynamics to help identify where your performance bottlenecks lie. This is a good time to migrate everything to OpsCode Chef so you can automate all of your unit testing and integration testing to prevent regressions. There are still some gaps in what Chef can accomplish with some expediency, though, so better also set up Ansible to take care of doing the actual work while the test-kitchens are running through the Continuous Integration / Continuous Delivery pipeline. Spend a good bit of time automating your CMDB tool too so you can report on all of the discrepancies that get by both Chef and Ansible. At this point Splunk is getting kinda expensive, so have a team build up an ELK stack and deploy to a dozen instances on AWS. 
Oh, you need a dev environment for that too, since that one time that innocuous checkin broke everything, so make that 2 dozen instances. Graphite would be very useful too, if you had someone dedicated to making dashboards for it. But someone else threw up a Dasher page over a weekend and that displayed enough of a high-level view on the workplace monitor to make the execs happy without troubling them with the actual details of things that were broken. That person got promoted and then left the company, but the dashboard page still looks good and green, so we'll leave it running for now. Except at some point a RabbitMQ feeding the ELK stack used by the Dasher page somewhere choked on something being fed to the log pipeline by carrotd, so you better go digging for that somewhere, since the execs have a demo coming up this week and they'd really like to show that display to depict what an up-to-the-minute decision-making capability they have, but they don't want to show the Icinga2 monitor because there's too much red and amber junk on it from transient test systems that can't use the test Icinga2 instance for some weird networking issue. That could be addressed by migrating your dev environments to docker containers so everything can run within the same VM host, then figure out whether you want to orchestrate them using CoreOS or Kubernetes or swarm or fleet along with the appropriate OpenFlow network definitions, but this isn't authorized to deploy the same way to production yet, so just hang tight for now, OK? Around this time, you should be ready to tackle the migration of your services to systemd.

  2. Running? by gstoddart · · Score: 5, Insightful

    As a senior techie, I've been tasked with helping bring the skillset of the rest of the staff up. Where would you start, given that most of the kit is EoL?

    Well, you have 3 main choices:

    1) Try to fix it and succeed
    2) Try to fix it and fail
    3) Run like hell

    You won't be able to force the rest of the staff to bring up their skillset. Management has clearly left it to rot on the vine for a very long time. And, by the sounds of it, they don't know what they've even got.

    A large IT department with no skills with the technologies on site? What exactly is that large IT department doing for this company? If you have a bunch of people with no skillsets with the technology they have ... then what skillsets do they have, and how is it helping you?

    Without more detail, I'm hearing "Hi, I've just joined a company with a terrible IT department, how do I fix that?" Who let it get into such a bad state? Because if they're still around, no way in hell you'll ever fix it.

    --
    Lost at C:>. Found at C.
    1. Re:Running? by TWX · · Score: 5, Insightful

      Yep. If you're not in charge and able to make the tough calls (i.e., figuring out who's actually supporting important stuff, who's not, and making the decisions about who gets a chance to migrate to something new and who needs to take their skillset elsewhere) then you're probably not going to make the difference that you want to make or that your superiors somehow expect.

      What I can say, from experience, is that you need to actually learn how things are working now before you start making changes. I've had bosses brought in from the outside who thought they were God's gift to the IT world and decided to remake the organization in their own image, only to be fired less than a year later. They pissed off all of the existing IT staff, so the boss got no results, and they pissed off the users by failing to maintain existing workflows, so the users' jobs became much harder or required lots of direct assistance.

      Learn what's there, why it's there, and understand that most decisions were made as a reaction to something prompting it to be necessary. Change what can be changed in a sane way, but don't take personal offense to anything as it is now as there are probably good reasons why it is the way it is. If you come in with the attitude that you can rip out everything without a care, you'll find suddenly that no staff will bother to warn you of the pitfalls in front of you that they're all well aware of, and you, not them, will be the one with egg on your face when it breaks because it was your decision to change it.

      --
      Do not look into laser with remaining eye.
  3. Low Hanging Fruit by AdelieMan · · Score: 5, Insightful

    I would audit everything. Make a matrix of the things that need to be addressed, ranked from easy to hard and from least significant to most, and start chipping away at it. It will take time to turn that ship around, but it will be worth it, and you will keep your sanity.
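
    A minimal sketch of such a matrix, assuming each audited item gets a 1-5 effort score and a 1-5 significance score. The item names come from the question; the scores and the `AuditItem` and `prioritise` names are made up for illustration. Sorting by effort first, then by significance, puts the low-hanging fruit at the top:

```python
from dataclasses import dataclass


@dataclass
class AuditItem:
    name: str
    effort: int        # hypothetical scale: 1 = easy, 5 = hard
    significance: int  # hypothetical scale: 1 = minor, 5 = critical


def prioritise(items):
    """Easiest work first; among equally easy items, most significant first."""
    return sorted(items, key=lambda i: (i.effort, -i.significance, i.name))


# Illustrative scores only; a real audit would supply its own.
items = [
    AuditItem("Migrate Server 2003 boxes", effort=5, significance=5),
    AuditItem("Change default passwords", effort=1, significance=5),
    AuditItem("Document the LAMP web servers", effort=2, significance=3),
    AuditItem("Replace XP machines", effort=4, significance=4),
]

ranked = prioritise(items)
for item in ranked:
    print(f"effort={item.effort} significance={item.significance} {item.name}")
```

    Swapping the sort key to put significance first would give a "biggest risk first" ordering instead; which one fits depends on how much goodwill you need to build before touching the hard stuff.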

  4. Re:Out with the old, in with the new by bobbied · · Score: 5, Insightful

    Buy a new system. Power down every system in turn and try to power it up again. If it will not start, replace it.

    NEVER power down old hardware on purpose unless you have a backup plan for the system... Old hardware has a habit of not coming back after you power it off, and if it dies, you have created an emergency for yourself...

    There are going to be enough unforced errors in the process, you needn't go out and look to create them.
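
    As a rough sketch of that rule, here is a pre-flight check that refuses to clear a host for power-down unless a fallback exists. Reading "backup plan" as "a backup exists and a restore from it has been tested" is my assumption, and the `Host` fields and host names are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    has_backup: bool      # a current backup or image exists (assumed criterion)
    restore_tested: bool  # that backup has actually been restored somewhere


def safe_to_power_down(host):
    """Only clear a host if there is a backup and the restore is known to work."""
    return host.has_backup and host.restore_tested


# Hypothetical inventory for illustration.
hosts = [
    Host("old-fileserver", has_backup=True, restore_tested=False),
    Host("old-web", has_backup=True, restore_tested=True),
    Host("old-db", has_backup=False, restore_tested=False),
]

blocked = [h.name for h in hosts if not safe_to_power_down(h)]
print("Do not power down:", blocked)
```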

    --
    "File to fit, pound to insert, paint to match" - Aircraft Maintenance 101