Slashdot Mirror


Ask Slashdot: Herding Cats, Aging Systems?

An anonymous reader writes: I've recently started a job at a medium-sized enterprise in the UK. They claimed to be an advocate of open-source. The job was advertised as a Linux sys-admin. I've been in the role a short while and the systems right across the business are end-of-life: lots of XP and 2003 servers, a handful of LAMP web servers, and a large IT department with almost no skills in the technologies on site. Most boxes have the default password still. As a senior techie, I've been tasked with helping bring the skillset of the rest of the staff up. Where would you start, given that most of the kit is EoL?

3 of 158 comments

  1. This is a tough one... by Anonymous Coward · Score: 3, Funny

    No guns, no knives... do you pussies still get rope or are you going to have to find a tall building to jump off instead?

  2. Re: Yes, buy lots of new things, money is no object by TWX · Score: 4, Funny

    I think that bot from a few articles down is trying to weigh in...

    --
    Do not look into laser with remaining eye.
  3. Re: Go Virtual by rwa2 · Score: 5, Funny

    Yep, "Virtualize all the things" was the mantra ten years ago, and still applies well today. Get everyone smart on using Vagrant and VirtualBox (better yet VMware or even libvirt-kvm if you can get them to run Linux on the bare metal), and start imaging all of those legacy servers in your sandbox VMs. Build a cluster of VM servers to migrate to. Set up load balancers and test failover and rollback deploys. Set up Jenkins or Rundeck to do and log all of the actual work, and a peer review system for checkins from GitHub. Implement change management on a ticketing system such as Redmine or get them to pay for Jira. Set up a kanban board in Trello or Jira and coordinate everyone via HipChat or Hangouts or Skype, preferably all three. Plus the Lync people, you'll need a separate Jabberd deployment to tie those people in. Set up a monitoring system like Icinga2 and write alert plugins to HipChat and PagerDuty. That will help with backend alerts, but you'll want frontend user flow testing too, so sign up for AlertSite and train your UAT people to code up their flows in the Firefox plugin. The tests will put a lot of load on your systems, though, so invest in some application performance monitoring on your toolchain like New Relic or AppDynamics to help identify where your performance bottlenecks lie. This is a good time to migrate everything to Opscode Chef so you can automate all of your unit testing and integration testing to prevent regressions. There are still some gaps in what Chef can accomplish with some expediency, though, so better also set up Ansible to take care of doing the actual work while the test-kitchens are running through the Continuous Integration / Continuous Delivery pipeline. Spend a good bit of time automating your CMDB tool too so you can report on all of the discrepancies that get by both Chef and Ansible. At this point Splunk is getting kinda expensive, so have a team build up an ELK stack and deploy to a dozen instances on AWS.
    Oh, you need a dev environment for that too, since that one time that innocuous checkin broke everything, so make that 2 dozen instances. Graphite would be very useful too, if you had someone dedicated to making dashboards for it. But someone else threw up a Dasher page over a weekend and that displayed enough of a high-level view on the workplace monitor to make the execs happy without troubling them with the actual details of things that were broken. That person got promoted and then left the company, but the dashboard page still looks good and green, so we'll leave it running for now. Except at some point a RabbitMQ feeding the ELK stack used by the Dasher page somewhere choked on something being fed to the log pipeline by carrotd, so you better go digging for that somewhere, since the execs have a demo coming up this week and they'd really like to show that display to depict what an up-to-the-minute decision-making capability they have, but they don't want to show the Icinga2 monitor because there's too much red and amber junk on it from transient test systems that can't use the test Icinga2 instance because of some weird networking issue. That could be addressed by migrating your dev environments to Docker containers so everything can run within the same VM host, then figuring out whether you want to orchestrate them using CoreOS or Kubernetes or Swarm or fleet along with the appropriate OpenFlow network definitions, but this isn't authorized to deploy the same way to production yet, so just hang tight for now, OK? Around this time, you should be ready to tackle the migration of your services to systemd.