Slashdot Mirror


Ask Slashdot: Herding Cats, Aging Systems?

An anonymous reader writes: I've recently started a job at a medium-sized enterprise in the UK. They claimed to be an advocate of open-source. The job was advertised as a Linux sys-admin. I've been in the role a short while and the systems right across the business are end-of-life: lots of XP and 2003 servers, a handful of LAMP web servers, and a large IT department with almost no skills in the technologies on site. Most boxes have the default password still. As a senior techie, I've been tasked with helping bring the skillset of the rest of the staff up. Where would you start, given that most of the kit is EoL?

34 of 158 comments

  1. Don't train them in the current systems by Chris+Mattern · · Score: 4, Insightful

    That's the most obvious thing. Bring in supported systems and train them in those systems as you deploy them.

    1. Re:Don't train them in the current systems by Archangel+Michael · · Score: 4, Insightful

      Before you bring in supported systems, you have to have a budget. Without a defined budget, the rest of the decision-making process is pure insanity.

      My first response is, estimate what the "golden" cost will be, and quadruple it. They will cut it in half, and it will cost you twice what you think it will, and you'll end up with an excellent system that is designed well and built right.
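      The rule of thumb is self-consistent: four times the golden estimate, halved, is twice the golden estimate, which is exactly what the comment says the job will really cost. A minimal sketch of that arithmetic; the 50,000 figure is a hypothetical example, not something from the comment:

```python
def budget_walkthrough(golden_estimate: float) -> dict:
    """Follow the commenter's rule of thumb step by step (an informal heuristic, not a model)."""
    requested = golden_estimate * 4   # quadruple the "golden" cost when you ask
    approved = requested / 2          # management cuts the request in half
    actual = golden_estimate * 2      # it ends up costing twice what you think
    return {
        "requested": requested,
        "approved": approved,
        "actual": actual,
        "covered": approved >= actual,  # does the approved budget cover the real cost?
    }

# Hypothetical golden estimate of 50,000 (any currency).
print(budget_walkthrough(50_000))
```

      Whatever number you start from, the approved budget lands exactly on the expected real cost, which is the whole point of padding the request.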

      If you need "enterprise" grade systems, make sure that you are identifying the vendors in the space and calculate budget accordingly. And remember, vendors lie.

      --
      Agent K: A *person* is smart. People are dumb, stupid, panicky animals, and you know it.
    2. Re:Don't train them in the current systems by King_TJ · · Score: 2

      This is great advice.... If this place is anything like a couple of them I've seen before though? They likely decided to become primarily a "Linux shop" in the first place because they were unwilling to spend much on I.T. -- and somewhere along the line, staff deployed Linux as a way to keep old/obsolete hardware functional.

      Assuming you can get some kind of workable I.T. budget in place, I think you want to start by analyzing what's exactly going on, on the server-side of things. Windows Server 2003 still in use? Where and why? Is there an Active Directory master keeping all of the user account logins? How many servers are just doing basic file/print or web services for various things?

      In the last 2 jobs I've had, it made sense to invest in a relatively high-spec server to run VMware ESXi and create virtual servers in place of the older, physical systems. Right off the bat, you get cost savings in electrical power usage (less heat generated by a bunch of older servers in a computer room, etc.). If they have "legacy" apps that would be problematic to get running properly on a current OS, at least you can virtualize that old environment and run it on the new system where making regular snapshot images of the whole thing is trivial. And you often remove physical constraints on the maximum available storage space too. (Old servers with SCSI RAID cards may not support drive partitions over a certain size, and you may not be able to add hard drives of the capacities you typically see today.)

      On the PC workstation side of things? Anything running XP should be budgeted for complete replacement, IMO. Yes, some of those systems can easily run Windows 7 -- but by the time you buy the licenses for them, you're probably spending about as much as the used hardware is worth in resale value, if not more. Exceptions might be any laptops bought in the Win 7 era that just had XP loaded on them because that was what they preferred.... On those, maybe you can just load a Win 7 recovery/restore disc that came with it to begin with and get it current at no cost except for your time.

    3. Re:Don't train them in the current systems by Archangel+Michael · · Score: 2

      If you can't get a real IT budget, then all the Linux Wizardry isn't going to solve any problems. I have a phrase I use, "Good IT is expensive. Bad IT is costly".

      I've seen people "cheap out" trying to save a buck, only to lose their proverbial shirt in the process. It isn't worth it. It isn't worth it to work there, it isn't worth it to be a wizard for people who do not value IT.

      --
      Agent K: A *person* is smart. People are dumb, stupid, panicky animals, and you know it.
    4. Re:Don't train them in the current systems by Larryish · · Score: 2

      Nuke it from orbit.

      It's the only way to be sure.

    5. Re:Don't train them in the current systems by Lonewolf666 · · Score: 2

      If this place is anything like a couple of them I've seen before though? They likely decided to become primarily a "Linux shop" in the first place because they were unwilling to spend much on I.T. -- and somewhere along the line, staff deployed Linux as a way to keep old/obsolete hardware functional.

      This may be a valid approach if there are no Windows-only applications that are not easily replaced. But that is something you need to find out as soon as possible. IMHO that will make the difference between being able to switch to Linux in the short run and looking at a long transition period.

      For the City of Munich, switching to Linux took several years because they had lots of old applications on Windows for which there were no Linux equivalents.

      --
      C - the footgun of programming languages
  2. Go Virtual by BDMcGrew · · Score: 5, Insightful

    Well, your question leaves out a lot of details, but from what you've said so far, look at getting some new hardware in there and start virtualizing some of the EoL systems. This will give you an upgrade path for existing systems and a snapshotted restore point in the event of a failure.

    1. Re: Go Virtual by W.+Justice+Black · · Score: 2

      This.

      Getting things into a repeatable state is step one, and it very much sounds like you don't have that. Using a combination of VMs and deployment technologies (like Puppet) will give you both a safe sandbox to work in and careful change management. Once you have that, the rest should fall into place much more easily (disaster recovery, upgrade management, etc., are much simpler).

      --
      "Time flies like an arrow; fruit flies like a banana." --Groucho Marx
    2. Re:Go Virtual by ShanghaiBill · · Score: 5, Insightful

      Well, your question leaves out a lot of details

      The most important left out details are about politics, not technology. Do you have the support of top management? How powerful are the people that are opposed to your project? There are people that will actively work to sabotage your efforts, and use you as a scapegoat for everything that goes wrong. How are you planning to deal with that?

      Since you are the "new guy" trying to change things that you don't understand, you didn't even mention end-user applications, and you seem to be more interested in OSS-evangelism than supporting your users and helping them get their job done, my prediction is that you are going to be out of a job in less than six months.

    3. Re: Go Virtual by rwa2 · · Score: 5, Funny

      Yep, Virtualize all the things was the mantra ten years ago, and still applies well today. Get everyone smart on using vagrant and VirtualBox (better yet VMware or even libvirt-kvm if you can get them to run Linux on the bare metal), and start imaging all of those legacy servers in your sandbox VMs. Build a cluster of VM servers to migrate to. Set up load balancers and test failover and rollback deploys. Set up Jenkins or Rundeck to do and log all of the actual work, and a peer review system for checkins from Github. Implement change management on a ticketing system such as Redmine or get them to pay for Jira. Set up a kanban board in Trello or Jira and coordinate everyone via HipChat or Hangouts or Skype, preferably all three. Plus the Lync people, you'll need a separate Jabberd deployment to tie those people in. Set up a monitoring system like Icinga2 and write alert plugins to HipChat and PagerDuty. That will help with backend alerts, but you'll want frontend user flow testing too so sign up for AlertSite and train your UAT people to code up their flows in the Firefox plugin. The tests will put a lot of load on your systems, though, so invest in some application performance monitoring on your toolchain like NewRelic or AppDynamics to help identify where your performance bottlenecks lie. This is a good time to migrate everything to OpsCode Chef so you can automate all of your unit testing and integration testing to prevent regressions. There are still some gaps in what Chef can accomplish with some expediency, though, so better also set up Ansible to take care of doing the actual work while the test-kitchens are running through the Continuous Integration / Continuous Delivery pipeline. Spend a good bit of time automating your CMDB tool too so you can report on all of the discrepancies that get by both Chef and Ansible. At this point Splunk is getting kinda expensive, so have a team build up an ELK stack and deploy to a dozen instances on AWS. 

      Oh, you need a dev environment for that too, since that one time that innocuous checkin broke everything, so make that 2 dozen instances. Graphite would be very useful too, if you had someone dedicated to making dashboards for it. But someone else threw up a Dasher page over a weekend and that displayed enough of a high-level view on the workplace monitor to make the execs happy without troubling them with the actual details of things that were broken. That person got promoted and then left the company, but the dashboard page still looks good and green, so we'll leave it running for now. Except at some point a RabbitMQ feeding the ELK stack used by the Dasher page somewhere choked on something being fed to the log pipeline by carrotd, so you better go digging for that somewhere, since the execs have a demo coming up this week and they'd really like to show that display to depict what an up-to-the-minute decision-making capability they have, but they don't want to show the Icinga2 monitor because there's too much red and amber junk on it from transient test systems that can't use the test Icinga2 instance for some weird networking issue. That could be addressed by migrating your dev environments to docker containers so everything can run within the same VM host, then figure out whether you want to orchestrate them using CoreOS or Kubernetes or swarm or fleet along with the appropriate OpenFlow network definitions, but this isn't authorized to deploy the same way to production yet, so just hang tight for now, OK? Around this time, you should be ready to tackle the migration of your services to systemd.

  3. This is a tough one... by Anonymous Coward · · Score: 3, Funny

    No guns, no knives... do you pussies still get rope or are you going to have to find a tall building to jump off instead?

  4. Show them the risks by Tool+Man · · Score: 4, Interesting

    I don't know your organization's level of risk tolerance, but getting them to pay for one of the following would be an eye-opener:
    - A vulnerability assessment will show a sea of red for the unsupported platforms. Maybe that'll be sufficient to convince them that it's time to upgrade (and train up on new stuff).
    - A penetration test will take those same vulnerabilities and actually try to exploit them to see what an attacker could get. The difference is in trying to use those issues and turn them into "oh SHIT" screen shots in the report. It's the difference between "someone could theoretically do X" and "someone just did X, and documented it all for your edification."

    On the latter engagements, especially with the dreadfully old stuff, it is quite enlightening to include those screen shots that show how I've added new users, logged in with them, and used them to poke yet more systems I couldn't reach from the starting point. The under-educated staff would only help things if social engineering was in scope too.
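    A real vulnerability assessment needs a proper scanner, but even a crude pass over an inventory shows where the red will be. A minimal sketch, assuming a hand-built inventory: the hostnames, the `default_password` field, and the audit date are hypothetical; the end-of-support dates are Microsoft's published extended-support end dates for XP and Server 2003. This is a toy checklist, not a substitute for an actual assessment:

```python
from datetime import date

# Microsoft extended-support end dates for the platforms named in the question.
END_OF_SUPPORT = {
    "Windows XP": date(2014, 4, 8),
    "Windows Server 2003": date(2015, 7, 14),
}

def audit(hosts, today):
    """Return (hostname, reasons) for every host that would show up red."""
    findings = []
    for host in hosts:
        reasons = []
        eos = END_OF_SUPPORT.get(host["os"])
        if eos is not None and today > eos:
            reasons.append(f"{host['os']} unsupported since {eos.isoformat()}")
        if host.get("default_password"):  # hypothetical field recorded during the audit
            reasons.append("default password still set")
        if reasons:
            findings.append((host["name"], reasons))
    return findings

# Hypothetical inventory entries.
inventory = [
    {"name": "fileserv01", "os": "Windows Server 2003", "default_password": True},
    {"name": "desk-042", "os": "Windows XP", "default_password": False},
    {"name": "web01", "os": "Ubuntu 14.04", "default_password": False},
]

for name, reasons in audit(inventory, date(2016, 1, 1)):
    print(name, "; ".join(reasons))
```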

  5. Running? by gstoddart · · Score: 5, Insightful

    As a senior techie, I've been tasked with helping bring the skillset of the rest of the staff up. Where would you start, given that most of the kit is EoL?

    Well, you have 3 main choices:

    1) Try to fix it and succeed
    2) Try to fix it and fail
    3) Run like hell

    You won't be able to force the rest of the staff to bring up their skillset. Management has clearly left it to rot on the vine for a very long time. And, by the sounds of it, they don't know what they've even got.

    A large IT department with no skills with the technologies on site? What exactly is that large IT department doing for this company? If you have a bunch of people with no skillsets with the technology they have ... then what skillsets do they have, and how is it helping you?

    Without more detail, I'm hearing "Hi, I've just joined a company with a terrible IT department, how do I fix that?" Who let it get into such a bad state? Because if they're still around, no way in hell you'll ever fix it.

    --
    Lost at C:>. Found at C.
    1. Re:Running? by TWX · · Score: 5, Insightful

      Yep. If you're not in-charge and able to make the tough calls (ie, figuring out who's actually supporting important stuff, who's not, and making the decisions about who gets a chance to migrate to something new and who needs to take their skillset elsewhere) then you're probably not going to make the difference that you want to make or that your superiors somehow expect.

      What I can say, from experience, is that you need to actually learn how things are working now before you start making changes. I've had bosses brought in from the outside who thought they were God's gift to the IT world and decided to remake the organization in their own image, only to be fired less than a year later because they pissed off all of the existing IT staff, so the boss got no results, and pissed off the users by failing to maintain existing workflows, so the users' jobs became much harder or required lots of direct assistance.

      Learn what's there, why it's there, and understand that most decisions were made as a reaction to something prompting it to be necessary. Change what can be changed in a sane way, but don't take personal offense to anything as it is now as there are probably good reasons why it is the way it is. If you come in with the attitude that you can rip out everything without a care, you'll find suddenly that no staff will bother to warn you of the pitfalls in front of you that they're all well aware of, and you, not them, will be the one with egg on your face when it breaks because it was your decision to change it.

      --
      Do not look into laser with remaining eye.
    2. Re:Running? by TWX · · Score: 4, Insightful

      The article submitter made it clear that he's new. He very well may not understand the workflow and who actually knows how to take care of what. He needs to learn that before he can start making changes, or he, not the existing staff, will be the one blamed when everything goes wrong.

      IT attracts a fair amount of introverts. It's likely that a lot of his staff are playing their cards close to their chest because that's what they're simply used to doing. It's also possible that they themselves wanted to make changes but were not given the budget needed to do so, so legacy systems continue to be used. It could also be that a few incompetent people in key positions have gummed-up the whole works.

      Do you think that anyone wants to be stuck with ancient garbage if there's something newer that actually demonstrably works better? Most of the time the decisions that hold back the IT department are made either by IT management or by those outside of the IT department.

      --
      Do not look into laser with remaining eye.
    3. Re:Running? by gstoddart · · Score: 2

      I kept things together the best I could but eventually realized I was being set up for failure. I was going to be the scapegoat.

      The only things you can do in that situation are:

      1) run like hell
      2) document all of your concerns so they can't blame you when it blows up

      But then if it ever comes to having to prove how you told them so, you'll be wondering why you didn't just run like hell in the first place, because at that point you've wasted your time and have been tainted by the project anyway as the ones really at fault continue to deflect when you're not around to defend yourself.

      In some cases, the only way to win is not to play. It's important to be able to spot those.

      --
      Lost at C:>. Found at C.
    4. Re:Running? by FranTaylor · · Score: 3, Insightful

      This sounds like a highly dysfunctional environment

      Mechanical-type people are usually pretty horrified by the short lifespans of computers. They are used to dealing with things like turret lathes and drill presses that can handle 50 years of continuous use. It could be a perfectly natural reaction.

  6. Clean it up by Anonymous Coward · · Score: 2, Insightful

    Kill everyone. Set fire to the place. Plead insanity. When they see what you were supposed to work with, they'll believe you.

  7. A plan and boss buy-in by i.r.id10t · · Score: 3, Insightful

    Make a map of what you have, what the main issues are with each piece, and then a plan for replacement/updating/whatever. Try to include some rough (and higher than you really think it will be) cost estimates. Then present to a boss, and get buy-in. If you don't get buy-in, start updating your CV and look for another job.

    --
    Don't blame me, I voted for Kodos
  8. Not enough info brah by Iamthecheese · · Score: 4, Interesting

    It depends on how much actual authority you have, how conservative the corporate culture is, and whether there are any entrenched ways of doing things. This isn't a technical question but a political one. If you actually (as opposed to officially) have authority to tell them how to do things, you first need to find out how the system is working now. Maybe they didn't set up passwords because multiple departments need to connect to the same server and there's no secure password control in place. Maybe they're disorganized. Maybe they're inexperienced. These all require different activities to repair the problem.

    You mentioned EOL hardware, but you didn't say whether a migration is planned or whether the money is available for one. Obviously new hardware is a great opportunity for user training, but again there are too many unknowns here. How much extra time do the engineers have to train? How much of the existing system setup is invisibly a part of how the users interact with it?

    It sounds to me like you're standing on a powder keg. The right way to deal with it is to gather information. Make benchmarks. Understand system inter-operations and use. Learn who is doing what and why. Only a fool would start declaring X and Y need to be done without taking a look around first.

    --
    If video games influenced behavior the Pac Man generation would be eating pills and running away from their problems.
  9. Low Hanging Fruit by AdelieMan · · Score: 5, Insightful

    I would audit everything. Make a matrix of things that need to be addressed, easy to hard and least significant to most, and start chipping away at it. It will take time to turn that ship around, but it will be worth it, and you will keep your sanity.
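    One way to sketch that matrix, assuming a hypothetical 1-5 score for effort and impact on each item. The backlog entries below are illustrative, loosely drawn from the problems in the question, and the quadrant thresholds are arbitrary:

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    effort: int   # 1 = easy ... 5 = hard (hypothetical scale)
    impact: int   # 1 = least significant ... 5 = most significant

def quadrant(t: Task) -> str:
    """Place a task in a simple effort/impact matrix."""
    easy = t.effort <= 2
    big = t.impact >= 4
    if easy and big:
        return "quick win"
    if big:
        return "major project"
    if easy:
        return "fill-in"
    return "reconsider"

def prioritize(tasks):
    # Easiest first; among equally easy items, the most significant first.
    return sorted(tasks, key=lambda t: (t.effort, -t.impact))

# Hypothetical backlog and scores.
backlog = [
    Task("Replace XP desktops", effort=4, impact=4),
    Task("Change default passwords", effort=1, impact=5),
    Task("Verify backups restore", effort=2, impact=5),
    Task("Tidy server room cabling", effort=2, impact=1),
]

for t in prioritize(backlog):
    print(f"{t.name:<28} effort={t.effort} impact={t.impact} -> {quadrant(t)}")
```

    The low-hanging fruit (cheap and significant) floats to the top; the big-ticket items still show up, just later in the queue.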

    1. Re:Low Hanging Fruit by bluefoxlucid · · Score: 4, Insightful

      Hear hear. I would suggest not being shy of technology; I've been interested in Microsoft Project 365 integration with SharePoint for a while, and you should definitely look at your options for project management whether they come from Microsoft, Oracle, or some no-name company that provides a fantastic and little-known product as an open-source support-contracted service. What you have there is a long program, and I suggest you get RMCProject's CAPM Exam Prep and the PMBOK if you haven't got project management skills, and spend the 3 months getting a basic grasp of all that right out of the gate.

      The primary tools you're going to want are risk management and hierarchical decomposition; however, on the scale you're talking about, full project management knowledge is going to be an outright requirement if you want to do anything resembling a competent job. You *won't* want to use the full suite of project management practices--you never want to use the full set of tools outright, but rather the ones you want, for any purpose in any field--but if that place is as big a rat hole as you say, you're going to need some accounting of what's going on.

      As the parent poster here says, you definitely need to start here:

      Make a matrix of things that need to be addressed easy to hard, least significant to most, and start chipping away at it.

      Get a list of discrete, finite, deliverable projects. Things you can put into boxes and say, "This is one thing I want to produce; it's of a nature that I can tell you what work is required, how much time it will take, and what it will cost." You'll start by examining the array of systems, breaking them down into departments and components (what do they support? What do they do for each department?), and deciding what you're replacing. Are you upgrading Windows XP with stitched-together software to Windows Server 2008, or are you transitioning to a new set of systems to solve the same problem in a different way? Get that list down.

      Each thing you want to address will be something small, finite, limited, and understood. You're replacing the groupware services--Exchange, for example; the thing that provides e-mail, calendar, and such--with an upgraded, better-implemented, or new product (Exchange to Zimbra, Zimbra to Exchange, migration to a SaaS such as Google for Business or Office 365, etc.). Some things break out into phases or multiple projects, e.g.: migrating Exchange to Office 365 may involve a phase 1 of upgrading Exchange to the latest version, a phase 2 of enabling some kind of synchronization and backup that you don't have now, and a phase 3 of migrating to the service; while you may find that your Zimbra installation has no back-ups because you need an enterprise backup solution, and so you can't get back-ups in until you get Bacula set up.

      Once you have your list, you can start breaking them out by hierarchical decomposition. You'll want to decompose the work: each deliverable (e.g. your project, Bacula backup infrastructure, delivers a working Bacula backup infrastructure as its product) breaks out into a complete set of deliverables (e.g. project management, support services, back-up strategy design, servers, client deployment with Puppet or SCCM or Ansible, etc.), which themselves each break down further. Once your work is broken down, you hit the bottom with sets of work packages--each a deliverable--that you can understand completely; you can turn those into lists of activities and tasks to produce the deliverable.
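      Hierarchical decomposition is essentially a tree walk: keep splitting deliverables until you reach leaves you fully understand, and those leaves are the work packages. A minimal sketch of the Bacula example, where the second-level breakdown follows the comment and the leaf items below it are hypothetical:

```python
# Hypothetical work breakdown for the Bacula example; empty dicts are leaves (work packages).
wbs = {
    "Bacula backup infrastructure": {
        "Project management": {},
        "Support services": {},
        "Back-up strategy design": {
            "Retention policy": {},
            "Restore test schedule": {},
        },
        "Servers": {
            "Director host": {},
            "Storage daemon host": {},
        },
        "Client deployment": {
            "Puppet module for file daemon": {},
        },
    }
}

def work_packages(tree, path=()):
    """Yield the path to every leaf: the bottom of the decomposition."""
    for name, children in tree.items():
        here = path + (name,)
        if children:
            yield from work_packages(children, here)
        else:
            yield here

for wp in work_packages(wbs):
    print(" > ".join(wp))
```

      Each printed path is something you can then turn into a list of activities and tasks with a time and cost estimate.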

      The same goes for risks. You want to identify everything your experience says can go wrong, and use your experience to do qualitative risk analysis--what risks are important? Then you use a procedure of assessing probability vs severity to do quantitative risk analysis. You work out how to avoid (100%), mitigate (any%), accept (0%), or transfer (buy insurance) the negative risks (threats), and how to exploit (100%), en

  10. Re:Yes, buy lots of new things, money is no object by TWX · · Score: 4, Funny

    I think that bot from a few articles down is trying to weigh-in...

    --
    Do not look into laser with remaining eye.
  11. Re:Out with the old, in with the new by bobbied · · Score: 5, Insightful

    Buy a new system. Power down every system in turn and try to power it up again. If it will not start, replace it.

    NEVER power down old hardware on purpose unless you have a backup plan for the system... Old hardware has a habit of not coming back when you power it off, and if it dies, you've created an emergency for yourself...

    There are going to be enough unforced errors in the process, you needn't go out and look to create them.

    --
    "File to fit, pound to insert, paint to match" - Aircraft Maintenance 101
  12. Lead, Mentor, Grow by mtippett · · Score: 4, Insightful

    You've been dropped in an environment that is legacy and probably has production problems. Use that to your advantage.

    You've been also dropped in a leadership role (not management, leadership).

    Your #1 target should be to make yourself redundant (which ironically is likely to get you promoted, it's called succession :).

    So look at doing something like identifying #1 problem (Pareto charts help). Ask for volunteers (or volunteer some people), give them the problem to solve, use whiteboards, etc to help them discover the solution. You may facilitate and provide hints to get things done. Empower and guide the people you are helping.
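    A Pareto chart ranks problem categories by frequency and tracks the cumulative share, so the "vital few" causes stand out. A minimal sketch, assuming hypothetical ticket counts and the conventional 80% cut-off:

```python
from collections import Counter

def pareto(counts, threshold=0.8):
    """Return (category, count, cumulative share) rows, plus the 'vital few'
    categories needed before the cumulative share reaches the threshold."""
    total = sum(counts.values())
    rows, cumulative, vital = [], 0, []
    for category, n in counts.most_common():
        already_reached = cumulative / total >= threshold
        cumulative += n
        rows.append((category, n, cumulative / total))
        if not already_reached:
            vital.append(category)
    return rows, vital

# Hypothetical helpdesk ticket counts by category.
tickets = Counter({
    "password resets": 120,
    "file server full": 45,
    "printer queue": 20,
    "web server down": 10,
    "other": 5,
})

rows, vital = pareto(tickets)
for category, n, share in rows:
    print(f"{category:<18}{n:>5}{share:>8.0%}")
print("Tackle first:", vital)
```

    With these made-up numbers, two categories account for over 80% of the tickets, which gives the volunteers an obvious #1 problem to start on.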

    Read up on https://en.wikipedia.org/wiki/..., you are likely in a #2 or #3 combination. You can help lead people to move to a #3 with leadership, with the idea to get to #1 over time (with their help).

    Of course there might be some issues that you might need to solve like EOL systems and any budget that may be needed. If the OS is old, then probably the HW is old as well. Budget for that is probably going to be your biggest issue.

  13. Wanted: by Drewdad · · Score: 4, Insightful

    Wanted: IT Director
    Pay-scale: Entry level.

    1. Re:Wanted: by Tablizer · · Score: 2

      ...with 20 years of experience in Java 9.

  14. Re:Run for your life by NoNonAlphaCharsHere · · Score: 2

    EJECT!! EJECT!!

    You'll never ever overcome that much inertia and penny-pinching. Don't spend the next five years being frustrated before you figure this out.

  15. start by JohnVanVliet · · Score: 3, Insightful

    -- quote--
      Where would you start,....
    ----------

    with the thermonuclear option!

    with DEFAULT passwords of "password"
    and using XP and MS 2003

    the use of DBAN has been authorized

    --
    "I don't pitch OpenSUSE Linux to my friends, i let Microsoft do it for me
  16. Cheapskates by scsirob · · Score: 4, Insightful

    They are not open source advocates, they are cheapskates who like the prospect of 'free' anything. No supported equipment, no updates, no training for their staff, they simply don't appreciate the value of their IT.

    Let me guess, no decent backups either? No DR plan? Nothing of the sort? If you want to stay there, demand a decent budget ( = commitment) and build greenfield. If you don't get a decent budget, run.

    --
    To Terminate, or not to Terminate, that's the question - SCSIROB
  17. A fire? by onkelonkel · · Score: 3, Insightful

    Seriously, "accidentally" toss a lighted cigarette into the paper recycling bin in the server room on your way out one night. You'll be able to start fresh with the insurance money.

    --
    None of them can see the clouds; The polished wings don't care.
  18. dust off and nuke it from orbit... by advocate_one · · Score: 2

    it's the only way to be sure...

    --
    Donald 'Duck' Dunn: We had a band powerful enough to turn goat piss into gasoline.
  19. Re: Training Program by rickb928 · · Score: 2

    Hell, if they don't have skills in XP and 2003, it's either train or hire new. Those are legacy tech; your staff should be nailing these by now.

    And if they can get control of the existing tech, they have a chance at mastering the new. If they can't even handle the old, well, a new crew is in your future.

    --
    deleting the extra space after periods so i can stay relevant, yeah.
  20. Make the separate firewall work? by Anonymous Coward · · Score: 2, Insightful

    Make sure that the separate firewall works, then go from there. Were your bosses thinking that a Linux admin was a Windows admin with extra skills, that the Windows skills came automatically with the Linux skills?

    Don't beat up on the geezers there for having stale skills. They might actually be OK at keeping those obsolete systems running. Some of them might be OK at getting a new system running, unless they're stuck in their ways.