Best Practices For Infrastructure Upgrade?


Posted by timothy on Saturday November 21, 2009 @10:50AM from the thinking-ahead dept.

An anonymous reader writes "I was put in charge of an aging IT infrastructure that needs a serious overhaul. Current services include the usual suspects, i.e. www, ftp, email, dns, firewall, DHCP, and some more. In most cases, each service runs on its own hardware, some of them for the last seven years straight. The machines still can (mostly) handle the load that ~150 people in multiple offices put on them, but there's hardly any fallback if any of the services die or an office is disconnected. Now, as the hardware must be replaced, I'd like to buff things up a bit: distributed instances of services (at least one instance per office) and a fallback/load-balancing scheme (either to an instance in another office or a duplicated one within the same). Services running on virtualized servers hosted by a single reasonably-sized machine per office (plus one for testing and a spare) seem to recommend themselves. What's your experience with virtualization of services and implementing fallback/load-balancing schemes? What's best practice for an update like this? I'm interested in your success stories and anecdotes, but also pointers and (book) references. Thanks!"

Re:Cloud Computing(TM) by lukas84 · 2009-11-21 11:04 · Score: 5, Insightful

No, the budget question comes later.
The first questions are: What are your business's requirements regarding your IT infrastructure? How long can you do business without it? How fast does something need to be restored?
Starting with those requirements, you can work out possible designs that meet them. For example, if the requirement is that a machine must be operational again within a week of a crash, you can build computers from random spare parts and hope that they'll work. If the requirement is that it should be up and running in two days, you will need to buy servers from a Tier 1 vendor like HP or IBM with appropriate service contracts. If the requirement is that everything must be up and running again in 4 hours, you'll need backups, clusters, site resilience, replicated SAN, etc.
The question of budget comes into play much later.
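To make the "requirements first" idea concrete, here is a rough Python sketch that picks the cheapest design whose recovery time still meets the stated requirement. The three tiers and their time limits (a week, two days, 4 hours) come from the comment above; the names DESIGNS and cheapest_design are made up for illustration, and real planning would involve far more than one number.

```python
from typing import Optional

# (achievable recovery time in hours, design), ordered from cheapest to
# most expensive. The tiers and time limits are the ones named in the
# comment; everything else here is an illustrative assumption.
DESIGNS = [
    (7 * 24, "computers built from spare parts"),
    (2 * 24, "Tier 1 vendor servers with service contracts"),
    (4, "backups, clusters, site resilience, replicated SAN"),
]


def cheapest_design(required_recovery_hours: float) -> Optional[str]:
    """Return the cheapest design that recovers within the required time.

    Returns None when the requirement is tighter than anything the
    comment describes (under 4 hours).
    """
    for achievable_hours, design in DESIGNS:
        if achievable_hours <= required_recovery_hours:
            return design
    return None


if __name__ == "__main__":
    for hours in (200, 72, 24, 1):
        print(hours, "->", cheapest_design(hours))
```

Only once a tier has been picked this way does it make sense to put a price on it.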
I'd say by pele · 2009-11-21 11:04 · Score: 5, Informative

don't touch anything if it's been up and running for the past 7 years. if you really must replicate then get some more cheap boxes and replicate. it's cheaper and faster than virtual anything. if you must. but 150 users doesn't warrant anything in my opinion. I'd rather invest in backup links (from different companies) between offices. you can bond them for extra throughput.
Think about the complexity of duplication by El+Cubano · 2009-11-21 11:07 · Score: 4, Insightful
"there's hardly any fallback if any of the services dies or an office is disconnected. Now, as the hardware must be replaced, I'd like to buff things up a bit: distributed instances of services (at least one instance per office) and a fallback/load-balancing scheme (either to an instance in another office or a duplicated one within the same)."
Is that really necessary? I know that we all would like to have bullet-proof services. However, is the network service to the various offices so unreliable that it justifies the added complexity of instantiating services at every location? Or even introducing redundancy at each location? If you were talking about thousands or tens of thousands of users at each location, it might make sense just because you would have to distribute the load in some way.
What you need to do is evaluate your connectivity and its reliability. For example:
- How reliable is the current connectivity?
- If it is not reliable enough, how much would it cost over the long run to upgrade to a sufficiently reliable service?
- If the connection goes down, how does it affect that office? (I.e., if the Internet is completely inaccessible, will having all those duplicated services at the remote office enable them to continue working as though nothing were wrong? If the service being out causes such a disruption that having duplicate services at the remote office doesn't help, then why bother?)
- How much will it cost over the long run to add all that extra hardware, along with the burden of maintaining it and all the services running on it?
Once you answer at least those questions, then you have the information you need in order to make a sensible decision.
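The questions above boil down to a cost comparison, which can be sketched roughly in Python. Everything numeric below is a made-up placeholder, not data from the thread: the availability figures, the hourly cost of an outage, and the yearly cost of each option are assumptions you would replace with your own measurements. The function names are hypothetical too.

```python
from typing import Dict

HOURS_PER_YEAR = 24 * 365


def annual_downtime_hours(availability: float) -> float:
    """Expected hours of outage per year for a link with this availability."""
    return (1.0 - availability) * HOURS_PER_YEAR


def total_annual_cost(fixed_cost: float, availability: float,
                      cost_per_hour_down: float) -> float:
    """Yearly cost of an option: what you pay for it plus what outages cost."""
    return fixed_cost + annual_downtime_hours(availability) * cost_per_hour_down


def cheapest_option(costs: Dict[str, float]) -> str:
    """Name of the option with the lowest total yearly cost."""
    return min(costs, key=costs.get)


if __name__ == "__main__":
    # Placeholder figures, purely for illustration.
    cost_per_hour_down = 500.0
    costs = {
        "keep current link": total_annual_cost(0.0, 0.99, cost_per_hour_down),
        "upgrade link": total_annual_cost(6000.0, 0.999, cost_per_hour_down),
        "duplicate services per office": total_annual_cost(
            20000.0, 0.9995, cost_per_hour_down),
    }
    for name, cost in costs.items():
        print(f"{name}: {cost:.0f} per year")
    print("cheapest:", cheapest_option(costs))
```

Note that the third question still needs a human answer: if an office cannot work without the Internet anyway, duplicated local services do not really buy the higher availability assumed for that option.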
Get someone experienced on the boat! by lukas84 · 2009-11-21 11:09 · Score: 5, Insightful

You know, you could've started with a bit more detail - what operating system are you running on the servers? What OS are the clients running? What level of service are you trying to achieve? How many people work in your shop? What's their level of expertise?
If you're asking this on Slashdot now, it means you don't have enough experience with this yet - so my first advice would be to get someone involved who does. Ideally that's an outfit with many people who have lots of experience with and knowledge of the platform you work on. This means you'll have backup in case something goes south and your network design will benefit from their experience.
As for other advice, make sure you get the requirements from the higher-ups in writing. Sometimes they have ridiculous ideas regarding the availability they want and how much they're willing to pay for it.
Take your time by BooRadley · 2009-11-21 11:13 · Score: 4, Insightful

If you're like most IT managers, you probably have a budget. Which is probably wholly inadequate for immediately and elegantly solving your problems.
Look at your company's business, and how the different offices interact with each other, and with your customers. By just upgrading existing infrastructure, you may be putting some of the money and time where it's not needed, instead of just shutting down a service or migrating it to something more modern or easier to manage. Free is not always better, unless your time has no value.
Pick a few projects to help you get a handle on the things that need more planning, and try and put out any fires as quickly as possible, without committing to a long-term technology plan for remediation.
Your objective is to make the transition as boring as possible for the end users, except for the parts where things just start to work better.

--
-- lk t lv ll th vwls t f wrds. T svs lts f tm t wrt bt ts pn n th ss t rd nd mks m lk lk cmplt dpsht.
Re:Why? by MeatBag+PussRocket · 2009-11-21 11:14 · Score: 4, Funny

redundancy.

--
i wage a holy war against the apostrophe.
Don't do it by Anonymous Coward · 2009-11-21 11:18 · Score: 5, Insightful

Complexity is bad. I work in a department of similar size. Long long ago, things were simple. But then due to plans like yours, we ended up with quadruple replicated dns servers with automatic failover and load balancing, a mail system requiring 12 separate machines (double redundant machines at each of 4 stages: front end, queuing, mail delivery, and mail storage), a web system built from 6 interacting machines (caches, front end, back end, script server, etc.) plus redundancy for load balancing, plus automatic failover. You can guess what this is like: it sucks. The thing was a nightmare to maintain, very expensive, slow (mail traveling over 8 queues to get delivered), and impossible to debug when things went wrong.
It has taken more than a year, but we are slowly converging to a simple solution. 150 people do not need multiply redundant load balanced dns servers. One will do just fine, with a backup in case it fails. 150 people do not need 12+ machines to deliver mail. A small organization doesn't need a cluster to serve web pages.
My advice: go for simplicity. Measure your requirements ahead of time, so you know if you really need load balanced dns servers, etc. In all likelihood, you will find that you don't need nearly the capacity you think you do, and can make do with a much simpler, cheaper, easier to maintain, more robust, and faster setup. If you can call that making do, that is.
What 150 users? by painehope · 2009-11-21 11:26 · Score: 5, Insightful

I'd say that everyone has mentioned the big-picture points already, except for one: what kind of users?
150 file clerks or accountants and you'll spend more time worrying about the printer that the CIO's secretary just had to have which conveniently doesn't have reliable drivers or documentation, even if it had that neat feature that she wanted and now can't use.
150 programmers can put a mild to heavy load on your infrastructure, depending on what kind of software they're developing and testing (more a function of what kind of environment are they coding for and how much gear they need to test it).
150 programmers and processors of data (financial, medical, geophysical, whatever) can put an extreme load on your infrastructure. Like to the point where it's easier to ship tape media internationally than fuck around with a stable interoffice file transfer solution (I've seen it as a common practice - "hey, you're going to the XYZ office, we're sending a crate of tapes along with you so you can load it onto their fileservers").
Define your environment, then you know your requirements, find the solutions that meet those requirements, then try to get a PO for it. Have fun.

--
PC moderators can suck my White pierced, tattooed dick. If you think pride == hate, s/dick/Aryan meat mallet/g.
P2V and consolidate by snsh · 2009-11-21 11:26 · Score: 4, Interesting

The low-budget solution: buy one server (like a PowerEdge 2970) with like 16GB RAM, a combination of 15k and 7.2k RAID1 arrays, and 4hr support. Install a free hypervisor like VMware Server or Xen, and P2V your oldest hardware onto it. Later on you can spend $$$$$ on clustering, HA, SANs, and clouds. But P2V of your old hardware onto new hardware is a cost-effective way to start.
Simple and straightforward = complex by sphealey · 2009-11-21 11:41 · Score: 4, Insightful

So let's see if I understand: you want to take a simple, straightforward, easy-to-understand architecture with no single points of failure that would be very easy to recover in the event of a problem and extremely easy to recreate at a different site in a few hours in the event of a disaster, and replace it with a vastly more complex system that uses tons of shiny new buzzwords. All to serve 150 end users for whom you have quantified no complaints related to the architecture other than it might need to be sped up a bit (or perhaps find a GUI interface for the ftp server, etc).
This should turn out well.
sPh
As for a "distributed redundant system", I strongly suggest you read Moans Nogood's essay "You Don't Need High Availability" and think very deeply about it before proceeding.
Re:Trying to make your mark, eh? by bertok · 2009-11-21 11:57 · Score: 4, Interesting

"I, personally, am TOTALLY in agreement with the ethos of whoever designed it, a single box for each service.
...
Virtualisation is, IMHO, *totally* inappropriate for 99% of cases where it is used, ditto *cloud* computing."
I totally disagree.
Look at some of the services he listed: DNS and DHCP.
You literally can't buy a server these days with less than 2 cores, and getting less than 4 is a challenge. That kind of computing power is overkill for such basic services, so it makes perfect sense to partition a single high-powered box to better utilize it. There is no need to give up redundancy either: you can buy two boxes, and have every key service duplicated between them. Buying two boxes per service on the other hand is insane, especially for services like DHCP, which in an environment like that might have to respond to a packet once an hour.
Even the other listed services probably cause negligible load. Most web servers sit there at 0.1% load most of the time, ditto with ftp, which tends to see only sporadic use.
I think you'll find that the exact opposite of your quote is true: for 99% of corporate environments where virtualization is used, it is appropriate. In fact, it's under-used. Most places could save a lot of money by virtualizing more.
I'm guessing you work for an organization where money grows on trees, and you can 'design' whatever the hell you want, and you get the budget for it, no matter how wasteful, right?
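The consolidation argument above can be checked with back-of-the-envelope arithmetic. Here is a rough Python sketch, assuming you have measured each service's average load as a fraction of one host's capacity. Only the 0.1% figure for web and ftp comes from the comment; the dns and dhcp numbers, the 50% headroom, and the function name are assumptions for illustration.

```python
from typing import Dict


def fits_on_one_host(service_loads: Dict[str, float],
                     headroom: float = 0.5) -> bool:
    """True if the combined load leaves at least `headroom` of one host idle.

    With two identical hosts that each pass this check, either one can
    carry every service alone if the other fails.
    """
    return sum(service_loads.values()) <= 1.0 - headroom


if __name__ == "__main__":
    # Average load per service as a fraction of one host.
    loads = {
        "www": 0.001,   # "0.1% load most of the time", per the comment
        "ftp": 0.001,   # "ditto with ftp"
        "dns": 0.01,    # assumed placeholder
        "dhcp": 0.001,  # assumed placeholder
    }
    print("total load:", sum(loads.values()))
    print("fits on one host:", fits_on_one_host(loads))
```

Average load is not the whole story (memory, disk I/O and peak load matter too), so treat a passing check as a reason to measure further rather than as a sizing decision.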
Re:Cloud Computing(TM) by digitalchinky · 2009-11-21 16:22 · Score: 4, Insightful

That's a little harsh, don't you think?
There are untold numbers of us in this guy's position. Asking Slashdot is a damn good start at finding a new methodology. Everyone has an opinion, some of them quite intelligent, a few might even work. It's OK for the Fortune 500 cube dwellers to jump on the phone and call in a long-standing contractor to 'handle it' - the rest of us have to slog through the marketdroid crap and translate the latest buzzword infestations to human speak - then just hope we don't screw it up or waste money.
So far the best suggestions appear to be to figure out how critical things are first (which will shape the hardware requirements), budget second. All the while this is encompassed by the usual core job functions that still need to get done.
So rather than point out the redundant, how about using your fingers to provide a potential solution.
Insurance... by magusnet · 2009-11-21 16:56 · Score: 4, Funny

1) Buy a comprehensive insurance policy
2) Write a detailed implementation plan that you copied from a Google search
3) Wait the 3-6 months the plan calls out before actual "work" begins
4) Burn down the building using a homeless person as the shill
5) Submit an emergency "continuity" plan that you wanted to deploy all along
6) Implement the new plan in one third the time of the original plan
7) Come in under budget by 38.3%
8) Hire a whole new help desk at half the budgeted payroll (52.7% savings)
9) Speak at the board meeting about the challenges you overcame in saving the company
(send all paychecks and bonuses to numbered bank account and retire to a non-extradition country) :)