Best Practices For Infrastructure Upgrade?
An anonymous reader writes "I was put in charge of an aging IT infrastructure that needs a serious overhaul. Current services include the usual suspects, i.e. www, ftp, email, dns, firewall, DHCP, and some more. In most cases, each service runs on its own hardware, some of them for the last seven years straight. The machines still can (mostly) handle the load that ~150 people in multiple offices put on them, but there's hardly any fallback if any of the services die or an office is disconnected. Now, as the hardware must be replaced, I'd like to buff things up a bit: distributed instances of services (at least one instance per office) and a fallback/load-balancing scheme (either to an instance in another office or a duplicated one within the same). Services running on virtualized servers hosted by a single reasonably-sized machine per office (plus one for testing and a spare) seem to recommend themselves. What's your experience with virtualization of services and implementing fallback/load-balancing schemes? What's best practice for an update like this? I'm interested in your success stories and anecdotes, but also pointers and (book) references. Thanks!"
Why virtual servers? If you are going to run multiple services on one machine (and that's fine if it can handle the load) just do it.
No, the budget question comes later.
The first questions are: What are your business's requirements regarding your IT infrastructure? How long can you do business without it? How fast does something need to be restored?
Starting with those requirements, you can work out possible designs that fit them. For example, if the requirement is that a machine must be operational at the latest a week after a crash, you can build computers from random spare parts and hope that they'll work. If the requirement is that it should be up and running in two days, you will need to buy servers from a Tier 1 vendor like HP or IBM with appropriate service contracts. If the requirement is that everything must be up and running again in 4 hours, you'll need backups, clusters, site resilience, replicated SAN, etc.
The question of budget comes into play much later.
Don't touch anything if it's been up and running for the past 7 years. If you really must replicate, then get some more cheap boxes and replicate. It's cheaper and faster than virtual anything. If you must. But 150 users doesn't warrant anything, in my opinion. I'd rather invest in backup links (from different companies) between offices. You can bond them for extra throughput.
there's hardly any fallback if any of the services die or an office is disconnected. Now, as the hardware must be replaced, I'd like to buff things up a bit: distributed instances of services (at least one instance per office) and a fallback/load-balancing scheme (either to an instance in another office or a duplicated one within the same).
Is that really necessary? I know that we all would like to have bullet-proof services. However, is the network service to the various offices so unreliable that it justifies the added complexity of instantiating services at every location? Or even introducing redundancy at each location? If you were talking about thousands or tens of thousands of users at each location, it might make sense just because you would have to distribute the load in some way.
What you need to do is evaluate your connectivity and its reliability. For example:
Once you answer at least those questions, then you have the information you need in order to make a sensible decision.
You know, you could've started with a bit more detail: what operating system are you running on the servers? What OS are the clients running? What level of service are you trying to achieve? How many people work in your shop? What's their level of expertise?
If you're asking this on Slashdot now, it means you don't have enough experience with this yet, so my first advice would be to get someone involved who does. Someone with many people who have lots of experience with and knowledge of the platform you work on. This means you'll have backup in case something goes south, and your network design will benefit from their experience.
As for other advice, make sure you get the requirements from the higher-ups in writing. Sometimes they have ridiculous ideas regarding the availability they want and how much they're willing to pay for it.
Again, wrong approach. Ask the higher-ups what kind of availability they want. The cost is derived from their wishes.
If you're like most IT managers, you probably have a budget. Which is probably wholly inadequate for immediately and elegantly solving your problems.
Look at your company's business, and how the different offices interact with each other, and with your customers. By just upgrading existing infrastructure, you may be putting some of the money and time where it's not needed, instead of just shutting down a service or migrating it to something more modern or easier to manage. Free is not always better, unless your time has no value.
Pick a few projects to help you get a handle on the things that need more planning, and try and put out any fires as quickly as possible, without committing to a long-term technology plan for remediation.
Your objective is to make the transition as boring as possible for the end users, except for the parts where things just start to work better.
I disagree. When you have a budget of $800 and some shoestrings, it eliminates a lot of questions ;)
I am still in the process of upgrading a "legacy" infrastructure in a smaller (less than 50) office but I feel your pain.
First, it's not "tech sexy", but you've got to get the current infrastructure all written down (or typed up - but then you have to burn to cd just in case your "upgrade" breaks everything).
You should also "interview" users (preferably by email, but sometimes if you need an answer you have to just call them or... face to face even...) to find out what services they use. You might be surprised to find something that you didn't even know your dept was responsible for (oh, that Panasonic PBX that runs the whole phone system is in the locked closet they forgot to tell you about...)
Your next step is prioritizing what you actually need/want to do... remember that you're in a business environment so having redundant power supplies for the dedicated cd burning computer may not actually improve your workplace (but yes, it might be cool to have an automated coffee maker that can run on solar power...)
So now that you know pretty much what you have and what you want to change...
Technology wise, Virtualization is definitely your answer... and there's a learning curve:
VMware is pretty nice and pretty expensive.
VirtualBox (which I use) is free but doesn't have as many enterprise features (e.g. automatic failover)
Xen with Remus or HA is the thinking man's setup
All of the above will depend on reliable hardware - that means at least RAID 1, and yes you can go with SAN but be aware that it's a level of complexity you might not need (for FTP, DNS, etc.)
Reading what you've listed as "services", it almost sounds like you want a single Linux VM running all of those things with Xen and Remus...
Good luck, and TEST IT before you deploy it as a production setup.
For services running on Linux, OpenVZ can be used as a jail with migration capabilities instead of a full-on VM.
DISCLAIMER: I don't have a job so I've read about this but not used it in a pro environment yet
Complexity is bad. I work in a department of similar size. Long long ago, things were simple. But then due to plans like yours, we ended up with quadruple replicated dns servers with automatic failover and load balancing, a mail system requiring 12 separate machines (double redundant machines at each of 4 stages: front end, queuing, mail delivery, and mail storage), a web system built from 6 interacting machines (caches, front end, back end, script server, etc.) plus redundancy for load balancing, plus automatic failover. You can guess what this is like: it sucks. The thing was a nightmare to maintain, very expensive, slow (mail traveling over 8 queues to get delivered), and impossible to debug when things went wrong.
It has taken more than a year, but we are slowly converging to a simple solution. 150 people do not need multiply redundant load balanced dns servers. One will do just fine, with a backup in case it fails. 150 people do not need 12+ machines to deliver mail. A small organization doesn't need a cluster to serve web pages.
My advice: go for simplicity. Measure your requirements ahead of time, so you know if you really need load balanced dns servers, etc. In all likelihood, you will find that you don't need nearly the capacity you think you do, and can make do with a much simpler, cheaper, easier to maintain, more robust, and faster setup. If you can call that making do, that is.
The system you have works solidly, and has worked solidly for seven years.
I, personally, am TOTALLY in agreement with the ethos of whoever designed it, a single box for each service.
Frankly, with the cost of modern hardware, you could triple the capacity of what you have now just by gradually swapping out for newer hardware over the next few months, and keeping the shite old boxen for fallback.
Virtualisation is, IMHO, *totally* inappropriate for 99% of cases where it is used, ditto *cloud* computing.
It sounds to me like you are more interested in making your own mark, than actually taking an objective view. I may of course be wrong, but usually that is the case in stories like this.
In my experience, everyone who tries to make their own mark actually degrades a system, and simply discounts the ways that they have degraded it as being "obsolete" or "no longer applicable"
Frankly, based on your post alone, I'd sack you on the spot, because you sound like the biggest threat to the system to come along in seven years.
These are NOT your computers, if you want a system just so, build it yourself with your own money in your own home.
This advice / opinion is of course worth exactly what it cost.
Apologies in advance if I have misconstrued your approach. (but I doubt that I have)
YMMV.
http://slashdot.org/~GuyFawkes/journal
I'd say that everyone has mentioned the big-picture points already, except for one: what kind of users?
150 file clerks or accountants and you'll spend more time worrying about the printer that the CIO's secretary just had to have, which conveniently doesn't have reliable drivers or documentation, even if it had whatever neat feature she wanted and now can't use.
150 programmers can put a mild to heavy load on your infrastructure, depending on what kind of software they're developing and testing (more a function of what kind of environment are they coding for and how much gear they need to test it).
150 programmers and processors of data (financial, medical, geophysical, whatever) can put an extreme load on your infrastructure. Like to the point where it's easier to ship tape media internationally than fuck around with a stable interoffice file transfer solution (I've seen it as a common practice - "hey, you're going to the XYZ office, we're sending a crate of tapes along with you so you can load it onto their fileservers").
Define your environment, then you know your requirements, find the solutions that meet those requirements, then try to get a PO for it. Have fun.
The low-budget solution: buy one server (like a PowerEdge 2970) with something like 16GB RAM, a combination of 15k and 7.2k RAID1 arrays, and 4hr support. Install a free hypervisor like VMware Server or Xen, and P2V your oldest hardware onto it. Later on you can spend $$$$$ on clustering, HA, SANs, and clouds. But P2V of your old hardware onto new hardware is a cost-effective way to start.
Yes, but for example management wanting 24/7 2 hour up&running SLA and having hired a single guy with a budget of 800$ will not work - this is important to get sorted out early. Management needs to know what they want and what they'll get.
So let's see if I understand: you want to take a simple, straightforward, easy-to-understand architecture with no single points of failure that would be very easy to recover in the event of a problem and extremely easy to recreate at a different site in a few hours in the event of a disaster, and replace it with a vastly more complex system that uses tons of shiny new buzzwords. All to serve 150 end users for whom you have quantified no complaints related to the architecture other than it might need to be sped up a bit (or perhaps find a GUI interface for the ftp server, etc).
This should turn out well.
sPh
As far as "distributed redundant system" goes, I strongly suggest you read Moans Nogood's essay "You Don't Need High Availability" and think very deeply about it before proceeding.
It is if you recommended outsourcing everything to the cloud.
All intents and purposes. Not intensive purposes.
Except of course that management ALREADY HAS that because they've been very lucky for 7 years. Why spend money for what works (never mind we can't upgrade or replace any of it because it's so old)
I think what the article is really asking is what's a good model to start all this stuff. You're looking at one or two servers per location (or maybe even network appliances at remote sites). We read all this stuff on Slashdot and in the deluges of magazines and marketing material... where do we start to make it GO?
If the current system has been acceptable for 7 years, I'm guessing the users' needs aren't something so mindbogglingly critical that risk must be removed at any cost. Equally, if that was the case, the business would be either bringing in an experienced team or writing a blank cheque to an external party, not giving it to the guy who changes passwords and has spent the last week putting together a jigsaw of every enterprise option out there, and getting an "n+1" tattoo inside his eyelids.
Finally, 7 years isn't exactly old. We've got a subsidiary company of just that size (150 users, 10 branches) running on ProLiant 1600/2500/5500 gear (i.e. '90s) which we consider capable for the job, which includes Oracle 8, Citrix MF plus a dozen or so more apps and users on current hardware. We have the occasional hardware fault, which a maintenance provider can address same day and bill us at ad-hoc rates, yet we still see only a couple of thousand dollars a year in maintenance, leaving us content that this old junk is still appropriate no matter which way we look at it.
Why would you buy a cluster that's not the same architecture? You don't know what you're talking about. VMs generally aren't used to change architecture like that. In a virtualized cluster the "OS" is just another data file too! Just point an available CPU to your file server image on the SAN and start it back up... that's smart, not lazy!
Most people need virtualization because managing crappy old apps on old server OSes is a bitch. The old busted apps are doing mission critical work, customized to the point the manufacturer won't support them, and management doesn't want to pay out for the new version... or the new version doesn't support the old equipment. The leading purpose for VMs is to get new shiny hardware with a modern OS and backup methods, and to segregate your old hard-to-maintain configurations into instances. Then the old and busted doesn't crash the core services anymore. Instances that used to be on dedicated, busted hardware that used to require a call-out can be rebooted from your couch in your jammies! (I vote VNC on iPhone as the killer admin app!) VMs include backup at the VM level, so those old machines that refused to support backup can be backed up "in spite of" the software trying to prevent it.
I think what the article is really asking is what's a good model to start all this stuff. You're looking at one or two servers per location (or maybe even network appliances at remote sites).
I totally agree with your premise. In my experience taking something that appears to work (when you realize you've really just been lucky) requires some time to bring about the change that the business really needs.
Now, as for having two servers per location, that heavily depends on how those sites are connected. Are they using a dedicated line or a VPN? That's important since that'll affect what hardware needs to be located where. It's possible (even if unlikely) that some sites would only need a VPN appliance... But since the poster seems to want general advice:
VMware ESXi is a pretty good starting place for getting going on virtualization. I've had a great experience with it for testing. When you feel like you've got a good handle, get the ESX licenses.
If SAN isn't in your budget, I still recommend some sort of external storage for the critical stuff... Preferably replicated to another site... But you can run the OS on local storage, especially in the early stages. But you'll need to get everything onto external storage to implement the VMotion services and instant failover. Get a good feel for P2V conversion. It'll save you tons of time when it works... It doesn't always, but that's why you'll always test, test and test.
As for the basic services you stated above (www, ftp, email, dns, firewall, dhcp):
Firewall (IMHO) is best done on appliance. Which should be anywhere you have an internet connection coming in. I'm sure you knew that already, but I'm trying to be thorough.
Email is usually going to be on its own instance (guest, cluster, whatever)... But I find that including it in the virtualization strategy has been quite alright. In fact, my experience with virtualization has been quite good except when there is a specific hardware requirement for an application (a custom card, or something like that). USB has been much less of a headache since VMware has support for it now, but there are also network based USB adapters (example: USBAnywhere) that provide a port for guest OSes in case you don't use VMware.
The question is not about hardware or configuration. It is about best practices. This is a higher level process question. Not an implementation question.
Here's how we do it:
- Run your services in a few vservers on the same physical server:
* DNS + DHCP
* mail
* ftp
* www
- Have a backup server where your stuff is rsynced daily. This allows for quick restores in case of disaster.
Vservers are great because they isolate you from the hardware. Server becomes too small? Buy another one, move your vservers to it and you're done. Need to upgrade a service? Copy the vserver, upgrade, test, swap it with the old one when you are set. It's a great advantage to be able to move stuff easily from one box to another.
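For what it's worth, the daily rsync step above could be driven by something like this. It's only a sketch: the paths and the backup host name are made up for illustration, and it assumes a stock rsync with SSH access to the backup box. You would call it from cron or similar.

```python
import subprocess


def build_rsync_command(source_dir, backup_host, backup_dir):
    """Build an rsync invocation that mirrors source_dir onto the backup server.

    -a preserves permissions, ownership and timestamps; --delete removes
    files on the backup that no longer exist on the source.
    """
    # Trailing slash: copy the directory's contents, not the directory itself.
    source = source_dir.rstrip("/") + "/"
    destination = f"{backup_host}:{backup_dir}"
    return ["rsync", "-a", "--delete", source, destination]


def run_backup(source_dir, backup_host, backup_dir, runner=subprocess.run):
    """Run the mirror; raises if rsync exits non-zero (check=True)."""
    command = build_rsync_command(source_dir, backup_host, backup_dir)
    return runner(command, check=True)


# Hypothetical vserver path and host name, for illustration only.
print(build_rsync_command("/vservers/mail", "backup.example.com", "/backups/mail"))
```

Note that --delete makes the backup a mirror, not an archive: a file deleted by mistake disappears from the backup on the next run, so keep some history elsewhere if you need it.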
If MS is going to astroturf, you need to at least learn to be a bit more subtle about it. That post couldn't have been more obviously marketing drivel if it tried. Regardless of technical merit of the solution (which I can't discuss authoritatively).
The post history of the poster is even more amusingly obvious. No normal person is a shill for one specific cause in every single point of every post they ever make.
To all companies: please keep your advertising in the designated ad locations and pay for them, don't post marketing material posing as just another user.
It's not a super config, and a lot of people will argue that it's not a true setup, but it's sufficient for our needs. I think we hit 4% CPU utilization across all the nodes the other day.
With VMware, watch the 2TB filesystem limit. We ran into that with our SATA array. Basically you have to slice it into 2TB chunks to get VMware to accept it as a datastore.
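A quick way to see how many chunks you end up with (a rough sketch: the 7TB array size is made up, and I'm treating the ceiling as exactly 2 binary terabytes, so check the real limit for your VMware version before carving anything up):

```python
TB = 1024 ** 4  # bytes; assumption: the limit is counted in binary terabytes


def slice_into_datastores(array_bytes, limit_bytes=2 * TB):
    """Return the chunk sizes needed so that no chunk exceeds limit_bytes."""
    full_chunks, remainder = divmod(array_bytes, limit_bytes)
    sizes = [limit_bytes] * full_chunks
    if remainder:
        sizes.append(remainder)
    return sizes


# Hypothetical 7TB SATA array.
chunks = slice_into_datastores(7 * TB)
print(len(chunks), [size / TB for size in chunks])
```

For the hypothetical 7TB array that works out to three full 2TB slices plus a 1TB leftover.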
As far as networking goes, we have a couple of gigE switches running the traffic. Our SANs are redundant, as we clone all of the machines from our SAS "SAN" to our SATA. If the "production" SAN goes down we can start up the clone from the SATA box in minutes. After the primary SAN comes back up we can VMotion it across to the other data store.
"The tension between budget and business requirements can be useful but it is largely a paper tiger."
Yes indeed, but not for the reasons you highlight. There is no tension between budget and requirements, since the budget is just a natural outcome of the requirements themselves: you don't "need 24x7 services"; you lose XXX dollars per hour when the service is down. Once you factor in the risk management is willing to take, your budget is just a matter of multiplication: it's XXX dollars per downtime hour multiplied by the risk you are accepting. You lose 10,000 per downtime hour and you don't want to lose more than 100,000 (a ten-hour outage) on a risk you measured to have a 10% chance? Then your allowed up-front cost for this is 30,000 (for iron amortized over three years).
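To spell out that multiplication (a rough sketch; I'm reading the 10% as a per-year probability, which is the reading that makes the 30,000 figure come out, and the function name is just for illustration):

```python
def allowed_upfront_cost(loss_per_hour, outage_hours, annual_probability, amortization_years):
    """Expected outage loss over the amortization period.

    Assumption: annual_probability is the chance, per year, of one outage
    lasting outage_hours.
    """
    loss_per_outage = loss_per_hour * outage_hours  # 10,000 * 10 hours
    expected_loss_per_year = loss_per_outage * annual_probability
    return expected_loss_per_year * amortization_years


print(allowed_upfront_cost(10_000, 10, 0.10, 3))
```

With those numbers the expected loss is about 10,000 a year, so roughly 30,000 over the three years the hardware is amortized. Spending more than that to remove the risk costs more than the risk itself.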
I'm used to hearing "I want uber-redundancy and 24x7 availability" "well, that'll cost you XXX" "But I can't pay that!" That means that you don't earn that much from that system. It's never "I can't afford it" but "it doesn't get me so much".
That's a little harsh don't you think?
There are untold numbers of us in this guy's position. Asking Slashdot is a damn good start at finding a new methodology. Everyone has an opinion, some of them quite intelligent, a few might even work. It's ok for the Fortune 500 cube dwellers to jump on the phone and call in a long standing contractor to 'handle it' - the rest of us have to slog through the marketdroid crap and translate the latest buzzword infestations to human speak - then just hope we don't screw it up or waste money.
So far the best suggestions appear to be to figure out how critical things are first (which will shape the hardware requirements), budget second. All the while this is encompassed by the usual core job functions that still need to get done.
So rather than point out the redundant, how about using your fingers to provide a potential solution.
1) Buy a comprehensive insurance policy
2) Write a detailed implementation plan that you copied from a Google search
3) Wait the 3-6 months the plan calls out before actual "work" begins
4) Burn down the building using a homeless person as the schill
5) Submit an emergency "continuity" plan that you wanted to deploy all along
6) implement the new plan in one third the time of the original plan
7) come in under budget by 38.3%
8) hire a whole new help desk at half the budgeted payroll (52.7% savings)
9) speak at the board meeting: challenges you overcame in saving the company
10) Graciously accept the position of CIO
(send all paychecks and bonuses to numbered bank account and retire to a non-extradition country) :)
Except of course that management ALREADY HAS that because they've been very lucky for 7 years
Whoa there - so using this logic we can assume the company has no fire insurance, etc, because they've been lucky and not had their building burn down in 7 years? Managers might not understand technical issues, but one thing managers worth the title CAN do is manage risk, i.e. balance the cost of risk mitigation against the risk. I can well imagine a company of 150 people that actually doesn't have any mission critical servers worth spending a lot on redundancy, etc. I can also imagine a company that has gotten lucky while at the same time, the IT person(s) haven't explained IT risks/costs in proper terms because they assume the managers just aren't technical.
The original questioner definitely needs to do a proper risk / cost analysis and present it to the managers. (But right now his "ideas" are WAY too vague and not business need driven.) A prompt, proper analysis and plan/alternate plan(s) for risk and risk avoidance is seriously wanted. It will CYA for that magic moment any day now when these 7 year old systems start failing.
Any server that can offer a RAID disk solution would be fine. Blade servers seem to be overkill for most solutions, and they are expensive.
And then run DFS (Distributed File System) or similar to have replication between sites for the data. This will make things easier. And if you have a well working replication you can have the backup system located at the head office and don't have to worry about running around swapping tapes at the local branch offices.
Some companies tend to centralize email around a central mail server. This has its pros and cons. The disadvantage is that if the head office goes down everyone is without email service. But the configuration can be more complicated if each branch office has its own.
It's also hard to tell how to best stitch together a solution for a specific case without knowing how the company in question works. There is no golden solution that works for all companies.
The general idea, however, is that DNS and DHCP shall be local. If they aren't, then the local office will be dead as a dodo as soon as there is a glitch in the net. Anyone not providing local DNS and DHCP should be brought out of the organization as soon as possible. And DNS and DHCP don't require much maintenance either, so they won't put much workload on the system administration.
There are companies (big ones) that run central DHCP and DNS, but glitches can cause all kinds of trouble - like providing the same IP address to a machine in Holland and in Sweden simultaneously (yes - it has happened in reality, no joke) - and the work required to figure out what's wrong when multiple sites are involved in an IP address conflict can cost a lot. And if you run Windows you should have roaming profiles configured and a local server on each site where the profiles are stored.
Local WWW and FTP servers - can work, but watch out too since you have to check out if it's for internal or external use. Do you really need a local WWW and FTP server for each site? I would say - no. And those servers should be on a DMZ. It can of course be one server servicing both WWW and FTP. The big issue with especially FTP servers if they are for dedicated external users is the maintenance of the accounts on those servers. Obsolete FTP server accounts are a security risk.
And if you run Windows I would really suggest that you set up WDS (Windows Deployment Services). This will allow your PC clients to do a network boot so you can reinstall them from an image. Saves a lot of time and headache.
And today many users have laptop computers, so hard disk encryption should be considered to limit the risk of having business critical data going into the wrong hands. Truecrypt is one alternative that I have found that works really well. But don't run it on the servers.
Well, it's not like I've never had any clashes with our management (or even that of some of our customers), but I've found it to be the better approach to actually talk things through in the hope of getting a better understanding on both sides.
From my technical standpoint, it's very important that management knows _exactly_ what they're getting for their money. This also means saying "No". Yes, you can lose customers if you don't promise them 99.999999999% availability for $50 a month, but the real question is whether you actually wanted those customers in the first place - they may find some idiot who agrees to work with them, but that's their loss.
Even internally, as we also run our internal infrastructure, it's important to say "no" to unreasonable tasks and stupid ideas. Either management trusts you to actually do the job they've hired you to do, or they don't - then you'll need to find a new job.
mod parent up.
The first step is to find out what the business wants, and how much it is willing to pay. THEN you go out to find out what tech is appropriate/affordable to do it.
Ask the heads of each office, and the main business managers, what they want the tech to do now, in a year and in three years. Do you have a business continuity plan that has to be allowed for? If you don't have a BC plan, now's a good time to have one done, before you buy a load of kit that may not do the job.
Once you have a list of business needs, and put them in a prioritised list (again the managers set the priority), you go out and look at what can do the job. Assuming you find a reasonable solution within budget, you need to plan the migration.
Protip: do not attempt to migrate everything in one go. Do it in steps, with breaks in between.
Proprotip: whatever your migration, be able to revert to the original solution in less than 8 hours - ie one working day.
Migration is the biggest gotcha - plan, plan and plan again. Do a dry run. Start with the least critical services. You do have backups, right? Fully tested backups, from ground zero? You do have all your network and infrastructure accurately and completely mapped out, and all configuration settings / files stored on paper and independent machines?
Both arguments for VM and KISS have their place - only you can decide. But when you do decide, make sure it's based on evidence, and will end up making the business better.
Don't forget Total Cost of Ownership - the shiny boxes may run faster, but will you have to hire two more techs to keep them running, or a new maintenance contract?
Don't forget training - for you, your staff and the end users. If you're putting shiny newness in place, people will need to know how to use it, and do their jobs at least as quickly as on the old solution. No use putting in shiny web4.0 uber cloud goodness, if the users end up spending an hour doing a job that used to take 5 minutes, because they don't know how to use it properly, or the interface doesn't easily work with their business processes.
good luck
I have vmware machines on one server at home. There are still benefits even though it's not a cluster. So it's not that stupid.
It is easier to move the virtual servers to another machine or O/S. This is useful when upgrading or when hardware fails or when growing (move from one real server to two or more real servers). There's no need to reinstall stuff because the drivers are different etc.
You can snapshot virtual machines and then back them up while they are running. Backup and restore is not that hard that way. So even if you have a single point of failure, if you have recent image back ups, you could buy a machine with preinstalled O/S, install vmware, and get back up and running rather quickly.
And when power fails and the UPS runs low on battery, I have a script that suspends all virtual machines then powers the server down. That's more convenient too than setting up lots of UPS agents on multiple machines and hoping they all shutdown in time.
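Not my actual script, but the idea is roughly this (a sketch under assumptions: `vmrun suspend` is VMware's command-line tool and the exact arguments vary by product, the .vmx paths are invented, and how you read the battery level depends on your UPS software, so it is just passed in as a number here):

```python
import subprocess


def shutdown_plan(vmx_paths, battery_percent, threshold_percent=20):
    """Return the commands to run, in order, once the battery is low."""
    if battery_percent > threshold_percent:
        return []  # still enough battery, do nothing
    # Suspend every guest first so no VM is killed mid-write...
    plan = [["vmrun", "suspend", path] for path in vmx_paths]
    # ...then power off the host itself.
    plan.append(["shutdown", "-h", "now"])
    return plan


def execute(plan, runner=subprocess.run):
    """Run each command in order, stopping at the first failure."""
    for command in plan:
        runner(command, check=True)


# Hypothetical VM paths; print the plan instead of executing it.
for command in shutdown_plan(["/vm/mail/mail.vmx", "/vm/www/www.vmx"], battery_percent=15):
    print(" ".join(command))
```

The point of building the plan separately from running it is that you can test the ordering without actually powering anything off.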
DB performance sucks in a vmware guest though, so where DB/IO performance is important, use "real" stuff. Things may be better with other virtualization tech/software.
In my uninformed opinion, blades are mainly a way for hardware vendors to extract more money from suckers.
They probably have niche uses. But when you get to the details they're not so great. Yes the HP iLO stuff is cool etc... When it works.
Many of the HP blades don't come with optical drives. You have to mount CD/DVD images via the blade software. Which seemed to only work reliably on IE6 on XP. OK, so maybe we should have tried it with more browsers than IE8, but who has time? Especially see below why you don't have time:
So far I haven't seen any mention in HP documentation that the transfer rate of the mounted CD/DVD image (or folder) between your laptop to the iLO software to a blade that you're trying to install stuff on is a measly 500 kilobytes per second. But that's what we encountered in practice.
Yes you can attach the blade network to another network and install it over the network, but if you can do that, doesn't that make the fancy HP iLO stuff less important? You might as well just get a network KVM right? That KVM will work with Dell/IBM/WhiteBoxServer so you can tell HP to fuck off and die if you want.
Which brings us to the next important point: Fancy Vendor X enclosures will only work with current and near future Vendor X blades. In 3-5 years time they might start charging you a lot more to buy new but obsolete Vendor X blades. Whoopee. What are the odds you can use the latest blades in your old enclosure? So you pay a premium for vendor lock-in and to be screwed in the future.
I doubt Google, etc use blades. And they seem to be able to manage hundreds of thousands of servers. OK so most of the servers might be running the same image/thing... So that makes it easy.
BUT if you are having very different servers do you really want them in a few blade enclosures? Then when you need to service that enclosure you'd be bringing down all the different blades...
The biggest problem I've found with blades is that you can't fill a rack with them. Several of the datacenters I've come across have been unable to fit more than one bladecenter per rack. Cooling and power being the problem.
At the moment, a rack full of 1U boxes looks like the highest density to me.