Testing Network Changes When No Test Labs Exist?
vvaduva writes "The ugly truth is that many network guys secretly work on production equipment all the time, or test things on production networks when they face impossible deadlines. Management often expects us to get a job done but refuses to provide funds for expensive lab equipment and test circuits, or reasonable time to get testing done before moving equipment or configs into production. How do most of you handle such situations, and what recommendations do you have for creating a network test lab on the cheap, especially when core network devices are vendor-centric, like Cisco?"
Whenever you're working in/on a production environment, only one rule matters:
Don't fuck it up.
There are zero replies and the story is already tagged with "youreboned". That's the truth. If your higher-ups won't front the money for proper test equipment and expect you to roll out production-ready equipment on the first go, then you really are boned. Of course, you can mitigate this with simple pen-and-paper analysis. What should each piece of equipment do? Are the products we've selected appropriate for the roles we're going to put them in? These sorts of questions can find a lot of bugs without any sort of testing. If you think, "what would I do if it were the 1980s?" then you'll be fine.
The best bet is to be ready to blame the vendor when things go south ;-)
Seriously, I'm right there with you. If management does not want to provide for a test lab and reasonable time to test, then it's clear they've made a 'business decision' that the network is not of sufficient value, or the risk not great enough, to justify such investments.
This may change quickly once something goes south (assuming they understand why it did) but you're gonna be talking to a brick wall until then.
It could be worse: you could have management who are afraid of their own shadows and who freak out at the idea of replacing redundant components after a HW failure. (Ever had to get VP approval to replace a failed GBIC? Oh, I have, and yes, I hate my life.)
It's perhaps not the best solution, since a lot of the problems I've faced since I moved from software configuration and web server administration into networking have been related to bad cables rather than bad IOS settings, but virtualization can help you create test situations on the cheap. Specifically, GNS3 lets you create test networks in a virtual environment and then import software images for your Cisco routers, switches, PIX firewalls, Juniper hardware, etc., all running on hypervisor technology.
:) I'm still pretty new to networking myself, and I use it to make little test labs for myself when I need to do more than I can with the two 3600 and the 2600-series routers I got to take home for experimenting with. I actually copied the IOS images off of them via TFTP and then can replicate them as many times as I need to, but I can claim I have whatever interfaces I need, plus it will (thankfully) simulate the ATM switch for me as well.
You can also use QEMU to create virtual network nodes. If you have enough RAM, then this can help at least get the logical issues worked out and the software configurations square. Then you just need to do the real work.
Granted, it's not really an ideal solution, but it may wind up being the only way to avoid using production equipment.
General Relativity: Space-time tells matter where to go; Matter tells space-time what shape to be.
Step 1) Make a formal request for the test lab. Make it as detailed as possible. Explain the impact to business if various components fail. Make a plain-language executive summary calling out risks.
Step 2) Once the request is denied, make sure you have a paper trail of the rejection.
Step 3) If possible, test network changes on the production equipment at 2am so that impact on users will be less.
Step 4) Once the inevitable failure occurs, haul out the paper trail and get the bean counter fired. Repeat until test lab is approved.
Note, step 4 may get you fired instead. Business decisions are somewhat nondeterministic.
I call my buddies at RIM and test my mods on their system.
I would suggest asking your vendors for demo or evaluation equipment. Cisco, Juniper and 3Com have pools of demo equipment as do the resellers like PC Connection and CDW.
I've done deployments of new switching infrastructure based on work I've done with loaners from my vendors. It can be tough because the typical evaluation period is 30 days, although you can get 45 and even 60 days.
If you have a good relationship with your sales rep, it should be easy to push them for the items you need to do basic testing and get the concepts down of how you need to deploy. Then keep the config files, so that when you do buy what you need you're 85% of the way there.
Stretch, over at Packet Life, has a great lab set up that anyone who needs to test Cisco configurations can sign up for and use.
The whole moon and the entire sky are reflected in one dewdrop on the grass. - Dogen
Here are a few tools:
GNS3 - http://www.gns3.net/ - free network simulator, based on Dynamips Cisco emulator
Opnet - http://www.opnet.com/ - detailed planning of networks, from scratch
Traffic Explorer - http://packetdesign.com/ - plan changes to an existing network
Older Cisco equipment can function just as well as newer for 95% of lab scenarios. You are very unlikely to need all the newer features.
Anything that can run IOS 12.3 and is less than a decade old can do a lot more than you think. We do all our BGP testing on a stack of 2600s and 3600s and have never had an issue, even though production is 2800s, 3800s, etc.
Granted, there are features for which you do need the newer kit, especially when syntax changes (e.g. IP SLA commands, newer NetFlow commands, class-map-based QoS, to name three off the top of my head), but none of the core routing and switching features/commands has changed much since the introduction of CEF: they all do ACLs, route maps, OSPF, BGP, EIGRP, VLANs, spanning tree, rapid spanning tree, IPsec VPNs. I'm speaking from an enterprise POV, not a service provider one, but I'd imagine if you are in a telco environment you wouldn't be lacking gear.
For many minor test scenarios, you can pick a test branch office and use the good old 'reload in XYZ' command to ensure that no matter how badly you stuff it up, everything will bounce and come back (just remember NOT TO COPY RUN START lol).
Then there's the sleight of hand methods:
- always ordering more for projects than you really need. Par for the course really, esp. as most project managers haven't a clue when it comes to the nuts and bolts of a big Cisco order.
- pushing for EOL replacements as early as possible; intentionally conflate end of sale with end of life.
- getting stuff in for projects as early as possible, then you have a month or two to use it as test gear.
- remember that your lab need not mirror reality; scale down as much as possible. E.g. to simulate a pair of 4506 multilayer switches running VRRP, use a pair of 3560s. Use your CCO login and flash away to your heart's content (I know it's breaching licensing but for test scenarios, meh).
It doesn't save you from doing stupid things, but putting your device configurations under revision control, using something like RANCID, can make rolling things back easier, as well as generally encouraging sanity around device configuration.
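To illustrate what revision control of device configs buys you, here is a minimal Python sketch that diffs two saved copies of a configuration. It is not how RANCID itself works; the function names, the device label "router1", and the sample config text are all hypothetical, and it assumes you already have the old and new configs as plain text.

```python
import difflib
from typing import List


def config_diff(old: str, new: str, device: str = "router1") -> List[str]:
    """Return a unified diff between two saved copies of a device config.

    `device` is only a label used in the diff headers (hypothetical name).
    """
    return list(difflib.unified_diff(
        old.splitlines(),
        new.splitlines(),
        fromfile=f"{device}@previous",
        tofile=f"{device}@current",
        lineterm="",
    ))


def changed_lines(old: str, new: str) -> List[str]:
    """Return only the removed ('-') and added ('+') config lines."""
    body = config_diff(old, new)[2:]  # skip the two '---' / '+++' header lines
    return [line for line in body if line.startswith(("+", "-"))]


if __name__ == "__main__":
    # Made-up sample configs, for illustration only.
    before = "hostname r1\ninterface Fa0/0\n ip address 10.0.0.1 255.255.255.0\n"
    after = "hostname r1\ninterface Fa0/0\n ip address 10.0.0.2 255.255.255.0\n"
    for line in changed_lines(before, after):
        print(line)
```

Reviewing a diff like this before and after a maintenance window makes it obvious which lines need to be reverted if the change goes wrong.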
If you are unable to recycle old equipment into your test lab, you should go virtual.
For Cisco routers, GNS3/Dynamips (www.gns3.net) is your friend. Any recent PC or laptop will allow you to build a large and complex topology that will satisfy most experiments and even support you when doing certification preparation. It will only work for routers, so switch-based platforms are out (like the 3750, 6500 and 7600). The good news is that the features are more or less the same and they more or less behave the same way. If "more or less" is not close enough, you need a replica of your production network, or at least a few devices of each type, to test what can be labelled as critical.
For Juniper routers, Google "Juniper Olive". It will run a Juniper router the same way Dynamips runs a Cisco router.
In both cases a proactive partnership deal with the vendor will be a good idea. Both Cisco and Juniper (and I am sure all other major network vendors) have programs where they will more or less advise, test and prepare the configurations for you. If you run a critical network this is money well spent.
In the end it comes down to the level of risk your management is willing to take. Ask them whether they will accept lower network availability, since you are unable to properly test your changes before implementation.
For any sort of medium to large network, you can't fully simulate it. That means you're always going to be making "untested" changes. So you make very few changes rather than lots, you check after each change that it has had the desired effect, and you have backout plans.
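The "few changes, check each one, keep a backout plan" discipline can be sketched in Python roughly as follows. This is only an illustration of the control flow, not a tool for talking to real devices: the Change class, its callbacks, and the sample "state" dictionary are all hypothetical stand-ins for whatever you actually use to push and verify a change.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Change:
    """One small change with its own check and backout (hypothetical model)."""
    name: str
    apply: Callable[[], None]    # make the change
    verify: Callable[[], bool]   # did it have the desired effect?
    backout: Callable[[], None]  # undo this change


def apply_one_at_a_time(changes: List[Change]) -> Tuple[List[str], Optional[str]]:
    """Apply changes in order, stopping at the first one that fails its check.

    Returns (names of changes left in place, name of the failed change or None).
    A failed change is backed out immediately; later changes are never attempted.
    """
    done: List[str] = []
    for change in changes:
        change.apply()
        if not change.verify():
            change.backout()
            return done, change.name
        done.append(change.name)
    return done, None


if __name__ == "__main__":
    # A dict stands in for the network; values are made up for illustration.
    state = {"vlan": 10, "mtu": 1500}
    plan = [
        Change("move to vlan 20",
               lambda: state.update(vlan=20),
               lambda: state["vlan"] == 20,
               lambda: state.update(vlan=10)),
        Change("raise mtu",
               lambda: state.update(mtu=9000),
               lambda: False,  # pretend the post-change check fails
               lambda: state.update(mtu=1500)),
    ]
    print(apply_one_at_a_time(plan), state)
```

The point of the structure is that every change carries its own verification and its own undo, so a failure leaves you one known step away from the last good state.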
The Internet's nature is peer to peer - 20050301_cs_profs.pdf
Cisco have many (large) labs located around the world. Sign up for some time in one of them.
Been there, done that (A LOT!!)
But it has failed quite a few times too..
If no money is available for test labs, make good plans. Tell the dudes who wanted the changes (or, if you are the dude who wants the changes, inform the right people that you will be doing stuff). Agree on a service window. Have backup plans. Have all configurations saved. Let all users know that after 10pm on that Saturday the network will be down for 10 mins, etc.
Have tons of contingency plans, and let the 'responsible' people know what you are about to do. Plan everything 'wide': even for a 5-minute cable plugover, reserve a 2-hour service window outside office hours.
You do not mention that this has ever made shit hit a fan. I conclude that so far this has not occurred.
Consequently, you have proved that you are able to work without expensive test equipment through a combination of motivation and elbow grease. Congratulations!
Now, what is the logic for someone with a finite pool of money to provide equipment for someone who obviously does not NEED it? Yes, None At All!
You can therefore:
1) Wait until shit hits a fan and say "well, that's what happens when we don't have test equipment". You will then get test equipment OR get fired.
2) Make the shit hit the fan yourself. This is quite difficult to do inconspicuously, so you'll probably get fired and a shit reference.
3) Look around for jobs as well paid as yours but with test equipment.
4) Someone mentioned asking vendors for test equipment - maybe that might work? Note: sales reps have a quota of favours they can call in, so it helps if you have some steady business with them.
Make sure that every change request implementation documents that this change is being placed into the production environment for testing. Document the impact, ranging from total network failure to moderate inconvenience, and include rollout timetables. The rollout timetable needs to include travel times, such as the drive to site B or a cross-country flight.
Of course the downside of this is that management may go out and hire someone who knows, or at least pretends to know, how to drop changes into place without whining about ignorance and making customers uncomfortable.
It depends a lot on your environment and the complexity you are dealing with. Test labs are wonderful things, but typically you end up in a situation where your network is either so limited that a lab won't help much, or simply too complex to create a sane lab environment for without dedicated staff and a huge budget.
Building a full scale lab is a large undertaking. It takes time and effort. You will need taps (for routing information), traffic generators, topology management and more. In my experience it's usually better to have a smaller testbed that is used to test large changes before deployment and design your network so it's resilient to configuration mistakes.
Getting funding for a limited testbed is also much easier than a full lab, and you can do a lot of testing by simply stuffing a few routers in a rack and connecting it to the network management system. Virtualization is something a lot of people will mention. It's useful, but it's hard to build anything resembling a modern network on top of it. You want hardware that resembles what you use in the network. Sometimes you can scavenge such hardware during upgrades, which can provide you with a basic testbed to build from.
As long as the downtime that will result is acceptable.
1) You should not be making any direct changes to the network without correct design, test and sign-off.
2) You should already have a redundant network structure, so that "half" can be lost without any loss to network operations. This way the change can be tested in parallel.
3) You should always report to the SOX officer when a request outside correct operations and management is made. That makes it their responsibility to solve the legal issues of not following their written standards before you begin.
polish your resume.
Download an iso from Vyatta and build a test network with old PCs and spare NICs for testing. Sure, it's not the exact same as Cisco, but if they're too cheap to buy the real thing for a test lab then you'll at least be somewhat close.
Then, once you realize what you're not getting for your money with Cisco, you can buy $1000 1U servers and build your own routers (or buy them prebuilt from Vyatta for about $2000) to replace the Ciscos and make a profit selling the used Ciscos on eBay.
I do NOT work for nor am I affiliated with Vyatta. But their gear is pretty impressive, and open source.
--- It is not the things we do which we regret the most, but the things which we don't do.
The UNH-IOL is a neutral, third-party laboratory dedicated to testing data networking technologies through industry collaboration.
http://www.iol.unh.edu/
Put your objections in writing: email the manager demanding the change, with the risks you believe it poses to production clearly outlined in bullet points. If he then insists you proceed, make him send you the request in writing/email, print out a duplicate, keep it in a safe place, and then make his change. This way he owns the failure, not you. Paper trails exist for a reason, to cover arses, and arse covering is often a worthwhile exercise.
If you mod me down, I will become more powerful than you can imagine....
As you already said, we secretly test on production in such cases.
http://www.clownix.net
I did a write-up on this product at the beginning of this month. It can run quagga routers in the UML image of your choice; I wrote and ran a 12-router lab on a P4 with 512MB of RAM. (http://www.vlcg.net/content/cloonix-clownix-rocks)
If this product was used - you would only be able to functionally test the protocols in a particular topology - wouldn't be cisco, and it wouldn't be the same as production (different protocols, different topologies).
I discovered this trying to figure out a way to run quagga in a gns3-like setup. GNS3 is great for testing a specific cisco thing that you need to learn about - but it didn't do well for me beyond 3 routers - (too much hand-holding getting the environment tweaked).
My ultimate vision for quagga would be to run it on the hypervisor and let it scale (in numbers of routing instances) wrt to the number of hypervisors - it's a pipe dream for now, but I think that routing that can scale with hypervisors is going to be a big challenge for cisco (esp if they try to do it in silicon) -
--Adrian
Management hates paying for double the equipment, but for any production environment, it should be the cost of doing business. It minimizes risk and provides hot spares faster than an HP (or whatever) tech shows up. You should get some duplicate hardware for staging.
If you can't do that, then refer to the earlier post - don't fsck up.
"No matter where you go, there you are." -- Buckaroo Banzai
router# wr me
.. disconnected from remote host (oops, wait for reload)
router# reload in 30
router# conf t
router(config)# (good luck)
To make matters worse, I'm not even supposed to have the time to test things out on our internal network and the pay is low enough that I can't afford to purchase equipment to test at home on my own time. ...
Honestly, you are better off with a smaller salary if you would spend a raise on the company. The opportunity costs of such an idea are just absurd.
After all, I am strangely colored.
Comment removed based on user account deletion
...You're as guilty - if not MUCH more - than they are here....
Quoting you: "Management often expects us to get a job done but refuses to provide funds for expensive lab equipment"
Well, have you considered it might be that you may not have informed the management from the start what's to be expected in the future? If there is ONE THING that the management does well and knows better than most of us - is how to EARN and KEEP money, they trust YOU to do your job and know everything about it so it doesn't have to be a future headache for them. If you FAILED to INFORM them of your possible needs in the beginning, you have yourself to blame buddy.
You're not alone though, I've been there myself...trying to convince my bosses why I need all that extra gear to keep it safe in the future - when everything has worked FINE so far.
So - be prepared - rather than complaining later.
What this world is coming to - is for you and me to decide.
You hire Professional Services from a lab/test equipment manufacturer (Spirent, Ixia, BPS) or a dedicated testing company (EANTC or others). Most of them will agree to work during the night, so you need to get a "maintenance" window where they can inject traffic. I do that all the time, from the tester's side. It's stupid to do, by the way, because you should always test *before* production.
But that's really dangerous, and the best way is still to test in the "lab". A lab can be a temporary rack where you put test equipment you rent for a few days. Such test equipment can emulate very complex network topologies, so even if you have only, say, one firewall you need to test, you don't need the rest of the network devices in your lab (it would be better, of course, but it's not mandatory). Most companies have at least one spare unit for their network equipment, to quickly replace a unit that fails, so you could use that one for testing a new configuration before committing it to production. Again, not ideal, but definitely better than not testing. A nice blog to read about the importance of testing is Spirent's.
I have worked for a few companies that had limited labs, but none that had a comprehensive lab. They would operate in staged upgrades and used emulators as a sanity check, plus a peer review by at least two other engineers. Make sure that there is a management VLAN in operation and just shift VCs as needed. A wholesale re-engineering is just asking for it. The key to the whole thing is: ensure you have remote (dialup) access to the routers in question, never save the changes until you are happy, and make sure you keep a good copy on flash in the router. It comes down to your awesome Ninja router skills. This is where a $100K network guy makes his money versus a $35K graduate. EXPERIENCE.
Labs, yeah, good times! The biggest problem is keeping the labs both operational and relevant. I just finished cleaning out my company's network lab as the switchgear was not L3-capable, out of production and out of our network, and none of the interfaces were faster than 100Mbps. None of it could be updated to a relevant OS level. It is mentioned earlier that if you are a large enough network, you designate a branch to serve as a guinea pig for planned changes. Also, if you have a branch close down, make sure you reclaim the equipment if it is new enough and use that for your 'lab' until the next refresh. Sadly, using older equipment only works if you never plan to use leading (bleeding?) edge features. However, my colleagues and I have found that using older equipment sometimes masks new and unknown interactions between the new services and older, perceived-stable, protocols.
Plan ahead meticulously - using paper and pen is not a sin as it is often faster than trying to model your system in software. Also, leverage your vendors heavily. They have the latest gear, and hopefully you will have service contracts, and they can assist you in planning out major changes.
Praying when a change goes in is good, too.
I think, therefore I am - Rene Descartes; I yam what I yam, an' that's what I yam - Popeye
Not a cure-all by any means, but one more trick for the toolbox. Very useful during a maintenance window. Obviously Cisco specific.
(tftp/scp/etc new-config to router)
router# reload in 2
router# copy flash:new-config running-config
(something along those lines, this is off the top of my head; basically copy your new config to the running config)
If it works, cancel the pending reload ('reload cancel') and write the running config to the startup config. If you get disconnected, wait 2 minutes for the router to reboot and automatically load the previous startup-config. Adjust the time as necessary depending on change/complexity.
Also use something like RANCID or Kiwi CatTools to help manage your configuration changes.
But the best trick of all is using a full blown router emulator like gns3.
It's a MIPS emulator that loads unmodified IOS images. You can build complex scenarios and even attach them to NICs on the host PC. I've built labs with several routers attached to bridged NICs in VMWare guests. So you can literally start, say, a webserver on one vmware guest and access it across your gns3 "network". You can also bridge it to physical NICs -- you could have a 7206vxr router running on an old PC!
There are plenty of limitations, though. Namely, it can only run a specific set of IOS images for specific models, and you have to use an NM-16ESW to simulate switching, since switching is done in ASICs.
It's highly distressing to encounter these people, but many, tech and manager alike, actually think there's nothing wrong with working on production systems. To them that's just how it's done. They know no other way. Trying to educate them is met with blank stares and sometimes even harsh resistance.
seriously, buy a new router to replace a 'broken' one from a location and then somehow fix the broken one for your lab/office.
The truth is that sometimes you lack not only the equipment for lab testing, but also the real-world usage scenario. I am often stuck in a situation where I must back up a config and then experiment with production equipment, and so am forced to do this outside of business hours. I usually get a chance to do some functional testing offline but can't really put new systems through their paces very well in a lab.
The real key to success here? Know what you are doing. You may have to work in less than ideal circumstances, but you must be knowledgeable enough to fix a mistake in a reasonable amount of time.
Also consider getting your hands on a rig to do some virtualization. You can virtualize routers and servers with something like XenServer, VMware, or VirtualBox. I have done an entire mock deployment of a Cisco firewall + Windows Server 2008 R2 system for remote client access (Windows) and site-to-site VPNs (Cisco) on a single XenServer, because I can virtualize the Cisco router (it's slow) and the Windows servers, and even create separate networks to simulate separate switches, sites, network segments, etc. A Q6600 + 8GB can be had for less than a grand at Dell in whitebox.
What I'd suggest is something quite alien to the tekhead: get management on your side. Explain the issues and talk about the problems. Give them easy-to-read bullet points. Management will then ask you, "Well, what do you suggest?" You know a lab that effectively mirrors the live environment is about as likely as rocking horse poo, but ask about it anyway. Even if you suspect they won't fork out the money for it, ask for it, and make sure they understand and that you discuss it...
Assuming you didn't get a lab, talk about the change. Talk them through the mitigation you want to plan in, talk about the rollback, get them on board. Then hit them with a compromise. You know the network better than anyone; work out what equipment you need to replicate the vast majority of the network. If 90% of your network is based upon, say, 3 standard models of switches/routers, ask for a lab of them. Discuss how you can reduce the risk. Risk is the factor you are trying to reduce. You should be able to go to your management and say:
Option 1: cost $50000, 99% of the network tested
Option 2: cost $10000, 95% of the network tested
Option 3: cost $5000, 90% of the network tested
The important thing is that by getting them in on the dialog and the issue you face, the risk assessment and responsibility are shared between you and management. If things still go south you have some defence against people yelling at you; in fact, management will understand the lengths you have gone to to reduce the risk, they will understand that you cannot promise 0% risk on the budget they want, and they will have agreed to this...
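One way to make the three options above easier for management to compare is to work out what each step up in budget buys. A small Python sketch, using only the illustrative figures from this post (the function name and the "cost per extra percentage point" framing are my own assumptions, not a standard risk metric):

```python
from typing import List, Tuple

# (label, cost in dollars, percent of network tested), figures from the post above
OPTIONS: List[Tuple[str, int, int]] = [
    ("Option 1", 50000, 99),
    ("Option 2", 10000, 95),
    ("Option 3", 5000, 90),
]


def marginal_costs(options: List[Tuple[str, int, int]]) -> List[Tuple[str, float]]:
    """For each step up from the next-cheaper option, return the cost per
    extra percentage point of the network that gets tested."""
    ordered = sorted(options, key=lambda o: o[1])  # cheapest first
    steps = []
    for (_, cost_lo, pct_lo), (name, cost_hi, pct_hi) in zip(ordered, ordered[1:]):
        steps.append((name, (cost_hi - cost_lo) / (pct_hi - pct_lo)))
    return steps


if __name__ == "__main__":
    for name, per_point in marginal_costs(OPTIONS):
        print(f"{name}: ${per_point:,.0f} per extra point of coverage")
```

With these numbers, going from Option 3 to Option 2 costs $1,000 per extra point of coverage, while going from Option 2 to Option 1 costs $10,000 per point, which is the kind of trade-off a budget holder can actually decide on.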
I have a cunning plan...