Testing Network Changes When No Test Labs Exist?
vvaduva writes "The ugly truth is that many network guys secretly work on production equipment all the time, or test things on production networks when they face impossible deadlines. Management often expects us to get a job done but refuse to provide funds for expensive lab equipment, test circuits and for reasonable time to get testing done before moving equipment or configs into production. How do most of you handle such situations, and what recommendation do you have for creating a network test lab on the cheap, especially when core network devices are vendor-centric, like Cisco?"
Whenever you're working in/on a production environment, only one rule matters:
Don't fuck it up.
There are zero replies and the story is already tagged with "youreboned". That's the truth. If your higher-ups won't front the money for proper test equipment and expect you to roll out production-ready equipment on the first go, then you really are boned. Of course, you can mitigate this with simple pen-and-paper analysis. What should each piece of equipment do? Are the products we've selected appropriate for the roles we're going to put them in? These sorts of questions can find a lot of bugs without any testing at all. If you ask yourself, "what would I do if it was the 1980s?" then you'll be fine.
The best bet is to be ready to blame the vendor when things go south ;-)
Seriously, I'm right there with you. If management does not want to provide a test lab and reasonable time to test, then it's clear they've made a 'business decision' that the network is not of sufficient value, or the risk is not great enough, to justify such investments.
This may change quickly once something goes south (assuming they understand why it did) but you're gonna be talking to a brick wall until then.
It could be worse: you could have management that is afraid of its own shadow & freaks out at the idea of replacing redundant components after a HW failure. (Ever had to get VP approval to replace a failed GBIC? Oh, I have, & yes, I hate my life.)
It's perhaps not the best solution, since a lot of the problems I've faced since I started getting more into networking than software configuration and web server administration have been related to bad cables rather than bad IOS settings, but virtualization can help you create test situations on the cheap. Specifically, GNS3 lets you create test networks in a virtual environment and then import software images for your Cisco routers, switches, PIX firewalls, Juniper hardware, etc., all running on hypervisor technology.
:) I'm still pretty new to networking myself, and I use it to build little test labs for myself when I need to do more than I can with the two 3600- and 2600-series routers I got to take home for experimenting. I actually copied the IOS images off of them via TFTP, so I can replicate them as many times as I need to and claim whatever interfaces I need; plus it will (thankfully) simulate the ATM switch for me as well.
You can also use QEMU to create virtual network nodes. If you have enough RAM, this can at least help get the logical issues worked out and the software configurations squared away. Then you just need to do the real work.
Granted, it's not really an ideal solution, but it may wind up being the only way to avoid using production equipment.
General Relativity: Space-time tells matter where to go; Matter tells space-time what shape to be.
Step 1) Make a formal request for the test lab. Make it as detailed as possible. Explain the impact to the business if various components fail. Write a plain-language executive summary calling out the risks.
Step 2) Once the request is denied, make sure you have a paper trail of the rejection.
Step 3) If possible, test network changes on the production equipment at 2am so that the impact on users is smaller.
Step 4) Once the inevitable failure occurs, haul out the paper trail and get the bean counter fired.
Repeat until the test lab is approved. Note: step 4 may get you fired instead. Business decisions are somewhat nondeterministic.
I call my buddies at RIM and test my mods on their system.
Seriously, find as much virtual equipment as you can and replicate your production network as closely as possible. If you run one of the myriad sniffers on a VM, you might even come up with a clever way to send production traffic to your virtual lab. There is no other way to do it. You are screwed, so if you're serious, you can either buy the lab yourself or make one out of tin cans, coconuts and wet rope.
I would suggest asking your vendors for demo or evaluation equipment. Cisco, Juniper and 3Com have pools of demo equipment as do the resellers like PC Connection and CDW.
I've deployed new switching infrastructure based on work I did with loaners from my vendors. It can be tough because the typical evaluation period is 30 days, although you can sometimes get 45 or even 60 days.
If you have a good relationship with your sales rep, it should be easy to push them to get the necessary items to do basic testing and work out the concepts of how you need to deploy. Then keep the config files, so that when you do buy what you need you're 85% of the way there.
One problem with the situation you are in is that you've got a workaround that has sufficed so far. So you might WANT a test lab, but clearly you don't NEED one... because hey, if you needed it, you couldn't have got all this production stuff working, right? The only way this changes is when you've got multiple teams dealing with a production outage that takes a long time and costs a lot of money because you have to do some trial-and-error fixes to isolate the problem. Only THEN will you get your test lab, after an appropriate amount of paperwork and delay. The trick is doing this without the outage being perceived as your fault.
Stretch over at Packet Life has set up a great lab that anyone who needs to test Cisco configurations can sign up for and use.
The whole moon and the entire sky are reflected in one dewdrop on the grass. - Dogen
Here are a few tools:
GNS3 - http://www.gns3.net/ - free network simulator, based on Dynamips Cisco emulator
Opnet - http://www.opnet.com/ - detailed planning of networks, from scratch
Traffic Explorer - http://packetdesign.com/ - plan changes to an existing network
Older Cisco equipment can function just as well as newer for 95% of lab scenarios. You are very unlikely to be needing to use all the newer features.
Anything that can run IOS 12.3 and is less than a decade old can do a lot more than you think. We do all our BGP testing on a stack of 2600s and 3600s and have never had an issue, even though in production it's 2800s, 3800s, etc.
Granted, there are features for which you do need the newer kit, especially where the syntax has changed (e.g. IP SLA commands, newer NetFlow commands and class-map-based QoS, to name three off the top of my head), but none of the core routing and switching features/commands have changed much since the introduction of CEF: they all do ACLs, route maps, OSPF, BGP, EIGRP, VLANs, spanning tree, rapid spanning tree and IPsec VPNs. I'm speaking from an enterprise POV, not a service provider one, but I'd imagine that if you are in a telco environment you wouldn't be lacking gear.
For many minor test scenarios, you can pick a test branch office and use the good old 'reload in XYZ' command to ensure that no matter how badly you stuff it up, the router will bounce and come back on its saved config (just remember NOT TO COPY RUN START, lol).
Then there's the sleight of hand methods:
- always ordering more for projects than you really need. Par for the course really, especially as most project managers haven't a clue when it comes to the nuts and bolts of a big Cisco order.
- pushing for EOL replacements as early as possible; intentionally conflate end of sale with end of life.
- getting stuff in for projects as early as possible; then you have a month or two to use it as test gear.
- remembering that your lab need not mirror reality, so scale down as much as possible. E.g. to simulate a pair of 4506 multilayer switches running VRRP, use a pair of 3560s. Use your CCO login and flash away to your heart's content (I know it's breaching licensing, but for test scenarios, meh).
It doesn't save you from doing stupid things, but putting your device configurations under revision control with something like RANCID can make rolling things back easier, as well as generally encouraging sanity around device configuration.
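A minimal Python sketch of the diffing side of that idea (this is not RANCID, which also handles collecting the configs and committing them to version control): compare two saved configs with the standard library's difflib, skipping lines that change on every save. The volatile-line prefixes and the hostname are assumptions chosen for illustration.

```python
import difflib

# Example "noise" lines that change on every save. This prefix list is an
# illustrative assumption, not RANCID's actual filter set.
VOLATILE_PREFIXES = ("! Last configuration change", "! NVRAM config last updated", "ntp clock-period")


def normalize(config: str) -> list:
    """Split a config into lines, dropping volatile lines and trailing whitespace."""
    return [
        line.rstrip()
        for line in config.splitlines()
        if not line.startswith(VOLATILE_PREFIXES)
    ]


def config_diff(old: str, new: str) -> list:
    """Unified diff between two saved configs, ignoring volatile lines."""
    return list(
        difflib.unified_diff(
            normalize(old),
            normalize(new),
            fromfile="edge1.cfg (previous)",
            tofile="edge1.cfg (current)",
            lineterm="",
        )
    )


if __name__ == "__main__":
    before = "! Last configuration change at 01:00\nhostname edge1\ninterface Gi0/1\n ip address 10.0.0.1 255.255.255.0\n"
    after = "! Last configuration change at 02:00\nhostname edge1\ninterface Gi0/1\n ip address 10.0.0.2 255.255.255.0\n"
    print("\n".join(config_diff(before, after)))
```

Filtering the timestamp lines matters: without it, every save shows up as a change and the real edits get lost in the noise.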
If you are unable to recycle old equipment into your testlab you should go virtual.
For Cisco routers, GNS3/Dynamips (www.gns3.net) is your friend. Any recent PC or laptop will let you build a large and complex topology that will satisfy most experiments and even support you when preparing for certifications. It only works for routers, so switch-based platforms are out (like the 3750, 6500 and 7600). The good news is that the features are more or less the same and behave more or less the same way. If "more or less" is not close enough, you need a replica of your production network, or at least a few devices of each type, to test whatever can be labelled critical.
For Juniper routers, google "Juniper Olive". It runs a Juniper router the same way Dynamips runs a Cisco router.
In both cases a proactive partnership deal with the vendor will be a good idea. Both Cisco and Juniper (and I am sure all other major network vendors) have programs where they will more or less advise, test and prepare the configurations for you. If you run a critical network this is money well spent.
In the end it comes down to the level of risk your management is willing to take. Ask them whether they will accept a less available network, since you are unable to properly test your changes before implementation.
For any sort of medium to large network, you can't fully simulate it. That means you're always going to be making changes in an "untested" environment. So you make very few changes rather than lots, you confirm after each change that it has had the desired effect, and you have backout plans.
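The "few changes, verify each, back out on failure" discipline can be sketched as a small loop. This is a hypothetical Python illustration: a dict stands in for the device, and in real life the apply, verify and backout steps would be CLI pushes, pings and show commands.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Change:
    """One small change: how to apply it, how to check it, how to back it out."""
    name: str
    apply: Callable[[Dict], None]
    verify: Callable[[Dict], bool]
    backout: Callable[[Dict], None]


def roll_out(state: Dict, changes: List[Change]) -> List[str]:
    """Apply changes one at a time and verify each; on the first failed check,
    back out every applied change in reverse order and stop."""
    applied: List[Change] = []
    log: List[str] = []
    for change in changes:
        change.apply(state)
        applied.append(change)
        if change.verify(state):
            log.append(f"ok: {change.name}")
            continue
        log.append(f"FAILED: {change.name}")
        for done in reversed(applied):
            done.backout(state)
            log.append(f"backed out: {done.name}")
        break
    return log


if __name__ == "__main__":
    # Hypothetical device state; the second check fails on purpose.
    device = {"vlans": {10}}
    plan = [
        Change("add vlan 20",
               lambda s: s["vlans"].add(20),
               lambda s: 20 in s["vlans"],
               lambda s: s["vlans"].discard(20)),
        Change("prune vlan 10",
               lambda s: s["vlans"].discard(10),
               lambda s: 10 in s["vlans"],  # pretend vlan 10 turned out to still be needed
               lambda s: s["vlans"].add(10)),
    ]
    print(roll_out(device, plan), device)
```

Backing out in reverse order means each change is undone on top of the state it was originally applied to.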
The Internet's nature is peer to peer - 20050301_cs_profs.pdf
But if you write a proposal showing the benefits of having the right equipment and the operational costs of not having it, you might be able to get a Spirent TestCenter. Do a demo with some Linux/*BSD boxes running iperf, but remind them of the features and capabilities you will get with quality network testing tools.
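To show what an iperf-style demo actually measures, here is a toy Python stand-in: push a fixed number of bytes over one TCP connection and divide by elapsed time. It is a sketch only, not iperf. Both ends run on one machine over loopback, so it measures the host's network stack rather than a real link; with iperf you would run the server and client on separate boxes.

```python
import socket
import threading
import time


def _sink(server: socket.socket, result: dict) -> None:
    """Accept one connection and count bytes until the sender closes it."""
    conn, _ = server.accept()
    total = 0
    with conn:
        while True:
            chunk = conn.recv(65536)
            if not chunk:
                break
            total += len(chunk)
    result["bytes"] = total


def measure_throughput(total_bytes: int = 50_000_000, chunk_size: int = 65536) -> tuple:
    """Push total_bytes over one loopback TCP connection; return (bytes received, Mbit/s)."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS choose a free port
    server.listen(1)
    port = server.getsockname()[1]
    result = {}
    receiver = threading.Thread(target=_sink, args=(server, result))
    receiver.start()

    payload = b"\x00" * chunk_size
    start = time.perf_counter()
    with socket.create_connection(("127.0.0.1", port)) as client:
        sent = 0
        while sent < total_bytes:
            n = min(chunk_size, total_bytes - sent)
            client.sendall(payload[:n])
            sent += n
    receiver.join()  # wait until the receiver has drained everything
    elapsed = time.perf_counter() - start
    server.close()
    return result["bytes"], result["bytes"] * 8 / elapsed / 1e6


if __name__ == "__main__":
    received, mbps = measure_throughput()
    print(f"{received} bytes, {mbps:.0f} Mbit/s")
```

The gap between what a toy like this can tell you (one stream, one number) and what a dedicated traffic generator reports is exactly the argument the post suggests making to management.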
Cisco have many (large) labs located around the world. Sign up for some time in one of them.
Been there, done that (A LOT!!)
But it has failed quite a few times too.
If there's no money available for test labs, make good plans. Tell the people who wanted the changes (or, if you're the one who wants the changes, inform the right people that you will be doing work). Agree on a service window. Have backup plans. Have all configurations saved. Let all users know that after 10pm on that Saturday the network will be down for 10 minutes, etc.
Have tons of contingency plans, and let the 'responsible' people know what you are about to do. Plan everything 'wide': even for a 5-minute cable swap, reserve a 2-hour service window outside office hours.
You do not mention that this has ever made shit hit a fan. I conclude that so far this has not occurred.
Consequently, you have proved that you are able to work without expensive test equipment through a combination of motivation and elbow grease. Congratulations!
Now, what is the logic for someone with a finite pool of money to provide equipment for someone who obviously does not NEED it? Yes, None At All!
You can therefore:
1) Wait until shit hits a fan and say "well, that's what happens when we don't have test equipment". You will then get test equipment OR get fired.
2) Make the shit hit the fan yourself. This is quite difficult to do inconspicuously, so you'll probably get fired and a shit reference.
3) Look around for jobs as well paid as yours but with test equipment.
4) Someone mentioned asking vendors for test equipment - maybe that might work? Note: sales reps have a quota of favours they can call in, so it helps if you have some steady business with them.
That is what simulation/network planning software is for. For example OPNET: http://www.opnet.com
reload in 5
I'm dead serious. If you're working on production gear remotely and you screw it up, you can at least tell it to reload and pull its old config. You'll have some downtime, but it's better than the downtime you'd experience if you had to drive out there.
Make sure that every change request implementation documents that the change is being placed into the production environment for testing. Document impact ranging from total network failure to moderate inconvenience, and include rollout timetables. The rollout plan needs to include travel times, such as driving to site B or flying cross-country.
Of course the downside of this is that management may go out and hire someone who knows, or at least pretends to know, how to drop changes into place without whining about ignorance and making customers uncomfortable.
It depends a lot on your environment and the complexity you are dealing with. Test labs are wonderful things, but typically you end up in a situation where either your network is so limited that a lab won't help much, or your network is simply too complex to create a sane lab environment without dedicated staff and a huge budget.
Building a full scale lab is a large undertaking. It takes time and effort. You will need taps (for routing information), traffic generators, topology management and more. In my experience it's usually better to have a smaller testbed that is used to test large changes before deployment and design your network so it's resilient to configuration mistakes.
Getting funding for a limited testbed is also much easier than for a full lab, and you can do a lot of testing by simply stuffing a few routers in a rack and connecting them to the network management system. Virtualization is something a lot of people will mention. It's useful, but it's hard to build anything resembling a modern network on top of it. You want hardware that resembles what you use in the network. Sometimes you can scavenge such hardware during upgrades, which can provide you with a basic testbed to build from.
When it happens, point out (on paper!) to your management chain how it could have been prevented with a decent test configuration in place.
As long as the downtime that will result is acceptable.
1) You should not be making any direct changes to the network without correct design, test and sign-off.
2) You should already have a redundant network structure, so that "half" can be lost without any loss to network operations. This way the change can be tested in parallel.
3) You should always report to the SOX officer when a request outside correct operations and management procedures is made. It makes it their responsibility to resolve the legal issues of not following their written standards before you begin.
Polish your resume.
Download an iso from Vyatta and build a test network with old PCs and spare NICs for testing. Sure, it's not the exact same as Cisco, but if they're too cheap to buy the real thing for a test lab then you'll at least be somewhat close.
Then, once you realize what you're not getting for your money with Cisco, you can buy $1000 1U servers and build your own routers (or buy them prebuilt from Vyatta for about $2000) to replace the Ciscos, and make a profit selling the used Ciscos on eBay.
I do NOT work for nor am I affiliated with Vyatta. But their gear is pretty impressive, and open source.
--- It is not the things we do which we regret the most, but the things which we don't do.
The UNH-IOL is a neutral, third-party laboratory dedicated to testing data networking technologies through industry collaboration.
http://www.iol.unh.edu/
Make your objections in writing: email the manager demanding the change you believe places production at risk, with the risks clearly outlined in bullet points. If he then insists you proceed, have him send you the request in writing/email, print out a copy, keep it in a safe place and then make his change. This way he owns the failure, not you. Paper trails exist for a reason, to cover arses, and arse covering is often a worthwhile exercise.
If you mod me down, I will become more powerful than you can imagine....
As you already said, we secretly test on production in such cases.
It's only a matter of time until a change that wasn't properly tested completely screws everything up and some exec is looking at you for answers. I've learned that the best interpersonal skill to have is deflection. Nice guys finish last, especially in a corporate environment, so try to get test equipment, and when they say no, like all companies do, SAVE IT so you can blame someone else! This is what you can send to the CTO when he asks why you didn't properly test the changes that caused the company to lose millions of dollars in operating costs because the network was down for 6 hours: "Well, I warned people in this email trail and proposal, but they shot me down, and I was right." If by some incredible miracle this never has to happen, count your lucky stars, and when they ask why nothing has gone wrong, toot your own horn and say that it's because you are so damn good. No matter what, you show value, you secure your position. As for basic testing, any of the programs mentioned here will work; Packet Tracer is limited in the models it supports, so you might want to look at something else first.
http://www.clownix.net
I did a write-up on this product at the beginning of this month. It can run Quagga routers in the UML image of your choice; I wrote and ran a 12-router lab on a P4 with 512MB of RAM. (http://www.vlcg.net/content/cloonix-clownix-rocks)
If this product were used, you would only be able to functionally test the protocols in a particular topology; it wouldn't be Cisco, and it wouldn't be the same as production (different protocols, different topologies).
I discovered this while trying to figure out a way to run Quagga in a GNS3-like setup. GNS3 is great for testing a specific Cisco thing that you need to learn about, but it didn't do well for me beyond 3 routers (too much hand-holding getting the environment tweaked).
My ultimate vision for Quagga would be to run it on the hypervisor and let it scale (in number of routing instances) with the number of hypervisors. It's a pipe dream for now, but I think that routing that scales with hypervisors is going to be a big challenge for Cisco (especially if they try to do it in silicon).
--Adrian
Management hates paying for double the equipment, but for any production environment, it should be the cost of doing business. It minimizes risk and provides hot spares faster than an HP (or whatever) tech shows up. You should get some duplicate hardware for staging.
If you can't do that, then refer to the earlier post - don't fsck up.
"No matter where you go, there you are." -- Buckaroo Banzai
I work in a small IT house that provides network support for quite a few customers that are not large enough to have their own IT people.
We're very Windows-centric (yeah, I know, boo) and have no budget for any test equipment/training, yet I am expected to be up to date on changes in Windows. To make matters worse, I'm not even supposed to have the time to test things out on our internal network, and the pay is low enough that I can't afford to purchase equipment to test at home on my own time.
So, what works (kinda) for me has been to keep an eye out for equipment that has been abandoned by our sales team, (usually through extensive hardware problems that causes a customer to decide that it's not reliable enough for their network), and/or take equipment off of the sales shelf for testing. For the software/knowledge side, I will quite belligerently tell my boss to go away when I'm testing something that needs to get tested. This requires that you have a certain amount of clout and/or your boss is afraid of you quitting enough to let you get away with it.
On the customer/end user side, develop some sort of personal relationship with them, whether that be going out for drinks with them periodically, knowing what they do for fun and/or have them know what you do for fun (no, gaming doesn't cut it). Be up front with them when something does mess up (literally saying that you didn't realize that what you were doing might have that problem).
Never, ever blame someone else unless you're sure it's their fault; take the blame yourself. This'll save your ass when it really isn't your fault and someone tries to pin it on you.
---
Having said all of that, what you (and I) should really be doing is looking for a new job.
router# wr me
.. disconnected from remote host (oops, wait for reload)
router# reload in 30
router# conf t
router(config)# (good luck)
To make matters worse, I'm not even supposed to have the time to test things out on our internal network and the pay is low enough that I can't afford to purchase equipment to test at home on my own time. ...
Honestly, you are better off with a smaller salary if you would spend a raise on the company. The opportunity costs of such an idea are just absurd.
After all, I am strangely colored.
...You're as guilty - if not MUCH more - than they are here....
Quoting you: "Management often expects us to get a job done but refuse to provide funds for expensive lab equipment"
Well, have you considered that you may not have informed management from the start what to expect in the future? If there is ONE THING that management does well and knows better than most of us, it's how to EARN and KEEP money; they trust YOU to do your job and know everything about it so it doesn't become a future headache for them. If you FAILED to INFORM them of your possible needs in the beginning, you have yourself to blame, buddy.
You're not alone though, I've been there myself...trying to convince my bosses why I need all that extra gear to keep it safe in the future - when everything has worked FINE so far.
So - be prepared - rather than complaining later.
What this world is coming to - is for you and me to decide.
Juniper routers have a much more powerful management interface than Cisco. They have built-in configuration versioning and atomic changesets, allow easy rollback and can schedule commits. There is also a programmatic configuration API. If you have no test lab, Junipers can help you a lot.
We actually try to push changes that shouldn't incur downtime through during business hours; this way any unexpected issues come up immediately and we can react. This is in a sizable, high-end production environment.
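The candidate/commit/rollback workflow mentioned above for Junipers can be modelled in a few lines. This is a hypothetical Python toy, not a Junos API: the real CLI counterparts are `commit confirmed <minutes>` (activate, but revert automatically unless a plain `commit` follows in time) and `rollback <n>` (load an earlier committed config into the candidate). Time is passed in explicitly so the behaviour is easy to follow.

```python
import copy
from typing import Optional


class CandidateConfigDevice:
    """Toy model of a candidate/commit workflow with rollback history.
    Hypothetical illustration only; this is not a Junos API."""

    def __init__(self, active: dict):
        self.active = active
        self.candidate = copy.deepcopy(active)  # edits go here, not to the live config
        self.history = []  # previously active configs, newest last
        self.confirm_deadline: Optional[float] = None

    def commit_confirmed(self, now: float, minutes: float) -> None:
        """Activate the whole candidate at once, but revert unless confirmed in time."""
        self.history.append(self.active)
        self.active = copy.deepcopy(self.candidate)
        self.confirm_deadline = now + minutes * 60

    def confirm(self) -> None:
        """Plays the role of the second, plain commit that makes the change stick."""
        self.confirm_deadline = None

    def tick(self, now: float) -> None:
        """Run periodically; restores the previous config if no confirmation arrived."""
        if self.confirm_deadline is not None and now >= self.confirm_deadline:
            self.active = self.history.pop()
            self.candidate = copy.deepcopy(self.active)
            self.confirm_deadline = None

    def rollback(self, n: int) -> None:
        """Load the n-th previous committed config into the candidate (1 = most recent)."""
        self.candidate = copy.deepcopy(self.history[-n])
```

The useful property is that a change which cuts off your own management session undoes itself: you simply never get to send the confirming commit.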
Our company purchases DR equipment for our network hardware. Thus if a switch in the data center blows out we can replace it very quickly.
Instead of leaving it on the shelf we hook it up and use it for test environments. If production needs a replacement we drop the test environment and put the DR in place.
Costs more upfront but makes good use out of the equipment.
You hire Professional Services from a lab/test equipment manufacturer (Spirent, Ixia, BPS) or dedicated testing companies (EANTC or others). Most of them will accept to work during the night, so you need to get a "maintenance" window where they can inject traffic. I do that all the time, from the testers side. It's stupid to do, by the way, because you should always test *before* production.
But that's really dangerous, and the best way is still to test in the "lab". A lab can be a temporary rack where you put test equipment you rent for a few days. That test equipment can emulate very complex network topologies, so even if you have only, say, one firewall you need to test, you don't need the rest of the network devices in your lab (it would be better, of course, but it's not mandatory). Most companies have at least one spare unit of their network equipment, to quickly replace a unit if it fails, so you could use that one for testing a new configuration before committing it to production. Again, not ideal, but definitely better than not testing. A nice blog to read about the importance of testing is Spirent's.
I have worked for a few companies that had limited labs, but none that had a comprehensive lab. They would operate in staged upgrades and use emulators as a sanity check, plus a peer review by at least two other engineers. Make sure that there is a management VLAN in operation and just shift VCs as needed. A wholesale re-engineering is just asking for it. The key to the whole thing is to ensure you have remote (dial-up) access to the routers in question, never save the changes until you are happy, and make sure you keep a good copy on flash in the router. It comes down to your awesome ninja router skills. This is where a $100K network guy makes his money versus a $35K graduate: EXPERIENCE.
Before testing, write up a detailed plan explaining why you think this testing should be done on non-production equipment. Express your concern that it's EXTREMELY UNWISE to test on production equipment, but that this is the only alternative if they wish to deploy a final working system. Send email far and wide regarding your concerns. CC yourself at your own personal email address.
In short, cover your hiney. If bozo the manager wants to take the risk, you MUST be able to provide ample documented evidence that he/she/it was warned.
SymbolNOBODY:
You said what's quoted below from you, here -> http://slashdot.org/comments.pl?sid=1476008&cid=30428430
"It's tolerated (perhaps encouraged) in part because these annoying actors are otherwised engaged in improving Linux. Major Debian and BSD contributors, for example, use slashdot as a workspace for their human-machine interaction side experiments, of which APK is probably one. In addition many of these trolls post links which, if you follow them, will completely hose a Windows machine. This is part of the game. - by symbolset (646467) on Monday December 14, @01:15AM (#30428430) Journal
I took offense to the BOLDED part... & ALL you EVER seem to have is "ad hominem" based attacks on people, not the points they make. So, my reply in the URL below was simple (and logical):
http://slashdot.org/comments.pl?sid=1476008&threshold=-1&commentsort=0&mode=thread&pid=30428430#30430244
Additionally, "symbolNOBODY"? Well - the day you can make something like this (& that got you PAID for it, & that has done as well for others online):
http://www.tcmagazine.com/forums/index.php?s=b861a743aa23c4568b7d73e07ef7ecec&showtopic=2662
That's also gone over 250.000 views worldwide in 1++ yrs.' time online, & across 15 forums where that guide for Windows Security has been made either an:
1.) "Sticky/Pinned" thread
2.) An "Essential Guide"
3.) Rates 5/5 stars (etc.)
AND, gets "feedback" like this from users that have applied it:
----
http://www.xtremepccentral.com/forums/showthread.php?t=28430
PERTINENT QUOTE/EXCERPT:
"...recently, months ago when you finally got this guide done, had authorization to try this on simple work station for kids. My client, who paid me an ungodly amount of money to do this, has been PROBLEM FREE FOR MONTHS! I haven't even had a follow up call which is unusual. Now I don't recommend this for the average joe, but it if can work for a kids PC it can work for anything! Now, i substituted OpenDNS and activated the Adult Content filter with them for this kids computer. I know its not perfect, but will catch over 99.5% of said sites."
and
http://www.xtremepccentral.com/forums/showthread.php?s=10f9ba9ad5ff990aaae1e7ec91f593a2&t=28430&page=3
"Its 2009 - still trouble free! I was told last week by a co worker who does active directory administration, and he said I was doing overkill. I told him yes, but I just eliminated the half life in windows that you usually get. He said good point. So from 2008 till 2009. No speed decreases, its been to a lan party, moved around in a move, and it still NEVER has had the OS reinstalled besides the fact I imaged the drive over in 2008. Great stuff! My client STILL Hasn't called me back in regards to that one machine to get it locked down for the kid. I am glad it worked and I am sure her wallet is appreciated too now that it works. Speaking of which, I need to call her to see if I can get some leads. APK - I will say it again, the guide is FANTASTIC! Its made my PC experience much easier. Sandboxing was great. Getting my host file updated, setting services to system service, rather than system local. (except AVG updater, needed system local)"
Thronka - forums member @ xtremepccentral.com
----
THEN, when you have done so, on THAT account? THEN, you can talk!
Also?
When you have done all of this as I have over time in this Art & S
Labs, yeah, good times! The biggest problem is keeping the labs both operational and relevant. I just finished cleaning out my company's network lab, as the switchgear was not L3-capable, was out of production and out of our network, and none of the interfaces were faster than 100Mbps. None of it could be updated to a relevant OS level. It was mentioned earlier that if your network is large enough, you can designate a branch to serve as a guinea pig for planned changes. Also, if a branch closes down, make sure you reclaim the equipment if it is new enough and use that for your 'lab' until the next refresh. Sadly, using older equipment only works if you never plan to use leading (bleeding?) edge features. In addition, my colleagues and I have found that using older equipment sometimes masks new and unknown interactions between the new services and older, perceived-stable protocols.
Plan ahead meticulously - using paper and pen is not a sin as it is often faster than trying to model your system in software. Also, leverage your vendors heavily. They have the latest gear, and hopefully you will have service contracts, and they can assist you in planning out major changes.
Praying when a change goes in is good, too.
I think, therefore I am - Rene Descartes; I yam what I yam, an' that's what I yam - Popeye
Not a cure-all by any means, but one more trick for the toolbox. Very useful during a maintenance window. Obviously Cisco specific.
(tftp/scp/etc new-config to router)
router# reload in 2
router# copy flash:new-config running-config
(something along those lines, this is off the top of my head, basically copy your new config to the running config)
If it works, cancel the pending reload with "reload cancel" and wr it to the startup config. If you get disconnected, wait 2 minutes for the router to reboot and automatically load the previous startup-config. Adjust the time as necessary depending on the change and its complexity.
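Here is a hypothetical Python toy model of that safety net, with time passed in explicitly. It is not a device API; it just makes the three outcomes concrete: a bad change gets undone by the timer, a good change survives if you cancel the reload and save, and forgetting "reload cancel" still costs you an unplanned reboot (onto the new config).

```python
from typing import List, Optional


class ReloadTimerRouter:
    """Toy model of the IOS 'reload in N' safety net (hypothetical; not a device API)."""

    def __init__(self, startup: List[str]):
        self.startup = list(startup)
        self.running = list(startup)
        self.reload_at: Optional[float] = None
        self.reloads = 0

    def reload_in(self, now: float, minutes: float) -> None:
        self.reload_at = now + minutes * 60

    def reload_cancel(self) -> None:
        self.reload_at = None

    def copy_to_running(self, lines: List[str]) -> None:
        """Copying a file to running-config merges lines; it does not replace the config."""
        for line in lines:
            if line not in self.running:
                self.running.append(line)

    def write_memory(self) -> None:
        self.startup = list(self.running)

    def tick(self, now: float) -> None:
        """When the timer fires, the box reboots and comes back on its startup config."""
        if self.reload_at is not None and now >= self.reload_at:
            self.running = list(self.startup)
            self.reload_at = None
            self.reloads += 1


if __name__ == "__main__":
    r = ReloadTimerRouter(["hostname br1"])
    r.reload_in(now=0, minutes=2)
    r.copy_to_running(["ip route 0.0.0.0 0.0.0.0 192.0.2.1"])  # suppose this cuts us off
    r.tick(now=120)  # nobody cancelled, so the router reboots
    print(r.running, r.reloads)
```

The ordering is the whole point: schedule the reload before touching anything, and only save once you have confirmed you still have access.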
Also use something like RANCID or Kiwi CatTools to help manage your configuration changes.
But the best trick of all is using a full-blown router emulator like GNS3.
It's a MIPS emulator that loads unmodified IOS images. You can build complex scenarios and even attach them to NICs on the host PC. I've built labs with several routers attached to bridged NICs in VMware guests, so you can literally start, say, a web server on one VMware guest and access it across your GNS3 "network". You can also bridge it to physical NICs; you could have a 7206VXR router running on an old PC!
Plenty of limitations, though. Namely, it can only run a specific set of IOS images for specific models, and you have to use an NM-16ESW to simulate switching, since switching is done in ASICs.
It's highly distressing to encounter these people, but many, tech and manager alike, actually think there's nothing wrong with working on production systems. To them that's just how it's done. They know no other way. Trying to educate them is met with blank stares and sometimes even harsh resistance.
For Cisco equipment you can get the Dynamips emulator. You must provide the Cisco IOS, which must be licensed from Cisco for your use. You can then emulate pretty much the whole range of Cisco switches/routers on your PC. It's pretty good for a small test lab, but I'm not sure about a full production test lab.
Calculate CPU cycles, read the source code, understand your changes, and roll them out. Oh wait! You're using Cisco, nevermind.
Seriously, buy a new router to replace a 'broken' one at a location and then somehow fix the broken one for your lab/office.
The truth is that sometimes you lack not only the equipment for lab testing but also the real-world usage scenario. I am often stuck in a situation where I must back up a config and then experiment with production equipment, and so am forced to do this outside of business hours. I usually get a chance to do some functional testing offline but can't really put new systems through their paces very well in a lab.
The real key to success here? Know what you are doing. You may have to work in less than ideal circumstances, but you must be knowledgeable enough to fix a mistake in a reasonable amount of time.
Also consider getting your hands on a rig to do some virtualization. You can virtualize routers and servers with something like XenServer, VMware or VirtualBox. I have done an entire mock deployment of a Cisco firewall + Windows Server 2008 R2 system for remote client access (Windows) and site-to-site VPNs (Cisco) on a single XenServer, because I can virtualize the Cisco router (it's slow) and the Windows servers, and even create separate networks to simulate separate switches, sites, network segments, etc. A Q6600 with 8GB can be had for less than a grand at Dell or as a whitebox.
What I'd suggest is something quite alien to the tekhead: get management on your side. Explain the issues and talk about the problems. Give them easy-to-read bullet points. Management will then ask you, "Well, what do you suggest?" You know a lab that effectively mirrors the live environment is about as likely as rocking horse poo, but ask about it anyway. If you suspect they won't fork out the money for it, they most likely won't, but ask for it and make sure they understand it and you discuss it.
Assuming you didn't get a lab, talk about the change. Talk them through the mitigation you want to plan in, talk about the rollback, and get them on board. Then hit them with a compromise. You know the network better than anyone; work out what equipment you need to replicate the vast majority of the network. If 90% of your network is based on, say, 3 standard models of switches/routers, ask for a lab of those. Discuss how you can reduce the risk; risk is the factor you are trying to reduce. You should be able to say to your management:
- Option 1: cost $50000, 99% of the network tested
- Option 2: cost $10000, 95% of the network tested
- Option 3: cost $5000, 90% of the network tested
The important thing is that by bringing them into the dialog about the issue you face, the risk assessment and responsibility are shared between you and management. If things still go south, you have some defence against people yelling at you. In fact, management will understand the lengths you have gone to in order to reduce the risk, they will understand that you cannot promise 0% risk on the budget they want, and they will have agreed to this...
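One way to make those options easier for management to weigh is to show the marginal cost of each extra percentage point of coverage. This Python sketch simply redoes that arithmetic with the example figures from the post above. They are illustrative numbers, not real quotes: going from 90% to 95% costs $1,000 per point, while going from 95% to 99% costs $10,000 per point.

```python
# (name, cost in dollars, percent of the network covered by the lab).
# Figures are the illustrative ones from the post, not real quotes.
OPTIONS = [("Option 3", 5_000, 90), ("Option 2", 10_000, 95), ("Option 1", 50_000, 99)]


def marginal_cost_per_point(options):
    """Dollars per extra percentage point of coverage when stepping up to the next option."""
    ranked = sorted(options, key=lambda opt: opt[2])  # lowest coverage first
    steps = []
    for (name_a, cost_a, cov_a), (name_b, cost_b, cov_b) in zip(ranked, ranked[1:]):
        steps.append((f"{name_a} -> {name_b}", (cost_b - cost_a) / (cov_b - cov_a)))
    return steps


if __name__ == "__main__":
    for step, dollars in marginal_cost_per_point(OPTIONS):
        print(f"{step}: ${dollars:,.0f} per extra point of coverage")
```

A tenfold jump in the price of the last few points is the kind of thing that makes the middle option an easy sell.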
I have a cunning plan...