Testing Network Changes When No Test Labs Exist?

Pretty simple, really by Anonymous Coward · 2009-12-24 11:17 · Score: 1, Insightful

Whenever you're working in/on a production environment, only one rule matters:

Don't fuck it up.

Re:Pretty simple, really by symbolset · 2009-12-24 11:53 · Score: 5, Funny

Oh, no. We do this all the time. Around the holidays we rewire the production server racks so their ethernet cables droop over the aisles, so we can hang up Christmas cards. Jimmy has a script that blinks the blue UID lights for a festive holiday display.

--
Help stamp out iliturcy.
Re:Pretty simple, really by MeatBag+PussRocket · 2009-12-24 12:04 · Score: 1

damn straight, thats why you get paid.
in theory, theory and practice are the same, in practice its not. you're job is to make it that way.
replace theory with lab and you see the fundamental flaw with the false sense of security a lab provides.

--
i wage a holy war against the apostrophe.
Re:Pretty simple, really by Anonymous Coward · 2009-12-24 14:10 · Score: 0

If you haven't fucked something up in production, I don't want you on the team fixing my network when something DOES accidentally go wrong in production.
Re:Pretty simple, really by 19thNervousBreakdown · 2009-12-24 14:29 · Score: 1

Holy blast from the past.

--
<xml><I><am><so><damn>Web 2.0</damn></so></am></I></xml>
Re:Pretty simple, really by bertoelcon · 2009-12-24 15:19 · Score: 0, Offtopic

By the way, black people don't like the word "nigger".
And yet they are allowed to use it themselves? Hypocrisy much?

--
Anything can be found funny, from a certain point of view.
Re:Pretty simple, really by Ozric · 2009-12-24 15:39 · Score: 1

A very wise Network Admin once told me.
Wisdom comes from experience. Experience comes from mistakes.
So, you see most of us wise administrators got our experience somewhere else, if you know what I mean.
wink wink, nudge, nudge.
Re:Pretty simple, really by Bruha · 2009-12-24 15:40 · Score: 1

It's called an FOA (first office application). You do what modeling you can, check what you're changing, and Rule #1 is don't fuck with something if you know nothing about it. We do it in the middle of the night, and if it screws things up we just restore the changed equipment to its pre-change state. Networks are too complex, and even the best lab modeling does not catch all situations.
Re:Pretty simple, really by Marxist+Hacker+42 · 2009-12-24 15:42 · Score: 1

Either that, or redundancy, redundancy, redundancy. I always at least try to convince the bosses that hardware needs to be ordered in even numbers- so that we have onsite emergency replacements.
That extra hardware can then be used to build test beds.

--
SJW: a person who perceives an injustice, and while correcting it, commits a greater injustice.
Re:Pretty simple, really by X0563511 · 2009-12-24 16:58 · Score: 1

You mean the IPMI LEDs? (Dell Poweredge servers have a dual-color LED that can flash (blue or orange) that signifies errors etc. Accessible via IPMI (along with all sorts of other goodies, like serial over Ethernet at a level higher than the OS))

--
For large sets, this will be our guide even unto death, for the LORD will work for each type of data it is applied to...
Re:Pretty simple, really by poopdeville · 2009-12-24 17:13 · Score: 0, Offtopic

Everybody is "allowed" to use it. Black people who go around calling other black people "niggers" are seen as scum by every black person I have ever met.

--
After all, I am strangely colored.
Re:Pretty simple, really by i.r.id10t · 2009-12-24 17:31 · Score: 2

Oh, a variation on blinkenlights?
ACHTUNG!
ALLES TURISTEN UND NONTEKNISCHEN LOOKENPEEPERS!
DAS KOMPUTERMASCHINE IST NICHT FÜR DER GEFINGERPOKEN UND MITTENGRABEN! ODERWISE IST EASY TO SCHNAPPEN DER SPRINGENWERK, BLOWENFUSEN UND POPPENCORKEN MIT SPITZENSPARKSEN.
IST NICHT FÜR GEWERKEN BEI DUMMKOPFEN. DER RUBBERNECKEN SIGHTSEEREN KEEPEN DAS COTTONPICKEN HÄNDER IN DAS POCKETS MUSS.
ZO RELAXEN UND WATSCHEN DER BLINKENLICHTEN.
Some more stuff to not trip the lameness filter, I hope...

--
Don't blame me, I voted for Kodos
Re:Pretty simple, really by jhoegl · 2009-12-24 18:33 · Score: 1

Truth and win.
Re:Pretty simple, really by symbolset · 2009-12-24 19:19 · Score: 1

Well said. The same people who wouldn't operate their car without AAA, OnStar and a full-size spare tire think nothing of operating their datacenter without so much as a cold spare hard drive. It's unfortunate.

--
Help stamp out iliturcy.
Re:Pretty simple, really by Anonymous Coward · 2009-12-24 19:34 · Score: 0

We decommissioned our Dell servers to make room for the BSA (Beer Storage Array). The Peltier cooling is incredible and the BSA has better uptime stats.
Re:Pretty simple, really by lukas84 · 2009-12-24 20:32 · Score: 3, Insightful

Everyone has a test environment. But not everyone has a production environment.
Re:Pretty simple, really by malachai · 2009-12-25 03:33 · Score: 0

Marxist Hacker has the right idea.
When I pitched needing a lab, I was pretty quickly shot down, since it was a cost they couldn't see the benefit of.
Wait a month and pitch it again, except this time from the angle of downtime: redundant hardware reduces it. It was an easy sell. And now I have a second set of hardware I can test on.
Win.

The tag says it all by Lord+Byron+II · 2009-12-24 11:20 · Score: 4, Insightful

There are zero replies and the story is already tagged with "youreboned". That's the truth. If your higher ups won't front the money for proper test equipment and expect you to roll out production-ready equipment on the first go, then you really are boned. Of course, you can mitigate this by simple pen-and-paper analysis. What should each piece of equipment do? Are the products we've selected appropriate for the roles we're going to put them in? These sorts of questions can find a lot of bugs without any sort of testing. If you think, "what would I do if it was the 1980's?" then you'll be fine.

Re:The tag says it all by DigiShaman · 2009-12-24 11:36 · Score: 5, Insightful

Not all changes are a one-way trip. Having a rollback plan is also important. Should something very unexpected happen, be prepared to roll back any and all changes to undo what has just been done.

--
Life is not for the lazy.
Re:The tag says it all by Anonymous Coward · 2009-12-24 11:42 · Score: 2, Informative

Not Pushing Juniper gear, but their Commit functions in JUNOS, and commands like "rollback" are serious things to consider in these scenarios. JUNOS also does things like refusing to perform a commit if you've done something obviously stupid (it does basic checking of your config when you commit).
Label me a shill. Whatever. JUNOS is a lot better from an operator POV.
Re:The tag says it all by BiggerIsBetter · 2009-12-24 11:44 · Score: 4, Insightful

Not all changes are a one-way trip. Having a rollback plan is also important. Should something very unexpected happen, be prepared to roll back any and all changes to undo what has just been done.
Couldn't agree more, except to say, don't assume you'll be rolling back from a known state. I've seen roll-back plans that assume they're undoing the changes just put in, not reverting to the state before the changes. Yes, there's a difference between the two! Eg, if your install fails, maybe you can't un-install. Yes, this might mean additional resources and the overhead of FS and DB snapshots, and complete copies of config files, but better that than the alternative.
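To make the "revert to the state before, don't undo the steps" distinction concrete, here is a minimal Python sketch. It assumes the config lives in an ordinary directory on a box you can script against; the function names and the ".pre-change" suffix are made up for illustration, and a real device or database would need its own snapshot mechanism.

```python
import shutil
import tempfile
from pathlib import Path


def snapshot(config_dir: Path, backup_root: Path) -> Path:
    """Copy the whole config directory aside and return the backup location."""
    backup = backup_root / (config_dir.name + ".pre-change")
    shutil.copytree(config_dir, backup)  # creates backup_root if it is missing
    return backup


def restore(config_dir: Path, backup: Path) -> None:
    """Throw away whatever is there now and put the pre-change copy back.

    This does not try to reverse individual steps, so it still works when
    the change failed halfway and left files nobody planned for.
    """
    shutil.rmtree(config_dir, ignore_errors=True)
    shutil.copytree(backup, config_dir)


if __name__ == "__main__":
    # Throwaway demo directory; real paths would differ.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        cfg = root / "etc"
        cfg.mkdir()
        (cfg / "router.conf").write_text("hostname old\n")
        saved = snapshot(cfg, root / "backups")
        (cfg / "router.conf").write_text("hostname broken\n")
        restore(cfg, saved)
        print((cfg / "router.conf").read_text(), end="")
```

The point of restoring from a full copy is that it makes no assumption about how far the failed change got.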

--
Forget thrust, drag, lift and weight. Airplanes fly because of money.
Re:The tag says it all by mysidia · 2009-12-24 12:20 · Score: 2, Informative

My personal favorite thing about JunOS is "commit confirmed 10"
This can be a lifesaver: if you fat-finger something and break even your ability to access the device, your transaction should roll back in 10 minutes.
If nothing goes wrong, you have 9 minutes to do some simple sanity checks, make sure your LAN is still working, and then get back to your CLI session and confirm the change.
Re:The tag says it all by Anonymous Coward · 2009-12-24 12:27 · Score: 0

Rollback and committing can be found in Cisco IOS XE, Cisco IOS NX-OS and Cisco IOS XR. So any high end platform that needs these functions...
Re:The tag says it all by afidel · 2009-12-24 12:36 · Score: 4, Insightful

This is networking equipment, other than transitory information like peer maps and MAC tables that can be re-learned you should always be able to revert to the previous state as far as the software and configuration.

My comments are that out-of-band management is the networking guy's best friend, and POTS is the best OOB available. Also learn how to change the running config without affecting the saved config; that way the worst case is you have to power cycle (which can be done with the correct OOB config, or you can pre-schedule a reboot that you cancel if everything goes well). Oh, and downtime windows might seem like a luxury, but unless you are Google or Amazon the business needs to be made aware that they are necessary and critical to the smooth functioning of their IT infrastructure, so you should be making these changes during the downtime window, when everyone is aware that things might break.

--
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
Re:The tag says it all by karnal · 2009-12-24 12:41 · Score: 2, Informative

You bring up a good point regarding changing the running config vs the saved config.
What I'll do if I'm changing a remote system - POTS or no - is set up a reboot of the device in 15 minutes, after verifying the clock. Then, if something in the config causes an unforeseen issue, you just need to wait a little for the switch/router to come back online with its original config.
Obviously, this can extend the outage window - however, always plan for worst case...

--
Karnal
Re:The tag says it all by GaryOlson · 2009-12-24 13:06 · Score: 1

This is not 1985 anymore; rollback should be included by default in any networking equipment that presents itself as suitable for line-of-business networks. Providing rollback in only "high end platforms" is a scam for PHBs and less than competent network managers to waste money and feel good about themselves.

--
Every mans' island needs an ocean; choose your ocean carefully.
Re:The tag says it all by POTSandPANS · 2009-12-24 14:01 · Score: 2, Informative

On a cisco, you can just do "reload in 10" and "reload cancel". If you don't know about those commands, you really shouldn't be working on a production network unsupervised.
As for the original question: Either use similar low end equipment, or use your spares. (please say you keep spare parts around)
Re:The tag says it all by eggoeater · 2009-12-24 14:06 · Score: 3, Interesting

I'm a call-center telephony engineer. Kinda the same thing as network engineer in that you're routing calls instead of packets.
Back around '01, I was working for First Union (which later became Wachovia). They had this massive corporate push for anyone and everyone in IT to roll out a standardized Software Configuration Management, and of course we were included. The big problem was the lab. The corporate standard was to test changes in a lab environment and then move to production (duh).
For a telephony environment, we had a pretty good lab that could duplicate most of our production scenarios, but not all. Another problem was there were a LOT of people with their fingers in the lab since so many groups were involved: eg. The IVR team is in there because you have to have IVRs in the system. Same with call routing, call recording, desktop software, Q&A, etc.etc.
So the lab was in a constant state of flux with multiple products, multiple teams, and different software cycles and endless testing always occurring. We made it work by testing the stuff we weren't sure about in the lab, only doing changes in prod after hours, and having really good testing and back-out plans.
So when the corporate overlords started telling us we couldn't make any changes to production without running everything through the lab first, we basically laughed and told them we'd need around 500 million for the lab and dedicated resources to run it. I ended up telling them that to duplicate the production environment, we'd need another bank as our "test bank", and we could test changes on the test bank and then put them in the production bank.

As with so many things in that IT department, it went from being a priority to fading away when something else became a priority.

--
$7.95/mo, 200 GB disk, 2TBxfer, MySQL, PHP, RoR.
Re:The tag says it all by Anonymous Coward · 2009-12-24 14:08 · Score: 0

And yet again, stupidity reigns slashdot! I fail to see how you get modded +5 when the author clearly stated he's a network guy. Long story short, he doesn't care about DBs, your post is pointless! On a cisco router, simply avoid doing a
copy running-config startup-config
unless you have a backup of running-config before said changes.
DB snapshots have nothing to do with it!
Re:The tag says it all by afidel · 2009-12-24 14:27 · Score: 2

My favorite ultimate backup for rebooting a device is a DTMF controlled PDU, call into the OOB number and hit a magic number sequence and the device reboots =)

--
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
Re:The tag says it all by LordAzuzu · 2009-12-24 14:34 · Score: 1

Regarding running config and saved config, some time ago I did an iptables script that would test a new rule chain for a specified amount of time, then reverting back to the previous one. It has saved me a lot of time many times, and actually a couple of times I locked myself out of the machine (that was a remote one, obviously).
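Not the script described above, just a minimal Python sketch of the same "apply, then revert unless confirmed" idea. It takes two hypothetical callables, `apply` and `revert`, which on a real box might shell out to load a new and an old ruleset; here they are left abstract so the timing logic can be seen on its own.

```python
import threading
from typing import Callable


class ConfirmedChange:
    """Apply a change now; undo it automatically unless confirmed in time."""

    def __init__(self, apply: Callable[[], None],
                 revert: Callable[[], None], timeout_s: float) -> None:
        self._revert = revert
        self._lock = threading.Lock()
        self._done = False      # set once either confirm or revert has won
        self.reverted = False   # True after the automatic revert has run
        apply()
        self._timer = threading.Timer(timeout_s, self._expire)
        self._timer.daemon = True
        self._timer.start()

    def _expire(self) -> None:
        # Runs on the timer thread when nobody confirmed in time.
        with self._lock:
            if self._done:
                return
            self._done = True
        self._revert()
        self.reverted = True

    def confirm(self) -> bool:
        """Keep the change. Returns False if the revert already fired."""
        with self._lock:
            if self._done:
                return False
            self._done = True
        self._timer.cancel()
        return True


if __name__ == "__main__":
    state = {"rules": "old"}
    change = ConfirmedChange(
        apply=lambda: state.update(rules="new"),
        revert=lambda: state.update(rules="old"),
        timeout_s=60,
    )
    # ...check that you can still reach the machine, then:
    print("kept" if change.confirm() else "reverted")
```

One caveat: on a remote machine the process doing the waiting has to survive your session dropping, otherwise the lockout kills the safety net along with the shell.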
Re:The tag says it all by Anonymous Coward · 2009-12-24 14:39 · Score: 0

I agree, I am currently moving away from Cisco to JUNOS and I am loving it.
Re:The tag says it all by mysidia · 2009-12-24 15:20 · Score: 2, Interesting

"reload in 10" on a core router or switch (eg a massive switch that also has routing duties) is insane, and will probably impact the entire network, for 20-30 minutes, if you accidentally lock yourself out (but don't otherwise impact anything) and fail to cancel that reload.
In addition, reload is risky, and the equipment may fail to come back up correctly.
Sorry, it's not anywhere close to comparable to the configuration management features in JunOS.
"Reload in X" is a bad answer, and should never be done, except on equipment that doesn't matter that much, or at a time when an hour of downtime is completely expected and acceptable.
Re:The tag says it all by Stripe7 · 2009-12-24 15:22 · Score: 1

Pen and paper analysis may not find out all the issues. We had a weird one that flummoxed a bunch of network engineers. It was an IOS upgrade to the built in fiber bridge on a blade server. The old IOS worked fine, the new one worked until you tried to jumpstart a blade. Jumpstarts worked fine with the old IOS but not on the new one. As we rarely jumpstarted the blades, this issue was not caught until after the bridges on all the blade servers were upgraded.
Re:The tag says it all by Anonymous Coward · 2009-12-24 16:26 · Score: 0

Just make sure it works on your boss's system.
Mystery solved
Re:The tag says it all by POTSandPANS · 2009-12-24 17:35 · Score: 1

Reload in X is a fine answer. Obviously you wouldn't use it if the biggest issue is locking yourself out.
I'm not familiar with JunOS, but I'm thinking you could likely use cisco's kron to set up a similar action on a timer..
Re:The tag says it all by PitaBred · 2009-12-24 18:53 · Score: 1

Lemme guess... the priorities changed when a new manager was hired/one was let go, and someone decided they wanted to make a name for themselves?

--
My blog. Good stuff (when I remember to update it). Read it.
Re:The tag says it all by mysidia · 2009-12-24 19:07 · Score: 1

No, it's not a "fine" answer, not whenever "reload in X" will cause a more widespread issue than a misconfig, or when the reload can disrupt an even more important network or function served by your router than the one you are making changes to.
You may be trading an outage or disruption of 10% of your LAN subnets due to misconfig, for a global Enterprise outage or 50% outage, including mission-critical subnets, due to reload of a key router (possibly mitigated by redundancy, but possibly not). Oh, and the auto-reload blew away the logs needed to review/help trace whatever was wrong too...
Remember your workstation can blue screen (for unrelated reason) just after you type "reload in X".
Just because it's all the vendor gives you, and the best you can do, does not mean it is a fine, safe, or good, answer. There's a difference between "having a good answer", and making do with an obviously deficient solution (such as reload in X)
"reload in X" works in a brute-force sledgehammer sort of way, and in some cases it might be acceptable, especially for edge devices in small branch offices, where the overall effect to the enterprise is minuscule, but it also has serious drawbacks.
The 10-20 minute delay for a router to reboot, alone, during which it is completely out, is a serious deficiency with the command. If some emergency is forcing you to make significant changes outside a maintenance window "commit confirmed 3" is 10000% safer on your router-that-handles-thousands-of-your-subnets than "reload in 3"
The difference is the severity of what happens should you not confirm or cancel the rollback/reboot in time.
Cisco's kron can't do what JunOS commit/rollback/snapshot does; frankly, kron, and most IOS configuration management options are very feeble. In fact, JunOS commit/rollback allows you to compare (diff), show you only the changes, and rollback to any prior version of the config. No configuration change is made until a 'commit' is done.
All changes occur at the commit, so if you make a series of changes to an existing firewall rule, you can re-check them as many times as you want after typing them, before placing all the changes into effect. This reduces the chance of error in the first place.
You can write pre-commit scripts to validate configs for common errors before an attempt will even be made to apply them, as you would expect from proper configuration management systems.
Only the specific services whose config is being changed normally see any disruption...
You may make a misconfig that disrupts a part of your network for a time. A selective auto-rollback allows you to undo your change without taking down the entire router and without so much as bouncing other services, such as routing protocol adjacencies (RIP peers, OSPF adjacencies, BGP peers, etc.), which would definitely bounce with "reload in X" and can cause longer-term disruptions, including spanning tree hits and such.
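For gear that doesn't offer a built-in compare, the "show me only the changes before anything goes live" step can be approximated offline. This is a rough Python sketch using the standard difflib module on two config texts you have already pulled down; it is not how JunOS does it, and the function name and sample config lines are made up.

```python
import difflib
from typing import List


def config_diff(running: str, candidate: str) -> List[str]:
    """Return only the changed lines between two config texts."""
    return list(
        difflib.unified_diff(
            running.splitlines(),
            candidate.splitlines(),
            fromfile="running",
            tofile="candidate",
            lineterm="",
            n=0,  # no context lines: show changes only
        )
    )


if __name__ == "__main__":
    running = "hostname edge1\nntp server 10.0.0.1\n"
    candidate = "hostname edge1\nntp server 10.0.0.2\n"
    for line in config_diff(running, candidate):
        print(line)
```

An empty result means the candidate is identical to what is running, which is itself worth knowing before you open a change window.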
Re:The tag says it all by Anonymous Coward · 2009-12-24 20:31 · Score: 0

Exactly. You need an exit strategy should things fail.
I manage a system worth over 10 million dollars. Creating a lab out of it would cost almost the same.
You need a good test scenario to verify stuff, a rollback plan and a maintenance window.
If that is not possible, then a lab is. You must have either one.
Re:The tag says it all by Grail · 2009-12-25 00:30 · Score: 2, Informative

If you truly believe that a simple reversion of a configuration will cause a reversion to a previous state, you're sorely mistaken.
Once the device you're working on starts misbehaving, other devices around it will start misbehaving too. As an example, one change to a network I'm involved with was supposed to simply prioritise VoIP traffic for one customer. The change was successful, the engineer went home. Then three hours later a major network router failed, because the higher priority voice traffic which was now flowing over the router tripped some magic number of MACs that it could remember, at which point the card had to keep referring routing decisions back to the CPU.
The router's CPU became overloaded, other routes started dropping packets, and we ended up trying to resolve the problem by rebooting that router (because that's what was broken). The router on starting up was immediately overloaded and crashed again. Overall, it took about four hours to get the problem resolved, which required reverting the VoIP change and turning off some customer networks to allow the core router to start up without the huge packet load. The customer networks were down for about three and a half hours.
In this instance, simply reverting the change to the VoIP services would not have resolved the problem. Once the camel's back was broken, removing a straw would not have fixed it.
Re:The tag says it all by Anonymous Coward · 2009-12-25 03:37 · Score: 0

Cisco supports rollback too:
http://www.cisco.com/en/US/docs/ios_xr_sw/iosxr_r3.3/getting_started/installation/guide/gs33init.html#wp1159138
Re:The tag says it all by YojimboJango · 2009-12-25 03:52 · Score: 1

Not to point out flaws in your plan, but most banks have a disaster recovery center. Actually, IIRC it's a requirement for PCI compliance.
If you're responsible for having a separate HQ building ready to roll over to in the event of a disaster, why are you not using that as a test bed? Added bonus: your disaster recovery center will almost always be more up to date than your main.
Re:The tag says it all by afidel · 2009-12-25 07:48 · Score: 1

And it's also a change that a lab would have been unlikely to predict, some of the time things break, life happens, and that's why you need dual paths and multiple datacenters to achieve more than about 4.5 9's in the real world.
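For a sense of scale, here is the arithmetic behind the nines as a small Python sketch. It reads "4.5 9's" as 99.995% availability, which is an assumption about the shorthand, and it ignores leap years.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, leap years ignored


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Minutes of allowed downtime per year at a given availability."""
    return (100.0 - availability_percent) / 100.0 * MINUTES_PER_YEAR


if __name__ == "__main__":
    for pct in (99.9, 99.99, 99.995, 99.999):
        print(f"{pct}% -> {downtime_minutes_per_year(pct):.2f} min/year")
```

At 99.995% that works out to roughly 26 minutes a year, so a single failed change on a single path can eat the whole budget.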

--
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
Re:The tag says it all by eggoeater · 2009-12-25 12:54 · Score: 1

Disaster recovery centers are mostly for mainframes and other uber-mission critical functions. Most of our servers/software are running in multiple redundant data centers. When I left the bank, Wachovia had four major data centers.
But that's not the point. You would never use disaster recovery centers or redundant servers as a make-shift lab. If there was a problem on the production box and the redundant server wasn't available, you'd definitely be in deep doo doo.

My original point is if you have a production environment with 20 physical sites all connected a certain way, then you can't really FULLY test your changes in the lab unless your lab has the same number of sites all wired the same way, which is ridiculous. It would literally cost 50 mil.

--
$7.95/mo, 200 GB disk, 2TBxfer, MySQL, PHP, RoR.
Re:The tag says it all by bobp0303 · 2009-12-27 09:19 · Score: 1

Back in the 1980s we did everything on the live network at Cable & Wireless -- of course, the company no longer exists in the United States, but I'm sure there's no connection there. I remember chewing out the Vice President big-time -- didn't get fired, but didn't change anything either...
Re:The tag says it all by Anonymous Coward · 2010-01-03 09:32 · Score: 0

andy zebrowitz

Could be worse by 7213 · 2009-12-24 11:22 · Score: 4, Insightful

The best bet is to be ready to blame the vendor when things go south ;-)

Seriously, I'm right there with you. If management does not want to provide for a test lab & reasonable time to test. Then it's clear they've made a 'business decision' that the network is not of sufficient value / risk is not great enough for such investments.

This may change quickly once something goes south (assuming they understand why it did) but you're gonna be talking to a brick wall until then.

It could be worse, you could have management that are afraid of their own shadows & who freak out at the idea of replacing redundant components after a HW failure. (Ever had to get VP approval to replace a failed GBIC? Oh, I have & yes, I hate my life).

Re:Could be worse by mysidia · 2009-12-24 12:25 · Score: 2, Interesting

See how much approval you have to get when the network is down because of a failed GBIC.
Redundancies against component failure are very good for the enterprise, but also make it harder for engineers to do their job, since "nobody notices that something has gone wrong".
Perhaps the real redundancies should be reserved for the absolute most business-critical things.
Make sure less important things are non-redundant and arranged in such a way that if any link or GBIC does fail, something noticeable to management will stop working, and cannot be restored without fixing the broken thing.
Re:Could be worse by hazem · 2009-12-24 13:53 · Score: 2, Insightful

That reminds me of an article by Nelson Repenning, "Nobody ever gets credit for fixing problems that never happened". It's quite an interesting read... The guy who "saves the day" during an emergency always seems to get credit and reward, but what about the guy who keeps the emergency from ever happening?
Re:Could be worse by Sulphur · 2009-12-24 15:44 · Score: 1

One is Willie Mays, and the other is Joe DiMaggio.
Mays would make impossible catches, but DiMaggio was in the right place and the catch looked easy.
Re:Could be worse by mysidia · 2009-12-24 15:45 · Score: 1

I think it's taken for granted as an expected part of the job, that the minimum things engineers/architects are supposed to do is prevent emergencies from happening.
If a bad enough emergency does happen, they might get fired for 'not doing their job', but they'll rarely ever get commended when their design works and protects the enterprise against certain doom.
Except by other engineers... I think (to some extent), that's just life.
How's a non-technical person supposed to tell the difference between the network being stable because it was well designed, and the network being stable, because the thing that can bring it all down just hasn't ever happened to have had any issues yet?
You'd be surprised how long a network with crucial issues can appear on the surface to be just fine, only to one day have a catastrophe due to the poor design, years later, when least expected....
Only network engineers are really qualified to really give this type of credit.. whereas any bimbo off the street can see when someone fixed an emergency [even if their own mistake caused it -- from many people, you will not get an admission of guilt, by avoiding admitting it, they can make it appear they are cleaning up after someone or something else] :)
Re:Could be worse by The+Wild+Norseman · 2009-12-24 16:19 · Score: 1

One is Willie Mays, and the other is Joe DiMaggio. Mays would make impossible catches, but DiMaggio was in the right place and the catch looked easy.
Can someone please explain this sports analogy with a car analogy so I can understand it?

--
"A government is a body of people usually -- notably -- ungoverned." -Shepherd Book
Re:Could be worse by Nutria · 2009-12-24 17:17 · Score: 1

Can someone please explain this sports analogy with a car analogy so I can understand it?
Which is better, Toyota (whose cars never fail) or the local mechanic (who's a whiz at instantly diagnosing and fixing Chevys)?

--
"I don't know, therefore Aliens" Wafflebox1
Re:Could be worse by comrade+k · 2009-12-24 18:24 · Score: 1

Can someone please explain this sports analogy with a car analogy so I can understand it?

I think a pizza analogy would be more appropriate.

--
"Every vision is a joke until the first man accomplishes it; once realized, it becomes commonplace." -Robert H. Goddard
Re:Could be worse by jsailor · 2009-12-25 02:30 · Score: 1

I think the point is that Willie Mays and Joe DiMaggio would both get the job done, but do it in different ways. Willie Mays would make the catches while running, diving, over-his-head, etc. All of which looked amazing and made the highlight reels. DiMaggio, did a better job of preparing and positioned himself better so that when catching a similar hit, he would be in position to catch the ball standing up. Both were great athletes, but DiMaggio made the job look routine. Even though he was equally capable of making the fantastic plays, his approach was more safe because he didn't have to make them so often. Planning and design vs. fire fighting.
Re:Could be worse by Sulphur · 2009-12-25 02:46 · Score: 1

One is Willie Mays, and the other is Joe DiMaggio. Mays would make impossible catches, but DiMaggio was in the right place and the catch looked easy.
Compare Porsche and Ferrari : One may have a plexiglass panel to see the engine.
Compare DiGiorno and DeLivery : One is more stealthy.
Re:Could be worse by martyb · 2009-12-25 08:37 · Score: 1

That reminds me of an article by Nelson Repenning, "Nobody ever gets credit for fixing problems that never happened". It's quite an interesting read... The guy who "saves the day" during an emergency always seems to get credit and reward, but what about the guy who keeps the emergency from ever happening?
Hey! Thanks for that!!!!!
I'd heard variations on it several times but assumed it was just folklore or [un]conventional wisdom. Your post prompted me to search and find the article you mentioned: here's the abstract: Nobody Ever Gets Credit for Fixing Problems that Never Happened: Creating and Sustaining Process Improvement and here's a link to the full pdf document.
Re:Could be worse by winwar · 2009-12-25 09:17 · Score: 1

"Both were great athletes, but DiMaggio made the job look routine. Even though he was equally capable of making the fantastic plays, his approach was more safe because he didn't have to make them so often."
Based on these criteria, DiMaggio was a superior fielder. It would be better to have a DiMaggio running your infrastructure than a Mays. Other departments may have different criteria (say sales).
Excellent people make hard tasks look easy. Mediocre people make easy tasks look hard.
Re:Could be worse by Anonymous Coward · 2009-12-27 08:19 · Score: 0

Keep a log of number of outages and you should be able to show either a decrease in outages or a maintained low level of outages with a list of changes that happened.
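A log like that doesn't need to be fancy. As a minimal Python sketch, assuming all you record is the date of each outage (the function name and the sample dates are made up), a per-month count is already enough to show a trend:

```python
from collections import Counter
from datetime import date
from typing import Dict, Iterable


def outages_per_month(outage_dates: Iterable[date]) -> Dict[str, int]:
    """Count outages per calendar month, keyed as YYYY-MM in date order."""
    counts = Counter(d.strftime("%Y-%m") for d in outage_dates)
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # Invented sample data, purely for illustration.
    log = [date(2009, 10, 3), date(2009, 10, 19), date(2009, 11, 7)]
    for month, count in outages_per_month(log).items():
        print(month, count)
```

Put the list of changes made in each month next to the counts and you have something management can read without a networking background.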
Re:Could be worse by RMH101 · 2010-01-04 03:09 · Score: 1

Case in point: last year we had a server room outage here at a big retailer. UPS tripped, whole lot went down including 24/7 supply chain etc - millions lost per hour. Cue some phone calls to a few IT people who happened to be out on the beer that night who came in and eventually sorted it out after about 6 hours downtime. This was sold as a triumph of IT's dedication and professionalism - no one asked "why did the bloody DC only have a single UPS and single phase power?"

Virtualization? by bsDaemon · 2009-12-24 11:22 · Score: 4, Interesting

It's perhaps not the best solution, as a lot of problems I've faced since I started getting more into networking stuff than software configuration and web server administration have been related to bad cables rather than bad IOS settings, but virtualization can help you create test situations on the cheap. Specifically, GNS3 allows you to create test networks in a virtual environment, then import software images for your Cisco routers, switches, PIX firewalls, Juniper hardware, etc, all run on hypervisor technology.

You can also use QEMU to create virtual network nodes. If you have enough RAM, then this can help at least get the logical issues worked out and the software configurations square. Then you just need to do the real work :) I'm still pretty new to networking myself, and I use it to make little test labs for myself when I need to do more than I can with the two 3600 and the 2600-series routers I got to take home for experimenting with. I actually copied the IOS images off of them via TFTP and then can replicate them as many times as I need to, but I can claim I have whatever interfaces I need, plus it will (thankfully) simulate the ATM switch for me as well.

Re:Virtualization? by loki_ninboy · 2009-12-24 11:32 · Score: 2, Informative

I'm using the GNS3 software with some IOS stuff to help prepare for the CCNA exam. Sure beats paying the money for extra hardware lying around the house just for learning and testing purposes.
Re:Virtualization? by value_added · 2009-12-24 11:41 · Score: 4, Informative

Specifically, GNS3 allows you to create test networks in a virtual environment, then import software images for your Cisco routers, switches, PIX firewalls, Juniper hardware, etc, all run on hypervisor technology.
For anyone unfamiliar with GNS3, a link to the website. There are versions available for Windows, Linux, and OS X. FreeBSD already has it in ports.
As a side note, I'd add that maintaining a home lab (to the extent practicable and useful) is one way to side-step limitations of what your employer provides. Consider it a combination of "Ongoing Professional Education" and "Proactive Job Security Measures" (i.e., "I better test this shit to save my ass tomorrow").
Re:Virtualization? by afidel · 2009-12-24 12:39 · Score: 1

Almost as importantly with a simulator you don't have to POWER all that equipment, my CCNP lab almost maxed out a 20A circuit.

--
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
Re:Virtualization? by Bios_Hakr · 2009-12-24 14:43 · Score: 2, Informative

If you work a pure Cisco environment, talk to your Cisco guy about getting Packet Tracer. Emulates a few routers and a lot of switches. It works really well. Plus, 5.1 adds virtual networking. You can design several networks on several laptops and then join those networks over a virtual internet.

--
I'd rather you do it wrong, than for me to have to do it at all.

You could simulate the net with Packet Tracer by sconeu · 2009-12-24 11:25 · Score: 1

Granted, it's not really an ideal solution, but it may wind up being the only way to avoid using production equipment.

--
General Relativity: Space-time tells matter where to go; Matter tells space-time what shape to be.

Document and test at night by jdigriz · 2009-12-24 11:27 · Score: 5, Informative

Step 1) Make a formal request for the test lab. Make it as detailed as possible. Explain the impact to business if various components fail. Make a plain-language executive summary calling out risks.
Step 2) Once the request is denied, make sure you have a paper trail of the rejection.
Step 3) If possible, test network changes on the production equipment at 2am so that impact on users will be less.
Step 4) Once the inevitable failure occurs, haul out the paper trail and get the bean counter fired. Repeat until test lab is approved.
Note, step 4 may get you fired instead. Business decisions are somewhat nondeterministic.

Re:Document and test at night by Renraku · 2009-12-24 11:41 · Score: 1, Insightful

If you get fired for failing to do a job for which you were not equipped (and they know you aren't equipped for it), you might be able to sue because they created a hostile work environment. Hostile work environment lawsuits aren't just for sexual harassment, folks.

--
Job? I don't have time to get a job! Who will sit around and bitch about being broke and unemployed then?
Re:Document and test at night by nametaken · 2009-12-24 11:45 · Score: 1

There's a potential hitch or two in your plan.
If it goes smoothly anyway, you might look like a whiner that didn't need the expensive toys to keep on the shelf. They feel vindicated. If it goes poorly they'll assume you didn't really try because you wanted to prove yourself right.
Re:Document and test at night by Keruo · 2009-12-24 11:47 · Score: 3, Informative

step 3) If possible test network changes on the production equipment at 2am so that impact on users will be less
Been there, done that. Sadly the only way to see how your setup works is to try it in production.
Sure it helps if you can test it beforehand, but sometimes your lab might not reflect what happens in real network when you roll something out.
Just make sure you can clock those am hours as overtime/nighttime work.
And remember to backup the running config twice so you can restore the production network if something goes fubar.
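
A rough sketch of the "back it up twice" habit, assuming you have already pulled the running config off the device as text (how you fetch it is up to you). The function name `backup_config` and the device name are hypothetical:

```python
import hashlib
import tempfile
from datetime import datetime
from pathlib import Path

def backup_config(config_text, directories, device="router1"):
    """Write the same config to every directory and verify each copy by checksum."""
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    data = config_text.encode()
    expected = hashlib.sha256(data).hexdigest()
    written = []
    for directory in directories:
        path = Path(directory) / f"{device}-{stamp}.cfg"
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)
        # Read the copy back so a truncated backup is caught now,
        # not when you are trying to restore from it.
        actual = hashlib.sha256(path.read_bytes()).hexdigest()
        if actual != expected:
            raise IOError(f"backup at {path} does not match the original")
        written.append(path)
    return written

# Demo against two throwaway directories (stand-ins for, say, a local disk
# and a file server share).
dir_a, dir_b = tempfile.mkdtemp(), tempfile.mkdtemp()
copies = backup_config("hostname edge1\n", [dir_a, dir_b])
print([str(p) for p in copies])
```

Two copies in two places only help if both are readable, which is why each one is read back and compared before the function returns.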

--
There are no atheists when recovering from tape backup.
Re:Document and test at night by SethJohnson · 2009-12-24 11:55 · Score: 4, Funny

If it goes smoothly anyway, you might look like a whiner that didn't need the expensive toys to keep on the shelf.
Hence, you have the plug to the main router beneath your own desk. When the sailing looks smooth, you kick out the cord. While everyone freaks out, you open up a terminal window and begin typing nonsensical commands. Say, "Ahaaah!" as you re-plug the router.

Job security.

Seth

--
$5 / month hosted VPS on linux = awesome!
Re:Document and test at night by dkf · 2009-12-24 12:31 · Score: 1

Been there, done that. Sadly the only way to see how your setup works is to try it in production.
The other thing to mention is to be honest with the other technical staff about whether you've actually made a change, even if "trivial". This is because sometimes when you modify something, you can end up dumping them in the shitter accidentally, e.g., by putting a critical service on the wrong side of an internal firewall so that no packets get routed to it at all. In fact, I saw that once, and networking stonewalled for a week before admitting that indeed they had made a small modification "that shouldn't have affected anything" and which cost quite a lot of lost work. If you're honest and humble, they'll cut you more slack later if/when the boot's on the other foot.
OTOH, if management are insisting that all communications get routed through them, you're screwed. (NB: that's not the same as managers getting Cc'ed.)

--
"Little does he know, but there is no 'I' in 'Idiot'!"
Re:Document and test at night by jdigriz · 2009-12-24 12:32 · Score: 1

Are we in the same business? No one ever notices IT when things go well.
Re:Document and test at night by FrankDerKte · 2009-12-24 12:45 · Score: 1

Sounds like Bastard Operator From Hell to me. But it could be the only defense against Incredibly Incompetent Manager From Hell.
Re:Document and test at night by dbIII · 2009-12-24 13:07 · Score: 1

Sure it helps if you can test it beforehand, but sometimes your lab might not reflect what happens in real network when you roll something out.
That means your experimental model is not good and needs to be refined.
You see - all those guys that did a six month course and call themselves "engineers" could have had some benefit of a real engineering education or the experience of working with real engineers.
Meanwhile I have idiots learning about routing or DHCP on production systems because they can't be bothered to go into another room and turn on power switches to run things on a development network. It's very nice when developers work at improving their skillset, but the attitude of trying things to see if it works instead of RTFM is not compatible with anyone else using things at the same time. We've been poisoned by the MSDOS mindset of the single user, poor documentation and the "just reboot and see if it works now" attitude.
Re:Document and test at night by GaryOlson · 2009-12-24 13:10 · Score: 1

Please tell us all how you convinced an electrician to install dual L6-30 208V plugs beneath your desk. And how kicking said twist lock plugs -- both of them -- will cause the plug to come loose.

--
Every mans' island needs an ocean; choose your ocean carefully.
Re:Document and test at night by maxume · 2009-12-24 13:11 · Score: 1

I use fire.

--
Nerd rage is the funniest rage.
Re:Document and test at night by Anonymous Coward · 2009-12-24 13:35 · Score: 3, Interesting

Note, step 4 may get you fired instead. Business decisions are somewhat nondeterministic.
And that's what happened to me.
I was forced into making changes in the production environment, and caused an outage that affected 2 people. Once I realized what happened, I quickly fixed it; however due to internal politics I was terminated the next day.
Initially I was in shock. 10 years, 2 months employed in a single company. Gone. I have a stay-at-home wife and 3 kids; which made things look even bleaker.
In hindsight, it may be one of the better things to happen to me. I had spoken with a recruiter a few days before hand to start looking for work. When this happened, I was able to dedicate myself full time for job-searching. I was also off for hunting season, and able to do many things with my family that I normally wouldn't be able to do. The environment where I was was just awful. Several former co-workers have left since my special day. The CTO is a psychopath. He has 2 sayings he likes to use - the first is 'to do the job right the 1st time'. The second is a Mario Andretti quote of 'If you don't feel like you are out of control, then you aren't going fast enough'. These sayings are mutually exclusive, but logic doesn't apply.
I start a new position on Jan 5th (but it is only a 6 month contract position). It is a bit more money, and I have about 1/2 the commute. It is also a much better work environment.
Things I learned:
- Stockholm syndrome is apparently real. I didn't want to leave because 'it's not that bad'. It was bad. Worse.
- I hate job hunting.
- Employment law in Ontario, Canada is not what I thought it was. Pretty much everything I thought I knew was wrong.
- The economy here in Ontario is poor, but improving (but vastly better than the US).
- Legal advice in Ontario is tax deductible (at least in reference to employment issues).
- A certain CTO is a complete and total prick.
(ha - my captcha word is 'inaction')
Re:Document and test at night by SharpFang · 2009-12-24 13:52 · Score: 2, Interesting

3) If possible test network changes on the production equipment at 2am so that impact on users will be less
That's dangerous. You leave it apparently running and crawl back to sleep at 4:30AM, to get an angry call at 7:05AM when the first users to log in report something essential is fucked up.
Prepare and test at 2AM, then roll back to original. Then re-apply around lunch break and wait with your fingers on roll-back for the first reports of failure.

--
45 5F E1 04 22 CA 29 C4 93 3F 95 05 2B 79 2A B2
Re:Document and test at night by timmarhy · 2009-12-24 13:54 · Score: 1

yep, and the other fail here is that a lot of production environments are 24/7. there is NOT a slow point, ever.

--
If you mod me down, I will become more powerful than you can imagine....
Re:Document and test at night by BiggerIsBetter · 2009-12-24 13:55 · Score: 1

3) If possible test network changes on the production equipment at 2am so that impact on users will be less
You're a network guy, right? How well do you know the applications that use your network? How sure are you that the applications behind, or in front of, the change you're making don't need a restart after losing connectivity? Maybe your late night tests are causing all sorts of problems and expense when the apps guys come in to find the system inexplicably down or having visible outages, and have to start raising support requests against vendors to find a solution to their non-reproducible high severity defect in production. Don't do that.

--
Forget thrust, drag, lift and weight. Airplanes fly because of money.
Re:Document and test at night by mybecq · 2009-12-24 14:13 · Score: 1

Say, "Ahaaah! As you re-plug in the router."
With your feet? You ARE talented!
Re:Document and test at night by Anonymous Coward · 2009-12-24 14:49 · Score: 0

Man you really got screwed. I am sorry to hear that and I hope you find a great job soon.
Re:Document and test at night by Anonymous Coward · 2009-12-24 15:17 · Score: 0

You're the man, man!
Re:Document and test at night by jdigriz · 2009-12-24 15:33 · Score: 1

Actually no, I'm an applications and server guy. I had to learn networks because our network guys were incompetent. However, if your applications get to such a bad state that they need to be restarted due to a loss of network connectivity, they're badly written. And if the applications guys don't know about the sensitivity of their apps to network outages, and aren't actively monitoring their servers for interrupted services, then they don't know their applications very well either. In any case, if Step 3 causes the problems you mentioned, then we go on to Step 4 and the problem is solved, one way or another.
Re:Document and test at night by mysidia · 2009-12-24 15:51 · Score: 1

Until one afternoon when the janitor unplugs something from a power strip under your desk, to get an outlet for their vacuum, and the main router happens to go down....
Re:Document and test at night by mysidia · 2009-12-24 16:06 · Score: 1

Not power, the Ethernet feed :)
Router's LAN Interface -> Patch Panel -> Ethernet Port Under your desk -> Straight cable plugged into other port under your desk -> Patch Panel -> Firewall/Security Appliance Outside LAN Interface.
So by kicking out that cable, you separate LAN from router...
Or, by kicking out power to the 5-port hub under your desk you temporarily used to sniff traffic between the firewall and your site's edge router.
Re:Document and test at night by Anonymous Coward · 2009-12-24 16:18 · Score: 0

Step 3a: pull some tiles from the machine room floor, disconnect the lights and call the bean counter in for an "urgent meeting". Be sure to have plenty of quicklime handy...
Re:Document and test at night by mysidia · 2009-12-24 16:26 · Score: 1

In some environments, that is frustrated by other (lazy) technical staff, who immediately start automatically blaming _every_ problem they find for the next few weeks, on that one change, without even doing any helpful troubleshooting, or finding any reason at all to suggest it might be the case.
The problem is unrelated and would happen anyways, but because they heard of a recent change, there is a cognitive bias towards immediately suspecting the new change, just because it's a change they know about.
"I didn't change anything, so if I just started getting a few problem reports it must be your change"
This is the sort of thing that may annoy some technical workers, and possibly cause them to not report certain minor changes as widely as they could. Desktop support should not care much, for example, if the network team changes security measures on routers protecting administration access, or performs regular password changes, there are lots of minor changes that don't merit announcing.
It's trouble enough that technical staff (esp. Desktop admin types) often seem to automatically think perfectly innocent network devices, routers, firewalls, switches, need to be rebooted, before exhausting obvious causes like software/Windows problems.
"Someone was getting '504 page not found' errors trying to reach some web site.. so i'm power cycling the router labelled "Catalyst 6509-E core switch" in the wire closet, to see if it helps.. (You're doing what??)"
Re:Document and test at night by tlhIngan · 2009-12-24 17:49 · Score: 2, Informative

In some environments, that is frustrated by other (lazy) technical staff, who immediately start automatically blaming _every_ problem they find for the next few weeks, on that one change, without even doing any helpful troubleshooting, or finding any reason at all to suggest it might be the case.
The problem is unrelated and would happen anyways, but because they heard of a recent change, there is a cognitive bias towards immediately suspecting the new change, just because it's a change they know about.
"I didn't change anything, so if I just started getting a few problem reports it must be your change"

Which is why you announce the change will happen on X, but actually wait a week or two before actually committing the change. Then any bellyaching that happens, you can file as their problem. If any real issues happen, you can even hold off doing the change in case your change might aggravate the problem.
It's the same when new cell towers or other equipment are installed - people will complain of headaches and other crap caused by the tower right after it's "turned on", when in reality, it's been running months beforehand, or hasn't even been turned on yet.
Re:Document and test at night by BiggerIsBetter · 2009-12-24 18:15 · Score: 1

Interesting analysis. I've observed just such a scenario, and the difficulty lies in product issues (yes, the product behaves this way... until the next upgrade) and the layer between the apps guys and the app (the support guys). In an ideal world I'd agree with you, but we're operating in a business world, so I'll modify my statement to say, don't mess with the network after-hours if it's not your apps being affected.

--
Forget thrust, drag, lift and weight. Airplanes fly because of money.
Re:Document and test at night by itwerx · 2009-12-24 19:01 · Score: 1

On the consulting side people do notice when things are running well because they aren't getting billed!
Re:Document and test at night by mcrbids · 2009-12-24 20:02 · Score: 1

4) Once the inevitable failure occurs, haul out the paper trail and get the bean counter fired. Repeat until test lab is approved. Note, step 4 may get you fired instead. Business decisions are somewhat nondeterministic.
And this is the part that SUCKS.... A while back, I was part of a three-way integration project, with myself (representing a vendor), another vendor, and the ultimate customer. In advance, I'd talked through everything with the other vendor so we had a clear plan, including a verification step to cross-check the accuracy of the integration.
Confident, I went into the meeting, we presented our plan, the customer agreed, and all was going well, until when, in what I thought was going to be a closing step, I reiterated the entire plan from beginning to end.
The other vendor decided that this was a good time to disagree with the plan that had already been agreed to, and that doing the cross-check was not valuable. I came out strongly, indicating that the system could not be trusted unless the cross check was built. After going back and forth for a while, the customer decided they didn't want to pay for it.
So the plan went forward, and I was predicting that it wouldn't work, and why it wouldn't work. And when it didn't work, I was accused by the customer AND THE OTHER VENDOR of trying not to make it work! The situation went from bad to worse, to worse still.
Finally, I ended up building the cross-check, on my dime. I was PISSED. Even after the cross-check was built, and the problems started to show clearly rather than just be vague, I never got anything more than a nod.
You can believe that my further dealings with the client and the third party were markedly more reserved after that. Being the "hero" who has the answer in a flash, and saves the day is over-rated. Far better to hint that the problem is difficult to solve and take your time answering the question!

--
I have no problem with your religion until you decide it's reason to deprive others of the truth.
Re:Document and test at night by vaniderstine · 2009-12-25 17:23 · Score: 1

Your statement is mostly true. Unfortunately, management will remove your budget if it goes too well. And production engineers will only comment on your networks' stability after it starts going to hell again.

--
I "AM" ring-0.
Re:Document and test at night by Anonymous Coward · 2009-12-25 17:51 · Score: 0

Did it hurt getting that stick up your ass?

My last resort by tchdab1 · 2009-12-24 11:27 · Score: 5, Funny

I call my buddies at RIM and test my mods on their system.

Re:My last resort by sparkin_nz · 2009-12-24 14:36 · Score: 1

Is that some kind of RIM job?
Re:My last resort by itwerx · 2009-12-24 19:07 · Score: 1

RIM-shot!
Re:My last resort by JustOK · 2009-12-24 21:21 · Score: 1

You can tell if you see a bunch of little black berries

--
rewriting history since 2109

Boson and VMware by Zlurg · 2009-12-24 11:28 · Score: 0

Seriously, try and find as much virtual equipment as you can and replicate it as closely as possible to your production lab. If you run one of the myriad sniffers on a VM, you might even come up with a clever way to send production traffic to your virtual lab. There is no other way to do it. You are screwed, so if you're serious, you can either buy the lab yourself or make one out of tin cans, coconuts and wet rope.

Re: Testing Network Changes When No Test Labs Exi by droz037 · 2009-12-24 11:35 · Score: 2, Interesting

I would suggest asking your vendors for demo or evaluation equipment. Cisco, Juniper and 3Com have pools of demo equipment as do the resellers like PC Connection and CDW.

I've done deployments of new switching infrastructure based on work I've done with loaners from my vendors. It can be tough because the typical evaluation period is 30 days. Although you can get 45 and even 60 days.

If you have a good relationship with your sales rep, it should be easy to push them to get the necessary items to do basic testing and get the concepts down of how you need to deploy. Then keep the config files so that when you do buy what you need you're 85% of the way there.

Let it burn by Anonymous Coward · 2009-12-24 11:38 · Score: 0

One problem with the situation you are in, is that you've got a work-around that has sufficed so far. So, you might WANT a test lab, but clearly you don't NEED one... because hey, if you needed it you couldn't have got all this production stuff working, right? The only way this changes is when you've got multiple teams dealing with a production outage that takes a long time and costs a lot of money because you have to do some trial-and-error fixes to isolate the problem. Only THEN will you get your test lab, after an appropriate amount of paperwork and delay. The trick is doing this without the outage being perceived as your fault.

Packet Life by z4ns4stu · 2009-12-24 11:40 · Score: 3, Informative

Stretch, over at Packet Life, has a great lab set up that anyone who needs to test Cisco configurations can sign up for and use.

--
The whole moon and the entire sky are reflected in one dewdrop on the grass. - Dogen

Tools by Tancred · 2009-12-24 11:43 · Score: 5, Informative

Here are a few tools:

GNS3 - http://www.gns3.net/ - free network simulator, based on Dynamips Cisco emulator
Opnet - http://www.opnet.com/ - detailed planning of networks, from scratch
Traffic Explorer - http://packetdesign.com/ - plan changes to an existing network

Re:Tools by Anonymous Coward · 2009-12-24 12:55 · Score: 0

If you need to virtualize LOTS of nodes IMUNES (http://imunes.tel.fer.hr/virtnet/) might be interesting. It uses network stack virtualization now integrated in FreeBSD 8 kernel and can run simulations with thousands of nodes with cca 100kb overhead per node for virtualization.
Good side: IPSEC, Link BER and bandwidth, lightning fast
Bad side: no CISCO emulation, you get only basic switch/hub elements
Old screenshot: http://old.tel.fer.hr/imunes/GUI-normal.gif
Re:Tools by vvaduva · 2009-12-24 13:19 · Score: 1

Those are great recommendations, thanks!
Re:Tools by Anonymous Coward · 2009-12-24 21:39 · Score: 0

Yeah, Opnet... $35,000 for the base software without any modules. Can I pay for that in Zimbabwe dollars? Or should I just wait for the cracked torrent version?

lots of little things by wintermute000 · 2009-12-24 11:45 · Score: 2, Informative

Older Cisco equipment can function just as well as newer for 95% of lab scenarios. You are very unlikely to be needing to use all the newer features.

Anything that can run IOS 12.3 and is newer than a decade old can do a lot more than you think. We do all our BGP testing on a stack of 2600s and 3600s and have never had an issue, even though in production it's 2800s, 3800s etc.
Granted there are features that you do need the newer kit esp when syntax changes (e.g. IP SLA commands, newer netflow commands, class map based QoS to name three off the top of my head) but none of the core routing and switching features/commands has changed much since the introduction of CEF - they all do ACLs, route maps, OSPF, BGP, EIGRP, vlans, spanning tree, rapid spanning tree, IPSEC vpns. I'm speaking from an enterprise POV not a service provider but I'd imagine if you are in a telco environment you wouldn't be lacking gear.

For many minor test scenarios, you can pick a test branch office and use the good old 'reload in XYZ' command to ensure that no matter how badly you stuff it up, everything will bounce and come back (just remember NOT TO COPY RUN START lol).

Then there's the sleight of hand methods:
- always ordering more for projects than you really need. Par for the course really esp as most project managers haven't a clue when it comes to the nuts and bolts of a big cisco order.
- pushing for EOL replacements as early as possible, intentionally conflate end of sale with end of life.
- getting stuff in for projects as early as possible, then you have a month or two to use it as test gear.
- remember that your lab need not mirror reality, scale down as much as possible. e.g. to simulate a pair of 4506 multilayer switches running in VRRP, use a pair of 3560s. Use your CCO login and flash away to your heart's content (I know it's breaching licencing but for test scenarios, meh).

Re:lots of little things by shooteur · 2009-12-25 05:10 · Score: 1

Good points. QoS can be a bitch to lab on Cisco gear, with the amount of variation between platform types and IOS code. Most labs in my experience are made up of older equipment that differs from production and doesn't support the QoS features on the production gear (i.e. c3550 in lab vs c3560/c3750/c6500 in production). Getting the PHBs to agree to upgrading lab gear can be harder than getting approval for a lab in the first place.

Rancid? by fuzzyfuzzyfungus · 2009-12-24 11:48 · Score: 1

It doesn't save you from doing stupid things; but putting your device configurations under revision control, using something like Rancid can make rolling things back easier, as well as generally encouraging sanity around device configuration.
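
To illustrate why revision-controlled configs make rollback easier, here is a small sketch using Python's standard `difflib`. This is not how Rancid itself works; it only shows the kind of before/after diff you get once every config version is kept. The config snippets and the `config_diff` helper are made up for the example:

```python
import difflib

def config_diff(old, new, name="router1.cfg"):
    """Return a unified diff between two versions of a device config."""
    return list(difflib.unified_diff(
        old.splitlines(), new.splitlines(),
        fromfile=f"{name} (previous)", tofile=f"{name} (current)",
        lineterm="",
    ))

# Hypothetical stored versions of the same device config.
previous = "hostname edge1\ninterface Fa0/0\n ip address 10.0.0.1 255.255.255.0\n"
current = "hostname edge1\ninterface Fa0/0\n ip address 10.0.0.2 255.255.255.0\n"

for line in config_diff(previous, current):
    print(line)
```

With a diff like this in hand, "what changed last night?" has a concrete answer, and rolling back means re-applying the previous version rather than reconstructing it from memory.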

Re:Rancid? by SlamMan · 2009-12-24 15:30 · Score: 1

Rancid's good. Also look at CatTools from http://www.kiwisyslog.com/kiwi-cattools-overview/ for a similar windows tool. Free for small networks, ~$550 for networks over 20 devices.

--
Mod point free since 2001

Go virtual! by leegaard · 2009-12-24 11:58 · Score: 3, Informative

If you are unable to recycle old equipment into your testlab you should go virtual.

For Cisco routers, GNS3/Dynamips (www.gns3.net) is your friend. Any recent PC or laptop will allow you to build a large and complex topology that will satisfy most experiments and even support you when doing certification preparation. It will only work for routers so switch-based platforms are out (like the 3570,6500 and 7600). The good news is that the features are more or less the same and they more or less behave the same way. If "more or less" is not close enough you need a replica of your production network or at least a few devices of each to test what can be labelled as critical.

For Juniper routers, google juniper Olive. It will run a juniper router the same way dynamips runs a Cisco router.

In both cases a proactive partnership deal with the vendor will be a good idea. Both Cisco and Juniper (and I am sure all other major network vendors) have programs where they will more or less advise, test and prepare the configurations for you. If you run a critical network this is money well spent.

In the end it comes down to the level of risk your management is willing to take. Ask them if they will allow the network to be less up since you are unable to properly test your changes before implementation.

Out of hours changes, and change management by anti-NAT · 2009-12-24 11:58 · Score: 2, Informative

For any sort of medium to large network, you can't fully simulate it. That means you're always going to be making "untested" changes to the production environment. So, you make very few changes rather than lots, you make sure after each change that it has had the desired effect, and you have backout plans.
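
The "one change, check it, be ready to back out" routine can be sketched as a loop. Everything here is a stand-in: the `apply`, `verify` and `backout` callables are hypothetical hooks you would wire to your own tooling, and the in-memory `state` dict just simulates a device:

```python
def apply_changes(changes):
    """Apply changes one at a time; on a failed check, back out and stop.

    Each change is a dict with a 'name' and three callables:
    'apply', 'verify' and 'backout'.
    Returns the names of the changes that were applied and verified.
    """
    done = []
    for change in changes:
        change["apply"]()
        if not change["verify"]():
            change["backout"]()
            break  # do not pile further changes on top of a failure
        done.append(change["name"])
    return done

# Simulated device state for the demo.
state = {"vlan10": False, "vlan20": False}

def make_change(name, works=True):
    def apply():
        state[name] = True
    def verify():
        return works and state[name]
    def backout():
        state[name] = False
    return {"name": name, "apply": apply, "verify": verify, "backout": backout}

result = apply_changes([make_change("vlan10"), make_change("vlan20", works=False)])
print(result, state)
```

Stopping at the first failed check keeps the list of suspects down to one change, which is the whole point of making few changes at a time.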

--
The Internet's nature is peer to peer - 20050301_cs_profs.pdf

Re:Out of hours changes, and change management by ewertz · 2009-12-24 18:10 · Score: 0

We had a name for change management in a previous position I had -- Change Prevention. The problem is that when certain things are very fragile *and* core to your operations, and totally not understood by those that are making the process decisions, change management is an impediment on the road to looming potential semi-disaster. I simply had an agreement with my manager. I could make changes in production, but if things ever went belly up, he'd have to instantly fire me. This worked for the better part of my last year until I decided that the whole thing just wasn't the right place for me. Test, test, test until you can't stand it any longer, know what it takes to pull it back out, push it out on Sunday night and sleep under your desk until about a half hour before the market opens. Don't bother having any personal stuff at work in case it's your last day. Easy!

if you ask for it, you may not get it... by Anonymous Coward · 2009-12-24 11:59 · Score: 0

but if you write a proposal and show the benefits of having the right equipment and the operational costs of not having the right equipment, you might be able to get a spirent testcenter. Do a demo with some linux/*bsd boxes running iperf, but remind them of the features and abilities you will get with quality network testing tools.

Borrow a lab! by jimpop · 2009-12-24 12:02 · Score: 3, Interesting

Cisco have many (large) labs located around the world. Sign up for some time in one of them.

Plan, inform and be prepared! by Anonymous Coward · 2009-12-24 12:04 · Score: 1, Insightful

Been there, done that (A LOT!!)
But it has failed quite a few times too..

If no money is available for test labs, make good plans... Tell the dudes that wanted the changes (or if you are the dude that wants the changes, inform the correct people that you will be doing stuff). Agree on a service window. Have backup plans.. Have all configurations saved.. Let all users know that after 10pm on that Saturday the network will be down for 10 mins etc etc..

Have tons of contingency plans, and let the 'responsible' people know what you are about to do.. Plan everything 'wide'... So even for a 5 min cable plugover, reserve a 2 hour service window outside of office hours..

The power of logic compels me! by Anonymous Coward · 2009-12-24 12:05 · Score: 1, Interesting

You do not mention that this has ever made shit hit a fan. I conclude that so far this has not occurred.

Consequently, you have proved that you are able to work without expensive test equipment by a combination of motivation and elbow grease. Congratulations!

Now, what is the logic for someone with a finite pool of money to provide equipment for someone who obviously does not NEED it? Yes, None At All!

You can therefore:
1) Wait until shit hits a fan and say "well, that's what happens when we don't have test equipment". You will then get test equipment OR get fired.
2) Make the shit hit the fan yourself. This is quite difficult to do inconspicuously, so you'll probably get fired and a shit reference.
3) Look around for jobs as well paid as yours but with test equipment.
4) Someone mentioned asking vendors for test equipment - maybe that might work? Note: sales reps have a quota of favours they can call in, so it helps if you have some steady business with them.

Re:The power of logic compels me! by Anonymous Coward · 2009-12-24 13:26 · Score: 0

5) STFU, and make it work the first time.
Posting humorously, for obvious reasons.

Simulation by Anonymous Coward · 2009-12-24 12:11 · Score: 0

That is what simulation/network planning software is for. For example OPNET: http://www.opnet.com

In case it explodes by Ximok · 2009-12-24 12:17 · Score: 0

reload in 5

I'm dead serious. If you are on production stuff and you screw it up remotely, you can at least tell it to reload and pull its old config. You have some downtime, but it's better than the downtime you'd experience if you had to drive out there.
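
As a rough illustration of the "reload in 5" safety net, here is a minimal Python sketch that only builds the command sequence as text. The function name `guarded_change` is hypothetical, and the sketch assumes the startup config on the box is the known-good one and that the device accepts the standard IOS commands `write memory`, `reload in`, `configure terminal`, `end` and `reload cancel`. It does not talk to any router.

```python
from typing import List


def guarded_change(config_lines: List[str], minutes: int = 5) -> List[str]:
    """Build the IOS command sequence for a change guarded by a timed reload.

    Hypothetical helper for illustration: it returns text only and never
    connects to a device.
    """
    if minutes < 1:
        raise ValueError("minutes must be at least 1")
    session = [
        "write memory",          # save the known-good config to startup first
        f"reload in {minutes}",  # arm the fallback reboot (IOS asks to confirm)
        "configure terminal",
    ]
    session += config_lines      # the risky change itself
    session += [
        "end",
        # Everything below only happens if you still have a session:
        "reload cancel",         # disarm the reboot once the change is verified
        "write memory",          # make the new config permanent
    ]
    return session


if __name__ == "__main__":
    for line in guarded_change(["interface Gi0/1", " description uplink"], 5):
        print(line)
```

If the change cuts you off, the last two commands never get typed, the timer expires, and the box comes back on the saved config.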

Paper Trail by tengu1sd · 2009-12-24 12:24 · Score: 3, Interesting

>>>refuse to provide funds for expensive lab equipment, test circuits and for reasonable time to get testing done before moving equipment or configs into production.

Make sure that every change request implementation documents that this change is being placed into the production environment for testing. Document impact ranging from total network failure to moderate inconvenience, and include rollout timetables. The rollout plan needs to include travel times, such as the drive to site B or a cross-country flight.

Of course the downside of this is that management may go out and hire someone who knows, or at least pretends to know, how to drop changes into place without whining about ignorance and making customers uncomfortable.

Some pointers by pehrs · 2009-12-24 12:24 · Score: 1

It depends a lot on your environment and the complexity you are dealing with. Test labs are wonderful things, but typically you end up in a situation where your network is so limited that a lab won't help much, or your network is simply too complex to create a sane lab environment without dedicated staff and a huge budget.

Building a full scale lab is a large undertaking. It takes time and effort. You will need taps (for routing information), traffic generators, topology management and more. In my experience it's usually better to have a smaller testbed that is used to test large changes before deployment and design your network so it's resilient to configuration mistakes.

Getting funding for a limited testbed is also much easier than a full lab, and you can do a lot of testing by simply stuffing a few routers in a rack and connecting it to the network management system. Virtualization is something a lot of people will mention. It's useful, but it's hard to build anything resembling a modern network on top of it. You want hardware that resembles what you use in the network. Sometimes you can scavenge such hardware during upgrades, which can provide you with a basic testbed to build from.

Don't waste the next crisis/flap by Anonymous Coward · 2009-12-24 12:32 · Score: 0

When it happens, point out (on paper!) to your management chain how it could have been prevented with a decent test configuration in place.

No, there's nothing wrong with that by Rix · 2009-12-24 12:34 · Score: 1

As long as the downtime that will result is acceptable.

Don't forget SOX by jackb_guppy · 2009-12-24 13:04 · Score: 2, Informative

1) You should not be making any direct changes to the network without correct design, test and sign-off.

2) You should already have a redundant network structure, so "half" can be lost without any loss to network operations. This way the change can be tested in parallel.

3) You should always report to the SOX officer when a request outside correct operations and management is made. It makes it their responsibility to solve the legal issues of not following their written standards before you begin.

Re:Don't forget SOX by butlerm · 2009-12-24 18:12 · Score: 1

Not that this isn't a good idea for other reasons, but how exactly does this requirement flow from the Sarbanes-Oxley Act? I mean, the whole thing is about financial controls, accuracy, disclosure, and reporting. I suppose your network could impair the timeliness of some reports, but it is hard to see in general how it is going to affect their accuracy.
Re:Don't forget SOX by jackb_guppy · 2009-12-25 02:08 · Score: 1

Access Control - using VLANs and "firewalls" can be broken with errors
Business Continuity - system is unavailable for *any* work
Damage - system needing to be reloaded to return to working status w/ loss of intermediate work.
Notification - Is there something that may be wrong
Missed Posting - due to today's separated servers, one being "down" could cause *lost* postings and revenue.
That is just a short list. SOX opens a whole array of issues. Best to let the SOX team worry.
The network is now the system!
Re:Don't forget SOX by linuxrocks123 · 2009-12-25 05:11 · Score: 1

If what you know about Sarbanes-Oxley comes from Slashdot, you probably think it is significantly wider in scope than it actually is. You're correct: Sarbanes-Oxley is only tangentially related to IT, but somehow Slashdot thinks that it's this big huge-ass problem. See here for more information: http://en.wikipedia.org/wiki/Information_technology_controls#IT_controls_and_the_Sarbanes-Oxley_Act_.28SOX.29
Specifically this:
"The 2007 SOX guidance from the PCAOB[1] and SEC[2] state that IT controls should only be part of the SOX 404 assessment to the extent that specific financial risks are addressed, which significantly reduces the scope of IT controls required in the assessment."
---linuxrocks123

--
vi ~/.emacs # I'm probably going to Hell for this.

its the time to by GooseYArd · 2009-12-24 13:05 · Score: 1

polish your resume.

Download vyatta by Sxooter · 2009-12-24 13:06 · Score: 1

Download an iso from Vyatta and build a test network with old PCs and spare NICs for testing. Sure, it's not the exact same as Cisco, but if they're too cheap to buy the real thing for a test lab then you'll at least be somewhat close.

Then, once you realize what you're not getting for your money with Cisco, you can buy $1000 1U servers and build your own routers (or buy them prebuilt from Vyatta for about $2000) to replace the Ciscos and make a profit selling the used Ciscos on ebay.

I do NOT work for nor am I affiliated with Vyatta. But their gear is pretty impressive, and open source.

--

--- It is not the things we do which we regret the most, but the things which we don't do.

Re:Download vyatta by ewertz · 2009-12-24 17:52 · Score: 0

Don't see this being useful unless you're running Vyatta in production. Hope you don't do this for a living!
Re:Download vyatta by itwerx · 2009-12-24 19:19 · Score: 2, Funny

Tired of the VI vs EMACS war? Try the new Vyatta vs pfSense conflict instead! :) (pfSense is great...)
Re:Download vyatta by Anonymous Coward · 2009-12-25 03:42 · Score: 0

this is what i do. Of course i use vyatta on my production routers too.
Re:Download vyatta by Sxooter · 2009-12-25 13:57 · Score: 1

I do, and I do it well. You'd be surprised how similar setting things up on linux / vyatta is to a cisco router. BGP is BGP. Of course, if you're not smart enough to use the knowledge gained from testing on the free stuff, I understand. Not everyone is.

--

--- It is not the things we do which we regret the most, but the things which we don't do.
Re:Download vyatta by Sxooter · 2009-12-25 14:01 · Score: 1

Either is fine by me, as long as it keeps me from having to use Cisco.

--

--- It is not the things we do which we regret the most, but the things which we don't do.
Re:Download vyatta by ewertz · 2009-12-26 11:43 · Score: 0

"I do, and I do it well." Numerous studies have been done that demonstrate that people consistently overvalue their contribution/abilities. But that's not on-topic, so enough of that. I will absolutely grant you that there are a many, many cases where someone's ends may be served by plopping down unrepresentative software onto unrepresentative hardware, call it test network, don't care about the difference, and may not be any the worse for it. However, there *are* a large class of problems and are in *no* way served by this. It's not a matter of not appreciating "free stuff", as I certainly do. But a test network that shares nothing in common with my production network isn't work squat. It's probably that we're just working at substantially different tolerances. regards
Re:Download vyatta by Anonymous Coward · 2009-12-29 06:18 · Score: 0

Replacing Cisco routers with cheap 1U servers is an awesome plan because networks of course only consist of ethernet interfaces.

UNH-IOL by slugmass · 2009-12-24 13:10 · Score: 1

The UNH-IOL is a neutral, third-party laboratory dedicated to testing data networking technologies through industry collaboration.

http://www.iol.unh.edu/

Get it in writing, let it fail. by timmarhy · 2009-12-24 13:10 · Score: 1

Make your objections in writing: email the manager demanding the change that you believe places production at risk, with the risks clearly outlined in bullet points. If he then insists you proceed, make him send you the request in writing/email, print out a duplicate, keep it in a safe place, and then make his change. This way he owns the failure, not you. Paper trails exist for a reason, to cover arses, and arse covering is often a worthwhile exercise.

--
If you mod me down, I will become more powerful than you can imagine....

You answered your own question. by mnslinky · 2009-12-24 14:32 · Score: 1

As you already said, we secretly test on production in such cases.

Only a matter of time by w00ten · 2009-12-24 14:57 · Score: 0

It's only a matter of time until a change that wasn't properly tested completely screws everything up and some exec is looking at you for answers. I've learned that the best interpersonal skill to have is deflection. Nice guys finish last, especially in a corporate environment, so try to get test equipment, and when they say no, like all companies do, SAVE THAT REPLY so you can blame someone else! This is what you can send to the CTO when he asks why you didn't properly test the changes that caused the company to lose millions of dollars in operating costs because the network was down for 6 hours: "Well, I warned people in this email trail and proposal, but they shot me down, and I was right". If by some incredible miracle this never has to happen, then count your lucky stars, and when they ask why nothing has gone wrong, toot your own horn and say that it's because you are so damn good. No matter what, you show value, you secure your position. As for basic testing, any of the programs mentioned here will work; Packet Tracer is limited in the models it supports, so you might want to look at something else first.

try clownix by peril · 2009-12-24 15:09 · Score: 1

http://www.clownix.net

I did a write-up on this product in the beginning of this month - can run quagga routers in the UML image of your choice - wrote / ran a 12 router lab that ran on a p4 with 512MB / RAM. (http://www.vlcg.net/content/cloonix-clownix-rocks)

If this product was used - you would only be able to functionally test the protocols in a particular topology - wouldn't be cisco, and it wouldn't be the same as production (different protocols, different topologies).

I discovered this trying to figure out a way to run quagga in a gns3-like setup. GNS3 is great for testing a specific cisco thing that you need to learn about - but it didn't do well for me beyond 3 routers - (too much hand-holding getting the environment tweaked).

My ultimate vision for quagga would be to run it on the hypervisor and let it scale (in numbers of routing instances) with the number of hypervisors. It's a pipe dream for now, but I think that routing that can scale with hypervisors is going to be a big challenge for cisco (esp if they try to do it in silicon).

--Adrian

Don't do it by tedgyz · 2009-12-24 15:52 · Score: 1

Management hates paying for double the equipment, but for any production environment, it should be the cost of doing business. It minimizes risk and provides hot spares faster than an HP (or whatever) tech shows up. You should get some duplicate hardware for staging.

If you can't do that, then refer to the earlier post - don't fsck up.

--
"No matter where you go, there you are." -- Buckaroo Banzai

Hope you're good and/or have a good relationship. by Anonymous Coward · 2009-12-24 17:06 · Score: 0

I work in a small IT house that provides network support for quite a few customers that are not large enough to have their own IT people.

We're very Windows centric (yeah, I know, boo) and have no budget for any test equipment/training, yet I am expected to be up to date on changes in Windows. To make matters worse, I'm not even supposed to have the time to test things out on our internal network, and the pay is low enough that I can't afford to purchase equipment to test at home on my own time.

So, what works (kinda) for me has been to keep an eye out for equipment that has been abandoned by our sales team (usually through extensive hardware problems that cause a customer to decide that it's not reliable enough for their network), and/or take equipment off of the sales shelf for testing. For the software/knowledge side, I will quite belligerently tell my boss to go away when I'm testing something that needs to get tested. This requires that you have a certain amount of clout and/or your boss is afraid of you quitting enough to let you get away with it.

On the customer/end user side, develop some sort of personal relationship with them, whether that be going out for drinks with them periodically, knowing what they do for fun and/or have them know what you do for fun (no, gaming doesn't cut it). Be up front with them when something does mess up (literally saying that you didn't realize that what you were doing might have that problem).

Never, ever blame someone else unless you're sure it's their fault; take the blame yourself. This'll save your ass when it really isn't your fault and someone tries to pin it on you.

---
Having said all of that, what you (and I) should really be doing is looking for a new job.

Cisco Router by blavallee · 2009-12-24 17:24 · Score: 1

router# wr me
router# reload in 30
router# conf t
router(config)# (good luck)
.. disconnected from remote host (oops, wait for reload)

Re:Hope you're good and/or have a good relationshi by poopdeville · 2009-12-24 17:49 · Score: 1

To make matters worse, I'm not even supposed to have the time to test things out on our internal network and the pay is low enough that I can't afford to purchase equipment to test at home on my own time. ...

Honestly, you are better off with a smaller salary if you would spend a raise on the company. The opportunity costs of such an idea are just absurd.

--
After all, I am strangely colored.

Comment removed by account_deleted · 2009-12-24 18:19 · Score: 2, Insightful

Comment removed based on user account deletion

Don't just blame the management... by MindPrison · 2009-12-24 19:48 · Score: 1

...You're as guilty as they are here, if not MUCH more....

Quoting you: "Management often expects us to get a job done but refuse to provide funds for expensive lab equipment"

Well, have you considered that you may not have informed management from the start about what's to be expected in the future? If there is ONE THING that management does well and knows better than most of us, it is how to EARN and KEEP money. They trust YOU to do your job and know everything about it so it doesn't have to be a future headache for them. If you FAILED to INFORM them of your possible needs in the beginning, you have yourself to blame, buddy.

You're not alone though, I've been there myself...trying to convince my bosses why I need all that extra gear to keep it safe in the future - when everything has worked FINE so far.

So - be prepared - rather than complaining later.

--
What this world is coming to - is for you and me to decide.

Try Junipers by Anonymous Coward · 2009-12-24 20:09 · Score: 0

Juniper routers have a much more powerful management interface than Cisco. They have built-in configuration versioning and atomic changesets, allow easy rollback, and can schedule commits. There is also a programmatic configuration API. If you have no test lab, Junipers can help you a lot.

Changes during business hours by Anonymous Coward · 2009-12-24 20:26 · Score: 0

We actually try to put non-downtime-incurring changes through during business hours; this way any unexpected issues will come up immediately and we can react. Rather than... This is in a sizable high-end production environment.

Reuse DR network equipment by Anonymous Coward · 2009-12-24 23:49 · Score: 0

Our company purchases DR equipment for our network hardware. Thus if a switch in the data center blows out we can replace it very quickly.

Instead of leaving it on the shelf we hook it up and use it for test environments. If production needs a replacement we drop the test environment and put the DR in place.

Costs more upfront but makes good use out of the equipment.

Services by Alarash · 2009-12-25 00:45 · Score: 1

You hire Professional Services from a lab/test equipment manufacturer (Spirent, Ixia, BPS) or dedicated testing companies (EANTC or others). Most of them will agree to work during the night, so you need to get a "maintenance" window where they can inject traffic. I do that all the time, from the tester's side. It's stupid to do, by the way, because you should always test *before* production.

But that's really dangerous, and the best way is still to test in the "lab". A lab can be a temporary rack where you put test equipment you rent for a few days. That test equipment can emulate very complex network topologies, so even if you have only, say, one firewall you need to test, you don't need the rest of the network devices in your lab (although it would be better, of course, it's not mandatory). Most companies have at least one spare unit for their network equipment, to quickly replace a unit if it were to fail, so you could use that one for testing a new configuration before committing it to production. Again, not ideal, but definitely better than not testing. A nice blog to read about the importance of testing is Spirent's.

Who's Your Daddy? by duanes1967 · 2009-12-25 03:10 · Score: 1

I have worked for a few companies that had limited labs, but none that had a comprehensive lab. They would operate in staged upgrades and used emulators as a sanity check, plus a peer review by at least two other engineers. Make sure that there is a management VLAN in operation and just shift VCs as needed. A wholesale re-engineering is just asking for it. The key to the whole thing is: ensure you have remote (dialup) access to the routers in question, never save the changes until you are happy, and make sure you keep a good copy on flash in the router. It comes down to your awesome Ninja router skills. This is where a $100K network guy makes his money versus a $35K graduate. EXPERIENCE.

The horrible, but necessary answer by Anonymous Coward · 2009-12-25 03:18 · Score: 0

Before testing, write up a detailed plan as to why you think this testing should be done on nonproduction equipment. Express your concern that it's EXTREMELY UNWISE to test on production equipment, but that this is the only alternative if you wish to deploy a final working system. Send email far and wide regarding your concerns. CC yourself at your own personal email address.

In short, cover your hiney. If bozo the manager wants to take the risk, you MUST be able to provide ample documented evidence that he/she/it was warned.

This is even simpler by Anonymous Coward · 2009-12-25 04:11 · Score: 0

SymbolNOBODY:

You said what's quoted below from you, here -> http://slashdot.org/comments.pl?sid=1476008&cid=30428430

"It's tolerated (perhaps encouraged) in part because these annoying actors are otherwised engaged in improving Linux. Major Debian and BSD contributors, for example, use slashdot as a workspace for their human-machine interaction side experiments, of which APK is probably one. In addition many of these trolls post links which, if you follow them, will completely hose a Windows machine. This is part of the game. - by symbolset (646467) on Monday December 14, @01:15AM (#30428430) Journal

I took offense to the BOLDED part... & ALL you EVER seem to have is "ad hominem" based attacks on people, not the points they make. So, my reply in the URL below was simple (and logical):

http://slashdot.org/comments.pl?sid=1476008&threshold=-1&commentsort=0&mode=thread&pid=30428430#30430244

Additionally, "symbolNOBODY"? Well - the day you can make something like this (& that got you PAID for it, & that has done as well for others online):

http://www.tcmagazine.com/forums/index.php?s=b861a743aa23c4568b7d73e07ef7ecec&showtopic=2662

That's also gone over 250.000 views worldwide in 1++ yrs.' time online, & across 15 forums where that guide for Windows Security has been made either an:

1.) "Sticky/Pinned" thread
2.) An "Essential Guide"
3.) Rates 5/5 stars (etc.)

AND, gets "feedback" like this from users that have applied it:

----

http://www.xtremepccentral.com/forums/showthread.php?t=28430

PERTINENT QUOTE/EXCERPT:

"...recently, months ago when you finally got this guide done, had authorization to try this on simple work station for kids. My client, who paid me an ungodly amount of money to do this, has been PROBLEM FREE FOR MONTHS! I haven't even had a follow up call which is unusual. Now I don't recommend this for the average joe, but it if can work for a kids PC it can work for anything! Now, i substituted OpenDNS and activated the Adult Content filter with them for this kids computer. I know its not perfect, but will catch over 99.5% of said sites."

and

http://www.xtremepccentral.com/forums/showthread.php?s=10f9ba9ad5ff990aaae1e7ec91f593a2&t=28430&page=3

"Its 2009 - still trouble free! I was told last week by a co worker who does active directory administration, and he said I was doing overkill. I told him yes, but I just eliminated the half life in windows that you usually get. He said good point. So from 2008 till 2009. No speed decreases, its been to a lan party, moved around in a move, and it still NEVER has had the OS reinstalled besides the fact I imaged the drive over in 2008. Great stuff! My client STILL Hasn't called me back in regards to that one machine to get it locked down for the kid. I am glad it worked and I am sure her wallet is appreciated too now that it works. Speaking of which, I need to call her to see if I can get some leads. APK - I will say it again, the guide is FANTASTIC! Its made my PC experience much easier. Sandboxing was great. Getting my host file updated, setting services to system service, rather than system local. (except AVG updater, needed system local)"

Thronka - forums member @ xtremepccentral.com

----

THEN, when you have done so, on THAT account? THEN, you can talk!

Also?

When you have done all of this as I have over time in this Art & S

Labs are not the whole answer by axafg00b · 2009-12-25 04:23 · Score: 1

Labs, yeah, good times! The biggest problem is keeping the labs both operational and relevant. I just finished cleaning out my company's network lab as the switchgear was not L3-capable, out of production and out of our network, and none of the interfaces were faster than 100Mbps. None of it could be updated to a relevant OS level. It is mentioned earlier that if you are a large enough network, you designate a branch to serve as a guinea pig for planned changes. Also, if you have a branch close down, make sure you reclaim the equipment if it is new enough and use that for your 'lab' until the next refresh. Sadly, using older equipment only works if you never plan to use leading (bleeding?) edge features. However, my colleagues and I have found that using older equipment sometimes masks new and unknown interactions between the new services and older, perceived-stable, protocols.

Plan ahead meticulously - using paper and pen is not a sin as it is often faster than trying to model your system in software. Also, leverage your vendors heavily. They have the latest gear, and hopefully you will have service contracts, and they can assist you in planning out major changes.

Praying when a change goes in is good, too.

--
I think, therefore I am - Rene Descartes; I yam what I yam, an' that's what I yam - Popeye

A simple trick that might help in some instances by jon3k · 2009-12-25 05:32 · Score: 1

Not a cure-all by any means, but one more trick for the toolbox. Very useful during a maintenance window. Obviously Cisco specific.

(tftp/scp/etc new-config to router)

router# reload in 2
router# copy flash:new-config running-config

(something along those lines, this is off the top of my head, basically copy your new config to the running config)

If it works, wr it to the startup config. If you get disconnected, wait 2 minutes for the router to reboot and automatically load the previous startup-config. Adjust the time as necessary depending on change/complexity.

Also use something like RANCID or KiwiCatTools to help handle managing your configuration changes.
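
The core of what those config-management tools give you is a diff between saved snapshots. A minimal sketch of that idea using Python's standard `difflib` follows; the function name `config_diff` and the file labels are made up for illustration, and this is not how RANCID itself is implemented.

```python
import difflib


def config_diff(before: str, after: str) -> str:
    """Return a unified diff between two saved config snapshots.

    Hypothetical helper: an empty string means the snapshots are identical.
    """
    lines = difflib.unified_diff(
        before.splitlines(),
        after.splitlines(),
        fromfile="saved-config",      # label only, not a real file
        tofile="candidate-config",    # label only, not a real file
        lineterm="",
    )
    return "\n".join(lines)


if __name__ == "__main__":
    old = "hostname r1\ninterface Fa0/0\n ip address 10.0.0.1 255.255.255.0\n"
    new = "hostname r1\ninterface Fa0/0\n ip address 10.0.0.2 255.255.255.0\n"
    print(config_diff(old, new))
```

Reviewing a diff like this before the maintenance window makes it easier to see exactly which lines a change touches.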

But the best trick of all is using a full blown router emulator like gns3.

It's a MIPS emulator that loads unmodified IOS images. You can build complex scenarios and even attach them to NICs on the host PC. I've built labs with several routers attached to bridged NICs in VMWare guests. So you can literally start, say, a webserver on one vmware guest and access it across your gns3 "network". You can also bridge it to physical NICs -- you could have a 7206vxr router running on an old PC!

Plenty of limitations. Namely it can only run a specific set of IOS images for specific models and you have to use an NM-16ESW to simulate switching since switching is done in ASICs.

Many think this is normal by CyberLife · 2009-12-25 07:12 · Score: 1

It's highly distressing to encounter these people, but many, tech and manager alike, actually think there's nothing wrong with working on production systems. To them that's just how it's done. They know no other way. Trying to educate them is met with blank stares and sometimes even harsh resistance.

Dynamips by klashn · 2009-12-25 08:38 · Score: 0

For Cisco equipment you can get the Dynamips emulator. You must provide the Cisco IOS - which must be licensed from Cisco for your use. You can then emulate pretty much the whole range of Cisco switches/routers on your PC. It's pretty good for a small test lab, but I'm not sure for a full production test lab

Do it the olde fashioned way by Anonymous Coward · 2009-12-26 02:17 · Score: 0

Calculate CPU cycles, read the source code, understand your changes, and roll them out. Oh wait! You're using Cisco, nevermind.

cheat, lie and steal! by itzdandy · 2009-12-27 07:09 · Score: 1

seriously, buy a new router to replace a 'broken' one from a location and then somehow fix the broken one for your lab/office.

The truth is that sometimes you not only lack the equipment for lab testing, but also the real-world usage scenario. I am often stuck in a situation where I must back up a config and then experiment with production equipment, and so am forced to do this outside of business hours. I usually get a chance to do some functional testing offline but can't really put new systems through their paces very well in a lab.

The real key to success here? Know what you are doing. You may have to work in less than ideal circumstances, but you must be knowledgeable enough to fix a mistake in a reasonable amount of time.

Also consider getting your hands on a rig to do some virtualization. You can virtualize routers and servers with something like Xenserver, vmware, or virtualbox. I have done an entire mock deployment of a cisco firewall + windows server 2008 r2 system for remote client access (Windows) and site-to-site vpns (cisco) on a single Xenserver, because I can virtualize the cisco router (it's slow) and the windows servers, and even create separate networks to simulate separate switches, sites, network segments etc. A Q6600 + 8GB can be had for less than a grand at dell in whitebox.

Risk is the factor you are wanting to reduce by Private+Baldrick · 2009-12-28 23:51 · Score: 1

What I'd suggest is something quite alien to the tekhead: get management on your side. Explain the issues and talk about the problems. Give them easy-to-read bullet points. Management will then ask you, "Well, what do you suggest?" You know a lab that effectively mirrors the live environment is about as likely as rocking horse poo, but ask about it anyway. Even if you suspect they won't fork out the money for it, ask for it, and make sure they understand and you discuss it...

Assuming you didn't get a lab, then talk about the change. Talk them through the mitigation you want to plan in, talk about the rollback, get them on board. Then hit them with a compromise. You know the network better than anyone; work out what equipment you do need to replicate the vast majority of the network. If 90% of your network is based upon, say, 3 standard models of switches/routers, ask for a lab of them. Discuss how you can reduce the risk. Risk is the factor you are trying to reduce. You should be able to speak to your management saying:

Option 1: cost $50000, 99% of the network tested
Option 2: cost $10000, 95% of the network tested
Option 3: cost $5000, 90% of the network tested

The important thing is that by getting them in on the dialog and the issue you face, the risk assessment and responsibility are shared between you and management. If things still go south you have some defence against people yelling at you. In fact, management will understand the lengths you have gone to to reduce the risk, they will understand that you cannot promise 0% risk on the budget they want, and they will have agreed to this...
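
To make the trade-off between the three options concrete, here is a small Python sketch that works out what each step up in budget buys in extra coverage. The figures are the example numbers from the comment above; the function name `marginal_costs` and the tuple layout are assumptions made for illustration.

```python
from typing import List, Tuple

# (name, cost in dollars, percent of the network tested)
Option = Tuple[str, int, int]


def marginal_costs(options: List[Option]) -> List[Tuple[str, str, float]]:
    """For each step from a cheaper option to the next pricier one, return
    the extra dollars paid per extra percentage point of network tested."""
    ordered = sorted(options, key=lambda o: o[1])  # cheapest first
    steps = []
    for cheaper, pricier in zip(ordered, ordered[1:]):
        extra_cost = pricier[1] - cheaper[1]
        extra_cover = pricier[2] - cheaper[2]
        if extra_cover <= 0:
            raise ValueError(f"{pricier[0]} costs more without testing more")
        steps.append((cheaper[0], pricier[0], extra_cost / extra_cover))
    return steps


if __name__ == "__main__":
    options = [
        ("Option 1", 50000, 99),
        ("Option 2", 10000, 95),
        ("Option 3", 5000, 90),
    ]
    for src, dst, per_point in marginal_costs(options):
        print(f"{src} -> {dst}: ${per_point:,.0f} per extra point tested")
```

With these numbers, moving from Option 3 to Option 2 costs $1,000 per extra point of coverage, while moving from Option 2 to Option 1 costs $10,000 per point, which is the kind of figure a manager can weigh against the cost of an outage.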

--
I have a cunning plan...


164 comments