Redundant Internet Access?
Supp0rtLinux asks: "In order to meet uptime requirements and SLAs, we decided to get redundant T1's with BGP. We already had two Cisco 7200 routers and a T1. After the ISP turned up the additional circuit and we tested everything on our end, all seemed fine. But when the CO lost power and the generator failed, we had no access for 16+ hours. This prompted some investigations which revealed that yes, we did in fact have a redundant T1 with BGP setup and local redundant routers with separate UPS... on our side. However, on their side both our feeds were plugged into the *same* switch which was on the same PDU which happened to be in the same CO and was on the same sonet. And they were charging us for redundancy! Six months later, we have a truly redundant BGP setup. Each feed goes to separate CO's with the primary to the local one. This makes for separate physical switches, separate power, and we have confirmed we're on physically separate sonets. Now, the only true single point of failure is the physical cabling in the street, but in CA that doesn't get damaged very often. To those of you on Slashdot who know what I'm talking about: are your circuits truly redundant? What have your experiences in network redundancy been? How have you gotten past the sales guy to a tech that knows what redundancy really means? Have you been able to prove your redundancy? Have you found yourself paying for something that you weren't really getting?"
I haven't put the "on" to our redundancy just yet, but I can assure you one thing. When I do, two different companies will be providing the circuits.
Having them in two COs, redundant everything, yet linked to the same AS (when it isn't mine) makes me nervous.
Boss: We need redundant connectivity and power.
Sales-Goof: You can have as many people open browsers on as many computers as you want.
For comparison and not a plug, when my boss asked the IBM guy, he pulled out charts and wiring diagrams to explain what they had.
US Democracy:The best person for the job (among These pre-selected choices...)
To those of you on Slashdot who know what I'm talking about: are your circuits truly redundant? What have your experiences in network redundancy been?
I have two homing pigeons.
If Cupid smiles on them, soon I'll have even more redundancy.
Opinions on the Twiddler2 hand-held keyboard?
I worked at a place that was running redundant T1's just as you describe. They might as well have had all the wires running together the whole way.
My issues from there:
1. How do you convince an ISP to bring a feed in from another CO? Distance is a huge problem--they don't want to run it.
2. How do you know what the ISP has on their end, UPS's, generators, etc? Should that be part of the SLA? Or should you demand a tour of their facilities to see where your wire goes?
3. How can you coordinate two separate ISP's for automatic redundancy? I suppose with a LinkProof box or something. And how do you know they aren't coming through the same telco CO?
4. Should you pay to have them manage the lines and router configurations in a 24/7 scenario? Or does it work well enough to have them do the initial install and then let it run?
5. Finally, what's a reasonable cost for this redundancy?
I have some more projects that will be requiring this type of setup. I am interested to hear any opinions and recommendations from experience from fellow slashdotters......
Thanks much!
-m
http://www.invisik.com
Having "redundant" circuits to the same provider is pretty useless. You really need to be connected to two completely separate upstream providers for decent redundancy. If you have mission-critical needs, you want 3.
-Randy
"Have you found yourself paying for something that you weren't really getting?"
Aw! You're making this too easy.
Great, now I will mod you up +1 Redundant.
Now where is it? Aha...there...
Part of the expense was justified by cost savings using VOIP between the stations and the operations centers.
This is a boring sig
My personal opinion is that trying to reach this level of redundancy for a lot of companies is just not practical and that there are much better approaches.
The idea here is to think of your internet connectivity as two different classes of services. You should place your internet reachable servers in a good co-lo. Get BGP lines from two different sources and multi-home the boxes. Don't run your own AS (use the upstream's space) but instead place your servers "close" to your provider's edge routers. In the end, you are BGPing the loop and it is hard for 100ft of cat-5 to fail. In the end, you have to ask yourself "Am I more qualified to keep my BGP up than is Level-3 (or Savvis ... or AT&T ... or MCI ... or Sprint ... or Cogent)".
In terms of your office, stick to client-only type services. Get two "diverse" connections. This might be a T-1 and a DSL, or a DSL and a cable modem. By using completely different architectures, you can get incredible diversity without spending a bunch of money. You can then IPSEC your local net over the client-only connection back to your addresses in the co-lo and with the help of a little client-side monitoring, auto-switch when a line goes down.
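The "little client-side monitoring" with auto-switch could look roughly like the sketch below. It is only an illustration: the link names and the `is_up` probe are hypothetical placeholders, and a real probe would ping or open a connection through each interface.

```python
from typing import Callable, Optional, Sequence


def pick_link(links: Sequence[str], is_up: Callable[[str], bool]) -> Optional[str]:
    """Return the first link (in preference order) whose health probe passes."""
    for name in links:
        if is_up(name):
            return name
    return None  # every link failed its probe


class FailoverMonitor:
    """Tracks the active link and reports when it has to change."""

    def __init__(self, links: Sequence[str], is_up: Callable[[str], bool]):
        self.links = list(links)   # preference order, e.g. T-1 first, DSL second
        self.is_up = is_up         # hypothetical probe supplied by the caller
        self.active: Optional[str] = None

    def poll(self) -> bool:
        """Re-probe the links; return True if the active link changed."""
        best = pick_link(self.links, self.is_up)
        changed = best != self.active
        self.active = best
        return changed
```

Calling `poll()` on a timer and re-pointing the tunnel whenever it returns True is one way to wire this up; because the preferred link is always probed first, traffic moves back to it once it recovers.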
We offer something similar as a part of our hosting offering for users with green-screen (telnet, serial terminal) applications. A client gateway application manages logical "connections" back to our multi-homed central servers walking around BGP router "flaps" and other transient outages that BGP does not even address.
Redundant should be designed as 100% redundant - count on your ISP or local telco getting bombed off the face of the planet - then plan around that.
If you don't have a CLEC or ISP - then turn to DSS sat.
For those of you who have a T1 and want a cheap backup - think about ISDN, DSL, or even a Cable internet account - it doesn't have to equal a T1 but would do in a pinch for routing mail and basic traffic.
If your boss doesn't think your company needs a redundant line - go unplug the csu/dsu for an hour and then ask again.
When ordering DS1s from a telco, you generally have to specify diverse routing to get them nailed to a different CO!
-psy
Even though you have redundant circuits, that doesn't mean you have redundant internet access. BGP can still fail. That is .. a few weeks ago my company suffered about 8 hours of downtime because of an MCI fuckup/Russian company advertising routes for about 100 of ATT's customers. Our systems were up the whole time but a good deal of the internet was trying to route traffic for those networks through Russia.
MCI/WorldCom says it happened because a fiber was cut in Ohio, which exposed a weak point. Eventually, after much escalation, MCI added ACLs to block those routes from being advertised across their network.
During the downtime, depending on where you connected from, you had up to 99% packet loss. ATT claims fully redundant OC-somethings coming into the data center (we colo with their ATT ENS branch); all it took was one little fucker on the net to advertise routes to screw us (and others) over.
Not being a network guru, I am not certain how the average company could defend themselves against such a problem. It seems many router peering setups operate on the grounds that they trust each other to do the right thing.
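One thing a company on the receiving end can at least do is notice when it happens. The sketch below is an assumption-laden illustration, not a defense: it supposes you can pull a list of (prefix, origin AS) pairs from some outside vantage point such as a looking glass, which nothing in this thread describes, and it only flags announcements that cover your address space from an AS you did not expect. The prefixes and AS numbers in the example are documentation-only values.

```python
import ipaddress
from typing import Dict, Iterable, List, Tuple


def find_suspect_announcements(
    expected: Dict[str, int],
    observed: Iterable[Tuple[str, int]],
) -> List[Tuple[str, int]]:
    """Flag observed routes that fall inside our prefixes but come from another AS.

    expected maps each of our prefixes to the AS that should originate it.
    observed is an iterable of (prefix, origin_as) pairs from an outside view.
    """
    ours = {ipaddress.ip_network(p): asn for p, asn in expected.items()}
    suspects = []
    for prefix, origin in observed:
        seen = ipaddress.ip_network(prefix)
        for mine, my_asn in ours.items():
            if seen.version != mine.version:
                continue  # never compare IPv4 against IPv6
            # An equal or more-specific prefix from a foreign AS attracts our traffic.
            if seen.subnet_of(mine) and origin != my_asn:
                suspects.append((prefix, origin))
                break
    return suspects
```

This detects, it does not prevent; getting the bogus route filtered still means escalating with the carriers, as described above.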
of course ATT doesn't have to say anything to the public, I think the government regulations say that the telecom companies must go public with info if it impacts more than 15,000 of their customers(forgot where I read that). In this case they claimed it affected less than 100. Perhaps only those located at the same datacenter as my company's stuff, a datacenter which appears to be more than 60% vacant.
So, just a second...
Both your T-1's go to the same ISP? Why are you running BGP then? You aren't gaining anything from this except for added complexity. If you're going to continue with this setup, drop the BGP and bond the T-1's together.
The only reason you would want to run BGP is if you had separate links to different ISPs. This is the best way to do it when going for added redundancy. Then if one ISP has a problem, your routes only get propagated out the other link. Keep in mind that you will probably have to play around with as-path prepending and some other things to balance your traffic properly when you do this. And keep in mind that if your total bandwidth exceeds that of one T-1, when one of them does fail, you are going to saturate the other one. If you make sure you get enough bandwidth to prevent this from happening, you won't need to play around with balancing the traffic so much either.
There are a couple of companies out there that make BGP load balancing devices that will look at the load on each of your links, and make modifications accordingly. I've never used one, and have no idea how well they work. F5 I think makes one, and there was another I looked at awhile back that cost $8k or so, but I forgot the name of it now.
But, bottom line, BGP over 2 links to the same ISP is pointless unless you have a separate path to another ISP somewhere.
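The saturation point above is simple arithmetic: after the biggest link fails, whatever is left has to carry the peak load. A minimal sketch, assuming link speeds and peak load are both given in Mbit/s (the function and constant names are made up for illustration):

```python
from typing import Sequence

T1_MBPS = 1.544  # nominal T1 line rate


def survives_single_failure(link_mbps: Sequence[float], peak_load_mbps: float) -> bool:
    """True if the peak load still fits after the largest single link fails."""
    if len(link_mbps) < 2:
        return False  # one link (or none) leaves nothing to fail over to
    remaining = sum(link_mbps) - max(link_mbps)
    return peak_load_mbps <= remaining
```

With two T1s, a 1.2 Mbit/s peak still fits on the surviving line, while a 2.0 Mbit/s peak does not.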
Need Free Juniper/NetScreen Support? JuniperForum
We had four T1s -- two from MFS and two from Bell. Of the four T1s, two (one MFS and one Bell) went to one NSP in Santa Clara, and the other two went to a different vendor in Oakland.
We even had physical plant diversity -- the Bell loops came from cable that ran along Stevens Creek Blvd, and the MFS fiber came up from the street that ran behind us. Outside of the building burning down, we were bulletproof.
Ran three years without a single minute of downtime.
My crowning glory in network design. Never again did I work for an employer who was willing to put their money where their mouth was for reliability.
How does the Slashdot Effect happen given that no slashdotters ever RTFA?
Now, the only true single point of failure is the physical cabling in the street, but in CA that doesn't get damaged very often.
Let me get this straight... you're complaining that a PDU went down at your provider yet you're perfectly happy that you're running both circuits over the same cable under the street? In California?
Cables are cut all the time. Stupid things like rain water seeping through insulation take down entire city blocks. A single earthquake can disable hundreds of square miles for weeks or months.
On the other hand, you rarely hear of the type of failure you experienced. A well designed data center can take quite a lot of failure without a significant (or any) reduction in service level.
Maybe your provider is different, but all the data centers I've ever dealt with have multi-path redundant power routing systems. If a PDU goes out, another one takes over. They constantly share the load yet can easily take it over if one or more fail.
Add to that the standard AC-DC-AC power path and you've got a pretty rock-solid power distribution system.
Unless you can completely eliminate your single point of failure, you're going to be at risk for down-time. In fact, even with a completely redundant infrastructure, things have a bad habit of conspiring against you anyway.
All opinions presented here aren't mine.
The IT at "my" company seems to love single points of failure. Their motto seems to be "if there is a way to build a SPoF, do it". Recent examples:
The "services office" (where IT, language service, human resources and so on work) is connected through a single line to the "main office" 10 km away. One day, an excavator cut that line. Result: No one could work for hours, because each and every device including all computers and all printers use DHCP to get an IP address. And the DHCP server (and the DNS server) is located in the main office. There was a dedicated print server, but it was not allowed to work as DHCP and DNS server.
All servers in a remote office run on a single UPS. One day, yet another evil excavator cut the power line. All rooms went dark, the UPS switched to battery, all servers were running smoothly. The PBX had and still has no UPS, so only mobile phones still worked. The hotline of the local power authority told us it would take some hours to get the line fixed. So we needed to shut down the servers before the UPS battery was drained. But except for one or two servers, our IT supporter had no privileges to shut down the servers, so it had to be done from the main office. But neither the ethernet switches nor the router to the main office were connected to the UPS. We finally decided that the servers had had enough time to write their caches to the disks and simply disconnected them. And no, the UPS signal output was not connected to the servers. Now, it could signal a power outage and a low battery via ethernet -- if the switches were connected to a UPS.
Did I mention that all servers in that remote office are connected to a single switch (out of three), using up to three ethernet lines?
Did I mention various air conditioners that cannot cope with the heat of the servers on a hot summer day?
Did I mention that all remote office data lines (yes, one line per office) end in a single point in the main office?
Did I mention that we have a single mail server (MX for the domain) at our provider for all incoming external mail which is regularly blacklisted and that our internal MX consults those blacklists to fight spam?
(Hmm, I should really stop here or I won't finish until tomorrow.)
Tux2000
Denken hilft.
Up here in New England my company hooked up with an ISP (LightShip) who has provided me with 3 T1's, to 2 physical COs at no additional cost, and it is costing us LESS than our previous ISP. We mentioned redundancy to the Sales guy who got an Engineer on the line and mapped out their network, and the 2 different ways our Cisco router could be setup.
I only have one Cisco, and I know the copper shares some of the same poles. But a month after swapping, two of my T1s went down for 10 minutes when something happened at one of the CO's ATM switches, and that other T1 kept me going. For us (a small non-profit) this is more than enough redundancy.
-=Down Syndrome in Maine
If it's feeding a customer service center or a bunch of bratty executives or something, well... you're fucked ;) never mind what I said.
US Democracy:The best person for the job (among These pre-selected choices...)
I did a bunch of Wall Street work some years ago; we had an experience with this. The system was set up with two high-bandwidth redundant paths, leased from two big providers. (MCI and someone else, I don't remember.)
When WorldCom merged with MCI, then bought the other provider, no one thought much of it. Until a trenching machine trenched across one of the big trunks
... and we found out that the physically redundant lines had been consolidated into the same trunk.
I used to work as a network engineer for an unnamed company, and we had a redundant set of connections connecting Seattle with Chicago, and to San Jose, then from San Jose to LA, down to somewhere in Texas, and up to Virginia, up to NYC, then back to Chicago. There was a backhaul incident in Texas, and the Chicago to Seattle connection went down AND the Texas to LA went down. Go figure.
It is well known that even if at any given time you are making use of different sonet rings, circuits get shifted around based on demand, and you could end up being rerouted onto the same circuits without any notice. The only way to know is to wait till a problem occurs, and see if it impacts more than one connection.
The BACKBONE. If your provider only uses one backbone, there's still a choke point. If the backbone goes down, for whatever reason (it can happen, and has happened), you've got the same effect as being redundant at your end but not at theirs... "theirs" is just further down the line.
There are providers that have multiple backbones, from different providers. I worked for an ISP that at the time had 4 different backbone providers. While there, I saw one of the backbones fail and stay down for several days because the backbone provider dragged their feet in fixing it. Everything else kept working, though, and the only difference was that during absolute peak usage, servers were very slightly slower in responding due to the missing bandwidth.
Being redundant between you and your provider isn't enough... ask if your provider's connection is redundant as well.
Dark Nexus
"Sanity is calming, but madness is more interesting."
If you're in CA I'm guessing that SBC (Pacbell/whatever you know them as) is the local telco that provides the fiber service to your prem. I think you should be able to get diverse pathing from them. It will cost you some $$$, but it sounds like your organization is willing to pay for redundancy. They should be willing to do diverse pathing to your local CO, or diverse pathing to separate COs. You ought to be able to get strands going out of two separate conduits from your building, and completely separate conduits all the way to your local CO, or another nearby CO. You could have a CO SONET node in your closest CO as well as a CO SONET node in a nearby CO and feed to your upstream provider from there (dunno if your upstream is PBI, which should definitely do this, or another provider, who should as well). That way you can set up a healing SONET ring that will survive (in theory) a fiber cut (yes, they do happen. Even in our lovely CA :P ) or a CO outage (as long as your upstream can feed you from both COs). If you have a large enough netblock you should be able to get a connection from a second Internet provider and run BGP with them. Your problem then will be summarization at a close-by peering point, which is a complexity that you can get around (at a $$$ cost, of course). Just be aware that CO failures, cable cuts, and peering point failures all do happen, but you can always minimize or mostly eliminate them if your organization is willing to make a dollar commitment to it.
For the record, I am not an expert on this, but I have a bit of experience under my belt.
I should probably keep researching Zebra and lartc and stuff.
Well, I have a friend who is the network admin for her company and she experienced EXACTLY the same problem you did - "redundant" T1 lines running to the same CO; the gear in the CO went down (The ILEC steadfastly refuses to give details) and they lost connectivity. For 10+ hours.
Real redundancy costs real money.
I work in a professional Colo facility in Denver, and we are fully redundant in all systems. Once it leaves your box, there's two of everything. Dual power to the box, dual network connections (Backbone: Dual OC-12 lines from different providers, running to different boxes in the POP) Dual climate control systems, dual generator rooms with independent fire control systems...I could go on, but you get the idea. I'm on the graveyard shift, and things run so smoothly I get a lot of reading done. And with 150k square feet of building, a lot of walking as well.
It's not cheap. But if you really need redundancy, it's cheaper to rent space in a professional facility than it is to try and be compliant without one.
I work for a small (less than 20 people) company that's branching into web stuff. Their old tech guy still does a lot of "consulting" work, since the company he now works for is often hired by this company for non-IT stuff. So we have 2 T1s to our servers in [city number 1], plus a cable connection or two. If by some weird chance the cable is out when the T-1s fail (thank you Comcast :( ), then we can migrate everything to this guy's company, which has 2-3 T1s in a city about an hour's drive away. So, 2 different mediums for in-town, and a third backup that is not only with a different company, but in (obviously) another set of cables (i.e., not subleased by Company1).
I don't know much about this, except that in the company I used to own, we had two *separate* T1 lines from two *separate* ISP's.
And that is why we called it redundant lines - ensuring if one fails the other would be able to keep us alive.
I never trusted any ISP about their claims on redundancy. Perhaps it's the competition in the business segment they are in, or whatever else, but I've found ISP's to be:
1. Ignorant about basic concepts, or at least not in sync with customers about them
2. Lying. As harsh as this word is, I've had numerous instances where it was later discovered that they were fibbing
http://efil.blogspot.com/
"except that in the company I used to own, we had two *separate* T1 lines from two *separate* ISP's"
Hah! You call that redundancy? Real redundancy is when you own TWO companies doing the same thing. :)
If you live in any of the following areas...
# Chicago, IL
# New York, NY
# Greater Boston, MA
# Greater Providence, RI
# Newport, RI
# Westerly, RI
TowerStream may be something to look into. I use them as our primary connection at the office - they are far cheaper than a traditional T1 ($350/mo for 512k, $500 for 1.5mbit, they can handle around 5GBit max I believe).
True line-of-sight is not required, a reflected signal is usually sufficient. An external flat-panel antenna about 6 inches tall and wide is required, however. With ours set up on the roof, we get 0% packet loss, and have had no problems through heavy snow, rain or thunderstorms.
I have occasionally had connection issues, where the wireless modem has needed to be power-cycled. I suspect, however, this is simply due to it overheating :).
if you want to find out about "redundancy" find out what they do in the military.
Cost is another matter....
"Provided by the management for your protection."
of of course course my my circuits circuits are are redundant redundant
It's 10 PM. Do you know if you're un-American?
I have similar experiences of things that weren't as redundant as claimed. One other point to consider is that you will need to set up a PROCESS to check this redundancy at least once per year. Otherwise some maintenance will occur which "optimises the service" - reduces cost for the operator by rerouting on a cheaper connection. This will very likely end up in the same duct, if not on the same power and router kit as the primary. First you'll know is when the digging machine down the street takes the whole site out.
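Part of such a yearly check can be scripted. The sketch below assumes you have already collected the router hops seen on each circuit (for example by running a traceroute out of each line) and just reports which hops turn up on both. It can only reveal shared routers; two circuits lying in the same duct will not show up here, so it complements asking the carrier for route maps rather than replacing it. The function name and the `ignore` parameter are made up for illustration.

```python
from typing import Iterable, List, Sequence


def shared_hops(
    path_a: Sequence[str],
    path_b: Sequence[str],
    ignore: Iterable[str] = (),
) -> List[str]:
    """Return hops that appear on both paths, in path_a order.

    ignore lists addresses expected on both paths, such as the common destination.
    """
    candidates = set(path_b) - set(ignore)
    return [hop for hop in path_a if hop in candidates]
```

An empty result means no router was common to both paths at the time of the test; anything else is worth raising with the provider.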
Another concern - for those of us in Toronto last year when the lights went out - is the frequency with which UPS fail. Again - test regularly, rehearse regularly...
...by sourcing from different providers.
For instance, one T1 from AT&T and one from MCI.
Power Corrupts,Absolute Power Corrupts Absolutely, leaving one person(group)in charge is absolutely corrupt.
No matter who you order from someone has to do the last mile (aka, local loop). Typically, that's the Incumbent Local Exchange Carrier (ILEC), which is normally one of the baby-bells, or whatever they've become since they've started merging back together.
You might get a line from Sprint that goes through Chicago, and another from MCI that comes from Dallas, but when they get to your town, they hand it off to the ILEC, who runs the last mile.
Even if it was hooked up to a different switch, or was terminated at a different CO, you still have redundancy problems -- odds are, the lines come into your building at a fixed point, which could be hit by a backhoe.
I know of an ISP that was serviced directly by a CLEC (the city-run cable company pulled fibre to them, besides the copper run from the ILEC...) but they were run on the same poles, so it didn't matter.
The only really redundant systems I know of didn't use wires for one of the components. Typically, they had lines pulled to two different places, through two different COs (in one case, in bordering states, that were on different power grids), and then connected the two with microwave. This way, the second leg completely avoided the ILEC.
It's not cheap, but well, redundancy doesn't tend to be.
In the long run, you have to look at what the costs are going to be, and what sort of losses it's going to prevent, and if the additional benefits are going to outweigh the cost.
Oh -- and typically, even if a CLEC (competitive local exchange carrier) has their own switch, the last mile is still typically handled through the ILEC, which puts you back in the same boat. Even with DSL, it doesn't matter if there are two different DSLAMs, if they're routed through the same CO or SLIC.
Build it, and they will come^Hplain.
Sounds like the IT architects need some glass stomachs
Now, the only true single point of failure is the physical cabling in the street, but in CA that doesn't get damaged very often.
I'm not sure if a T1 is copper or fiber- but I doubt very much that either could withstand a separation and dislocation of concrete.
SJW: a person who perceives an injustice, and while correcting it, commits a greater injustice.
Of course almost nothing can stop a guy with a backhoe from killing it when the fibre from two different companies runs down the same path under the street... :(
"Who hasn't slipped into the break room for a quick nibble on a love Newton before?" - Mr. Peterman.
One company I used to work for used a commercial Road Runner connection to back up our corporate T3 line. It worked pretty well. One time the T3 line went down for two days and no one noticed unless they were pushing large files out to our remote sites.
It's good to use your head, but not as a battering ram.
No, that just means you left local echo enabled.
Even if you know enough to get diverse lines from two CO's when you buy circuits, nothing says they will stay that way. You have to constantly re-prove it at the best interval you can handle.
If you get diverse paths to multiple CO's those CO's may share a common backhaul to the next more metropolitan area.
In most locations you can't get lines from anybody but the local telco, and all the lines run together.
In most locations if you can get lines from different providers they run along the same poles.
Most companies (small) can't get a big enough address block to get a route.
Many ISP's won't cooperate to "help each other" for you to use the BGP route if you have a big enough company to get a network. If there are only a few ISP's in your area this is even more true.
You need to at least run lines out different ends of your buildings, preferably you should have separate buildings with different power, etc. Then if the regional power goes out you need big generators at both buildings.
See, it's super expensive to actually get real redundancy. Try turning the problem around.
Rent some server space at different data centers in different areas of the country. Use a round-robin DNS or better. Take advantage of the new fast-updates to the .COM and .NET zones. Think of your office as a leaf node.
It's cheaper to pay for the bandwidth back to your office than it is to go for redundancy there.
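From the office (the leaf node), round-robin DNS across data centers only helps if the client moves on when one address is dead. A rough sketch of that client-side behaviour, using only the standard library; the function names are invented for illustration, and the injectable `connect` argument is there so the logic can be exercised without a network:

```python
import socket
from typing import Callable, List, Optional, Sequence


def resolve_all(host: str, port: int) -> List[str]:
    """Return every distinct address DNS hands back for host, in answer order."""
    addrs: List[str] = []
    for info in socket.getaddrinfo(host, port, type=socket.SOCK_STREAM):
        address = info[4][0]  # sockaddr tuple starts with the IP address
        if address not in addrs:
            addrs.append(address)
    return addrs


def first_reachable(
    addrs: Sequence[str],
    port: int,
    connect: Callable = socket.create_connection,
    timeout: float = 3.0,
) -> Optional[str]:
    """Try each address in turn and return the first that accepts a TCP connection."""
    for addr in addrs:
        try:
            conn = connect((addr, port), timeout=timeout)
        except OSError:
            continue  # this data center is unreachable, try the next record
        conn.close()
        return addr
    return None  # nothing answered
```

Typical use would be `first_reachable(resolve_all(name, 80), 80)` for whatever hostname carries the round-robin records.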
My God, it's Full of Source!
OUTSIDE_IP=$(dig +short my.ip @outsideip.net)