Redundant Internet Access?
Supp0rtLinux asks: "In order to meet uptime requirements and SLAs, we decided to get redundant T1s with BGP. We already had two Cisco 7200 routers and a T1. After the ISP turned up the additional circuit and we tested everything on our end, all seemed fine. But when the CO lost power and the generator failed, we had no access for 16+ hours. This prompted some investigations which revealed that yes, we did in fact have a redundant T1 with BGP setup and local redundant routers with separate UPS... on our side. However, on their side both our feeds were plugged into the *same* switch which was on the same PDU which happened to be in the same CO and was on the same SONET. And they were charging us for redundancy! Six months later, we have a truly redundant BGP setup. Each feed goes to separate COs with the primary to the local one. This makes for separate physical switches, separate power, and we have confirmed we're on physically separate SONETs. Now, the only true single point of failure is the physical cabling in the street, but in CA that doesn't get damaged very often. To those of you on Slashdot who know what I'm talking about: are your circuits truly redundant? What have your experiences in network redundancy been? How have you gotten past the sales guy to a tech that knows what redundancy really means? Have you been able to prove your redundancy? Have you found yourself paying for something that you weren't really getting?"
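One way to "prove your redundancy" is to write down every component each feed depends on and look for overlap. The following is a minimal sketch of that audit. The component names are hypothetical labels loosely modelled on the original setup described in the question, not data from the ISP.

```python
def shared_components(path_a, path_b):
    """Return the components both feeds depend on (candidate single points of failure)."""
    return sorted(set(path_a) & set(path_b))

# Hypothetical inventory of the original "redundant" setup: redundant on the
# customer side, but everything on the provider side is shared.
feed_1 = ["router-1", "ups-1", "street-cable", "co-local", "switch-7", "pdu-3", "sonet-a"]
feed_2 = ["router-2", "ups-2", "street-cable", "co-local", "switch-7", "pdu-3", "sonet-a"]

for component in shared_components(feed_1, feed_2):
    print("shared:", component)
```

An empty result only means the inventory shows no overlap; it is only as good as the information the provider is willing to give you.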
I haven't switched on our redundancy just yet, but I can assure you of one thing: when I do, two different companies will be providing the circuits.
Having them in two COs, with redundant everything, yet linked to the same AS (when it isn't mine) makes me nervous.
Having "redundant" circuits to the same provider is pretty useless. You really need to be connected to two completely separate upstream providers for decent redundancy. If you have mission-critical needs, you want 3.
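A rough way to see why a third provider helps is the standard formula for parallel availability: if each link is up with probability a and failures are independent, at least one of n links is up with probability 1 - (1 - a)^n. The sketch below assumes a per-link availability of 99%, which is an illustrative figure and not one from this thread, and the independence assumption is exactly what fails when both circuits share a switch, PDU, or CO.

```python
HOURS_PER_YEAR = 24 * 365

def combined_availability(per_link_availability, links):
    """Probability that at least one of `links` independent links is up."""
    return 1 - (1 - per_link_availability) ** links

# 0.99 is an assumed per-provider availability, used only for illustration.
for n in (1, 2, 3):
    a = combined_availability(0.99, n)
    downtime_hours = (1 - a) * HOURS_PER_YEAR
    print(f"{n} provider(s): availability {a:.6f}, about {downtime_hours:.2f} hours down per year")
```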
-Randy
When ordering DS1s from a telco, you generally have to specify diverse routing to get them nailed to a different CO!
-psy
Even though you have redundant circuits, that doesn't mean you have redundant internet access. BGP can still fail.
A few weeks ago my company suffered about 8 hours of downtime because of an MCI fuckup/Russian company advertising routes for about 100 of ATT's customers. Our systems were up the whole time, but a good deal of the internet was trying to route traffic for those networks through Russia.
That is, MCI/WorldCom says it happened because a fiber was cut in Ohio, which exposed a weak point. Eventually, after much escalation, MCI added ACLs to block those routes from being advertised across their network.
During the downtime, depending on where you connected from, you had up to 99% packet loss. ATT claims fully redundant OC-somethings coming into the data center (we colo with their ATT ENS branch), yet all it took was one little fucker on the net advertising routes to screw us (and others) over.
Not being a network guru, I am not certain how the average company could defend itself against such a problem. It seems many router peering setups operate on the grounds that they trust each other to do the right thing.
Of course ATT doesn't have to say anything to the public. I think the government regulations say that telecom companies must go public with info if it impacts more than 15,000 of their customers (I forgot where I read that). In this case they claimed it affected fewer than 100, perhaps only those located at the same datacenter as my company's stuff, a datacenter which appears to be more than 60% vacant.
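On the question of how anyone defends against bogus route advertisements: the ACLs MCI eventually added amount to filtering announcements against a list of what each peer is expected to originate. The sketch below illustrates that idea only. The prefixes and AS numbers are documentation-reserved placeholders, the table and function names are made up for this example, and a real deployment would do this in router policy rather than in Python. It also cannot catch a hijacker who forges the expected origin AS.

```python
import ipaddress

# Hypothetical allow-list: prefix -> AS number expected to originate it.
# Documentation prefixes and documentation AS numbers, not real assignments.
EXPECTED_ORIGIN = {
    ipaddress.ip_network("192.0.2.0/24"): 64500,
    ipaddress.ip_network("198.51.100.0/24"): 64501,
}

def accept_announcement(prefix, origin_as):
    """Accept only prefixes covered by an allow-list entry with the matching origin AS."""
    net = ipaddress.ip_network(prefix)
    for allowed, expected_as in EXPECTED_ORIGIN.items():
        if net.subnet_of(allowed):
            return origin_as == expected_as
    return False  # unknown prefix: reject by default

print(accept_announcement("192.0.2.0/24", 64500))  # expected origin
print(accept_announcement("192.0.2.0/24", 64511))  # same prefix, unexpected origin
```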
So, just a second...
Both your T-1's go to the same ISP? Why are you running BGP then? You aren't gaining anything from this except for added complexity. If you're going to continue with this setup, drop the BGP and bond the T-1's together.
The only reason you would want to run BGP is if you had separate links to different ISPs. This is the best way to do it when going for added redundancy. Then if one ISP has a problem, your routes only get propagated out the other link. Keep in mind that you will probably have to play around with as-path prepending and some other things to balance your traffic properly when you do this. And keep in mind that if your total bandwidth exceeds that of one T-1, when one of them does fail, you are going to saturate the other one. If you make sure you get enough bandwidth to prevent this from happening, you won't need to play around with balancing the traffic so much either.
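Both points in the paragraph above can be sketched in a few lines. The first function shows what AS-path prepending does to the path a remote network sees, and the second picks the upstream with the shortest path, which is only one step of real BGP route selection, taken here in isolation. The third checks whether peak load still fits when the largest link is lost. The AS numbers are documentation-reserved placeholders and the 2.0 Mbps peak load is an assumed figure.

```python
T1_MBPS = 1.544  # line rate of a single T1

def announced_path(own_as, upstream_as, prepends=0):
    """AS path a remote network sees for our prefix via one upstream."""
    return [upstream_as] + [own_as] * (1 + prepends)

def preferred_upstream(paths):
    """Shortest AS path wins; ties go to whichever upstream is listed first."""
    return min(paths, key=lambda upstream: len(paths[upstream]))

def survives_single_failure(link_capacities_mbps, peak_load_mbps):
    """True if the peak load still fits after losing the largest single link."""
    remaining = sum(link_capacities_mbps) - max(link_capacities_mbps)
    return peak_load_mbps <= remaining

OWN_AS, ISP_A, ISP_B = 64500, 64510, 64511  # placeholder AS numbers

paths = {
    "isp_a": announced_path(OWN_AS, ISP_A),
    "isp_b": announced_path(OWN_AS, ISP_B, prepends=2),  # make ISP B look farther away
}
print("inbound traffic prefers:", preferred_upstream(paths))
print("two T1s carry a 2.0 Mbps peak after one fails:",
      survives_single_failure([T1_MBPS, T1_MBPS], peak_load_mbps=2.0))
```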
There are a couple of companies out there that make BGP load balancing devices that will look at the load on each of your links, and make modifications accordingly. I've never used one, and have no idea how well they work. F5 I think makes one, and there was another I looked at awhile back that cost $8k or so, but I forgot the name of it now.
But, bottom line, BGP over 2 links to the same ISP is pointless unless you have a separate path to another ISP somewhere.
Need Free Juniper/NetScreen Support? JuniperForum
Now, the only true single point of failure is the physical cabling in the street, but in CA that doesn't get damaged very often.
Let me get this straight... you're complaining that a PDU went down at your provider yet you're perfectly happy that you're running both circuits over the same cable under the street? In California?
Cables are cut all the time. Stupid things like rain water seeping through insulation take down entire city blocks. A single earthquake can disable hundreds of square miles for weeks or months.
On the other hand, you rarely hear of the type of failure you experienced. A well designed data center can take quite a lot of failure without a significant (or any) reduction in service level.
Maybe your provider is different, but all the data centers I've ever dealt with have multi-path redundant power routing systems. If a PDU goes out, another one takes over. They constantly share the load yet can easily take it over if one or more fail.
Add to that the standard AC-DC-AC power path and you've got a pretty rock-solid power distribution system.
Unless you can completely eliminate your single point of failure, you're going to be at risk for down-time. In fact, even with a completely redundant infrastructure, things have a bad habit of conspiring against you anyway.
All opinions presented here aren't mine.
If it's feeding a customer service center or a bunch of bratty executives or something, well... you're fucked ;) never mind what I said.
US Democracy:The best person for the job (among These pre-selected choices...)
The BACKBONE. If your provider only uses one backbone, there's still a choke point. If the backbone goes down, for whatever reason (it can happen, and has happened), you've got the same effect as being redundant at your end but not at theirs... "theirs" is just further down the line.
There are providers that have multiple backbones, from different providers. I worked for an ISP that at the time had 4 different backbone providers. While there, I saw one of the backbones fail and stay down for several days because the backbone provider dragged their feet in fixing it. Everything else kept working, though, and the only difference was that during absolute peak usage, servers were very slightly slower in responding due to the missing bandwidth.
Being redundant between you and your provider isn't enough... ask if your provider's connection is redundant as well.
Dark Nexus
"Sanity is calming, but madness is more interesting."
A lot of companies simply cannot (due to regulation) CoLo their servers (e.g. financial institutions).
This is not just due to stability/reliability concerns but mostly security; how would you feel if your banking account was housed on www.cheap-ass-pr0n-servers.biz or something like that?
Don't be fooled; any techy at a CoLo can look at your data if (s)he wants to.
Slashdot social media options: AIM, ICQ, Yahoo, Jabber and Mobile Text. Why no MySpace?
I don't know much about this, except that in the company I used to own, we had two *separate* T1 lines from two *separate* ISP's.
And that is why we called it redundant lines - ensuring if one fails the other would be able to keep us alive.
I never trusted any ISP about their claims on redundancy. Perhaps it's the competition in the business segment they are in, or whatever else, but I've found ISPs to be:
1. Ignorant about basic concepts, or at least not in sync with customers about them
2. Lying. As harsh as this word is, I've had numerous instances where it was later discovered that they were fibbing
http://efil.blogspot.com/
You are wrong.
Many banks do run in Co-los. We have neighbors in the co-los we are in that are banks, insurance companies, medical, etc. And I would feel very comfortable with my bank locating in some of the co-los that we are in.
Case in point are co-los with "real" security. Savvis (formerly Exodus) in Los Angeles (actually El Segundo for those that care) has armed guards, card key access, hand scanners, more security cameras than you can count, and man traps. If you need more, you can get private cages, rooms, and even bomb-proof vaults. And if you know of a "regulation" that states that your internet-connected servers cannot be located in a secured facility with bonded armed guards, I would like to read that reg. Personally, I think your statement is just a continuation of the "myth" that co-los are somehow less secure. While no place is 100% perfect, co-los are much "more secure" than the back room at your office.
Remember, that when I say co-lo, I don't mean "cheap-ass-servers.com" (someone quick, go register this as it appears to be available). There are many high-quality co-los from companies like Savvis, Switch and Data, and others that are run extremely professionally. If you are lucky enough to be in a good co-lo, you will get better up-time than you could possibly hope for in-house.
No matter who you order from, someone has to do the last mile (aka the local loop). Typically, that's the Incumbent Local Exchange Carrier (ILEC), which is normally one of the baby Bells, or whatever they've become since they've started merging back together.
You might get a line from Sprint that goes through Chicago, and another from MCI that comes from Dallas, but when they get to your town, they hand it off to the ILEC, who runs the last mile.
Even if it was hooked up to a different switch, or was terminated at a different CO, you still have redundancy problems -- odds are, the lines come into your building at a fixed point, which could be hit by a backhoe.
I know of an ISP that was serviced directly by a CLEC (the city-run cable company pulled fibre to them, besides the copper run from the ILEC...) but they were run on the same poles, so it didn't matter.
The only really redundant systems I know of didn't use wires for one of the components. Typically, they had lines pulled to two different places, through two different COs (in one case, in bordering states that were on different power grids), and then connected the two with microwave. This way, the second leg completely avoided the ILEC.
It's not cheap, but well, redundancy doesn't tend to be.
In the long run, you have to look at what the costs are going to be, and what sort of losses it's going to prevent, and if the additional benefits are going to outweigh the cost.
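That trade-off is simple arithmetic once you put numbers on it. Here is a minimal sketch, where every figure is a made-up placeholder apart from the 16-hour outage mentioned in the original question: the avoided loss is the expected outage hours times the loss per hour times the fraction of outages the redundancy would actually prevent, and the redundancy is worth it when that exceeds its yearly cost.

```python
def net_benefit(outage_hours_per_year, loss_per_hour, fraction_prevented, annual_cost):
    """Expected yearly loss avoided by the redundancy, minus what the redundancy costs."""
    avoided_loss = outage_hours_per_year * loss_per_hour * fraction_prevented
    return avoided_loss - annual_cost

# Placeholder figures: one 16-hour outage per year, an assumed loss of 2000 per hour,
# an assumed 90% of outages prevented, and an assumed 12000 per year for the second circuit.
result = net_benefit(outage_hours_per_year=16, loss_per_hour=2000,
                     fraction_prevented=0.9, annual_cost=12000)
print("worth it" if result > 0 else "not worth it", round(result))
```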
Oh -- and typically, even if a CLEC (competitive local exchange carrier) has their own switch, the last mile is still typically handled through the ILEC, which puts you back in the same boat. Even with DSL, it doesn't matter if there are two different DSLAMs, if they're routed through the same CO or SLIC.
Build it, and they will come^Hplain.
Yeah, what he said.
I used to be a network engineer at a large co-lo company which was acquired by Cable and Wireless after going through Chapter 11.
The data center in which I worked had a different take on man traps. They looked very much like a Star Trek transporter, and like the transporters, were temperamental and at least one of them was frequently out of order entirely. This was bad because they were made by an Italian company and every time one of them broke, a service tech would have to fly out *from Italy* to fix the piece of crap. One of them was once down for almost two weeks because they didn't have the part it needed. It was quite common for people to get stuck in them and have to be let out by the guards. It happened to me several times.
They worked by having a convex sliding door on each side of a floor-to-ceiling Plexiglas cylinder. Inside is a card reader for your badge and a biometric reader that you put your hand on. If both match up (and the fscking thing doesn't break!) the door on the other side opens. Both doors cannot be open at once, so you have to wait for the first door to close before it even lets you wave your badge at it and have your palm read.
You couldn't steal anything much via one of those, since even getting a 2U server through the tiny things required holding it between your legs. Otherwise it would think two people were in there at once and refuse to open the other door. Anytime it doesn't open, whether by design or all-too-common breakdown, it sets off an alarm at the guard station and the guards have to come and let you out.
If the things are broken, the guards can open an alarmed door (also used to take large piles of gear into the data center and go up the freight elevator), and no one could steal anything that way either because they can see anything you've got. You have to fill out paperwork on any equipment you are bringing in or taking out, with description and serial number. There's an audit trail on anything anyone does - even employees - in the colo space.
After you get out of the transporter, if your atoms haven't been scattered halfway across the universe, you go to a secure elevator which again requires your badge to operate. It will go only to the floor your badge is authorized for, and after you get out of the elevator you have to use your badge to also open a door.
Then, you finally get to your cage, which you can open with the key you signed out from the guard station when you checked in. At the guard station, you need to be on the authorized list for your company, and you need photo ID or you don't get in.
It was pretty safe and pretty secure.
I have also been inside one of the data centers of a large, well-known investment bank. It was far less secure than the colo data center where I worked. For starters, all I had to do to get in was be in the company of a senior sysadmin who worked there and had 24 x 7 building access. He signed me in at the door to the building (which was not a dedicated data center; it was a regional headquarters in which the data center was housed), and I didn't even have to show ID. And it was on a *weekend* when the place was deserted. There were no security checks after that. None. He just swiped his badge at the computer room door and took me in for a tour. I never should have been allowed in there at all, let alone with no one even checking to see who I was or anything.
Granted, this regional headquarters was not in the United States, nor is it a US bank, so they may have different regulations, but I'd be surprised indeed if there were any regulations stating that a bank cannot use a colo facility. I also used to work for an American bank, in its main data center. The security was a lot better than what I just described above, but not as good (not even close!) as the security at the colo data center where I worked. The network connections to that place also went by not one, not two, but *five* different carriers. It wou