Slashdot Mirror


Redundant Internet Access?

Supp0rtLinux asks: "In order to meet uptime requirements and SLAs, we decided to get redundant T1's with BGP. We already had two Cisco 7200 routers and a T1. After the ISP turned up the additional circuit and we tested everything on our end, all seemed fine. But when the CO lost power and the generator failed, we had no access for 16+ hours. This prompted some investigations which revealed that yes, we did in fact have a redundant T1 with BGP setup and local redundant routers with separate UPS... on our side. However, on their side both our feeds were plugged into the *same* switch which was on the same PDU which happened to be in the same CO and was on the same SONET. And they were charging us for redundancy! Six months later, we have a truly redundant BGP setup. Each feed goes to separate CO's with the primary to the local one. This makes for separate physical switches, separate power, and we have confirmed we're on physically separate SONETs. Now, the only true single point of failure is the physical cabling in the street, but in CA that doesn't get damaged very often. To those of you on Slashdot who know what I'm talking about: are your circuits truly redundant? What have your experiences in network redundancy been? How have you gotten past the sales guy to a tech that knows what redundancy really means? Have you been able to prove your redundancy? Have you found yourself paying for something that you weren't really getting?"
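
One way to make "are your circuits truly redundant?" concrete is to list everything each circuit depends on and look for overlap. The sketch below is only an illustration: the component names are hypothetical labels loosely modeled on the setup described in the question, not anything the ISP actually provided.

```python
def shared_components(circuits):
    """Return the components that every circuit depends on.

    Anything in the result is a single point of failure for the whole set.
    """
    paths = [set(components) for components in circuits.values()]
    return set.intersection(*paths) if paths else set()


# Hypothetical inventories; names are made up for illustration.
before = {
    "T1-a": ["CO-local", "switch-1", "PDU-1", "sonet-1", "street-cable"],
    "T1-b": ["CO-local", "switch-1", "PDU-1", "sonet-1", "street-cable"],
}
after = {
    "T1-a": ["CO-local", "switch-1", "PDU-1", "sonet-1", "street-cable"],
    "T1-b": ["CO-remote", "switch-7", "PDU-4", "sonet-2", "street-cable"],
}

print(sorted(shared_components(before)))  # everything is shared
print(sorted(shared_components(after)))   # only the street cable remains
```

The hard part in practice is getting an honest inventory out of the provider; the code only shows what to do with it once you have one.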

20 of 78 comments

  1. Not there yet by perlchild · · Score: 5, Informative

    I haven't put the "on" to our redundancy just yet, but I can assure you one thing. When I do, two different companies will be providing the circuits.

    Having them in two COs, redundant everything, yet linked to the same AS (when it isn't mine) makes me nervous.

    1. Re:Not there yet by Asgard · · Score: 4, Interesting

      Beware, I recall a story about how redundant lines were leased from two different companies, only to find that they both leased their lines from the same company and it was all contained in the same conduit.

  2. Actual conversation by BrynM · · Score: 4, Funny
    This is what I overheard when the place I worked at years ago was shopping disaster recovery sites. Mind you, this was for a mainframe - this place was supposed to be fully redundant in about 20 other ways as well.

    Boss: We need redundant connectivity and power.
    Sales-Goof: You can have as many people open browsers on as many computers as you want.

    For comparison and not a plug, when my boss asked the IBM guy, he pulled out charts and wiring diagrams to explain what they had.

    --
    US Democracy:The best person for the job (among These pre-selected choices...)
  3. On a Wing and a Prayer by orthogonal · · Score: 5, Funny

    To those of you on Slashdot who know what I'm talking about: are your circuits truly redundant? What have your experiences in network redundancy been?

    I have two homing pigeons.

    If Cupid smiles on them, soon I'll have even more redundancy.

  4. Very concerned by invisik · · Score: 4, Interesting

    I worked at a place that was running redundant T1's just as you describe. They might as well have had all the wires running together the whole way.

    My issues from there:

    1. How do you convince an ISP to bring a feed in from another CO? Distance is a huge problem--they don't want to run it.

    2. How do you know what the ISP has on their end, UPS's, generators, etc? Should that be part of the SLA? Or should you demand a tour of their facilities to see where your wire goes?

    3. How can you coordinate two separate ISP's for automatic redundancy? I suppose with a LinkProof box or something. And how do you know they aren't coming through the same telco CO?

    4. Should you pay to have them manage the lines and router configurations in a 24/7 scenario? Or does it work well enough to have them do the initial install and then let it run?

    5. Finally, what's a reasonable cost for this redundancy?

    I have some more projects that will be requiring this type of setup. Am interested to hear any opinions and recommendations from experience from fellow slashdotters......

    Thanks much!

    -m

    --
    http://www.invisik.com
    1. Re:Very concerned by duffbeer703 · · Score: 4, Insightful

      The local telco will lie their asses off and charge you insanely expensive rates for mediocre service.

      Unless you're in a downtown area or a tech park, forget about redundancy.

      IMHO, anything facing the public that needs redundancy belongs in a colo.

      --
      Conformity is the jailer of freedom and enemy of growth. -JFK
  5. 2 to the same provider is not redundant by ZESTA · · Score: 5, Informative

    Having "redundant" circuits to the same provider is pretty useless. You really need to be connected to two completely separate upstream providers for decent redundancy. If you have mission-critical needs, you want 3.
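
A rough back-of-the-envelope for why a third provider helps, as a sketch only. It assumes each link fails independently of the others, which is exactly the assumption this thread shows is often false (shared CO, shared conduit), and the 99.9% per-link availability is a made-up figure, not a quoted SLA.

```python
HOURS_PER_YEAR = 8760


def combined_availability(availabilities):
    """Probability that at least one link is up, ASSUMING independent failures."""
    p_all_down = 1.0
    for a in availabilities:
        p_all_down *= (1.0 - a)
    return 1.0 - p_all_down


def expected_downtime_hours(availabilities):
    """Expected hours per year with every link down at the same time."""
    return (1.0 - combined_availability(availabilities)) * HOURS_PER_YEAR


# Hypothetical per-link availability of 99.9%.
for n in (1, 2, 3):
    print(n, "link(s):", expected_downtime_hours([0.999] * n), "hours/year")
```

If the links share a failure mode, the real number sits much closer to the one-link case than to the ideal figure.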

    -Randy

  6. Another completely different approach by DDumitru · · Score: 4, Insightful

    My personal opinion is that trying to reach this level of redundancy for a lot of companies is just not practical and that there are much better approaches.

    The idea here is to think of your internet connectivity as two different classes of services. You should place your internet reachable servers in a good co-lo. Get BGP lines from two different sources and multi-home the boxes. Don't run your own AS (use the upstream's space) but instead place your servers "close" to your provider's edge routers. In the end, you are BGPing the loop and it is hard for 100ft of cat-5 to fail. In the end, you have to ask yourself "Am I more qualified to keep my BGP up than is Level-3 (or Savvis ... or AT&T ... or MCI ... or Sprint ... or Cogent)".

    In terms of your office, stick to client-only type services. Get two "diverse" connections. This might be a T-1 and a DSL, or a DSL and a cable modem. By using completely different architectures, you can get incredible diversity without spending a bunch of money. You can then IPSEC your local net over the client-only connection back to your addresses in the co-lo and with the help of a little client-side monitoring, auto-switch when a line goes down.
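
The "little client-side monitoring" could look something like the sketch below. It is not the poster's actual gateway; the class name, the link names, and the three-strikes threshold are all assumptions. How each probe is done (for example, pinging a co-lo address over each line) and how the tunnel is actually moved are left out.

```python
class LinkMonitor:
    """Pick the most preferred link that has not failed too many probes in a row."""

    def __init__(self, links, fail_threshold=3):
        self.links = list(links)              # ordered by preference
        self.fail_threshold = fail_threshold  # consecutive failures before giving up
        self.failures = {name: 0 for name in self.links}

    def record(self, name, probe_ok):
        """Store one probe result; a success resets the failure count."""
        self.failures[name] = 0 if probe_ok else self.failures[name] + 1

    def active(self):
        """Return the first healthy link, or None if all are considered down."""
        for name in self.links:
            if self.failures[name] < self.fail_threshold:
                return name
        return None


# Hypothetical office with a T-1 preferred over DSL.
monitor = LinkMonitor(["t1", "dsl"])
for _ in range(3):
    monitor.record("t1", False)   # three failed probes on the T-1
print(monitor.active())           # now points at the DSL line
monitor.record("t1", True)        # T-1 answers again
print(monitor.active())           # back on the T-1
```

Requiring several consecutive failures avoids flapping between lines on a single lost probe, at the cost of a slower switchover.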

    We offer something similar as a part of our hosting offering for users with green-screen (telnet, serial terminal) applications. A client gateway application manages logical "connections" back to our multi-homed central servers walking around BGP router "flaps" and other transient outages that BGP does not even address.

    1. Re:Another completely different approach by gujo-odori · · Score: 3, Informative

      Yeah, what he said.

      I used to be a network engineer at a large co-lo company which was acquired by Cable and Wireless after going through Chapter 11.

      The data center in which I worked had a different take on man traps. They looked very much like a Star Trek transporter, and like the transporters, were temperamental and at least one of them was frequently out of order entirely. This was bad because they were made by an Italian company and every time one of them broke, a service tech would have to fly out *from Italy* to fix the piece of crap. One of them was once down for almost two weeks because they didn't have the part it needed. It was quite common for people to get stuck in them and have to be let out by the guards. It happened to me several times.

      They worked by having a convex sliding door on each side of a floor to ceiling Plexiglas cylinder. Inside it is a card reader for your badge and a biometric reader that you put your hand on. If both match up (and the fscking thing doesn't break!) the door on the other side opens. Both doors cannot be open at once, so you have to wait for the first door to close before it even lets you wave your badge at it and have your palm read.

      You couldn't steal anything much via one of those, since even getting a 2U server through the tiny things required holding it between your legs. Otherwise it would think two people were in there at once and refuse to open the other door. Anytime it doesn't open, whether by design or all-too-common breakdown, it sets off an alarm at the guard station and the guards have to come and let you out.

      If the things are broken, the guards can open an alarmed door (also used to take large piles of gear into the data center and go up the freight elevator), and no one could steal anything that way either because they can see anything you've got. You have to fill out paperwork on any equipment you are bringing in or taking out, with description and serial number. There's an audit trail on anything anyone does - even employees - in the colo space.

      After you get out of the transporter, if your atoms haven't been scattered halfway across the universe, you go to a secure elevator which again requires your badge to operate. It will go only to the floor your badge is authorized for, and after you get out of the elevator you have to use your badge to also open a door.

      Then, you finally get to your cage, which you can open with the key you signed out from the guard station when you checked in. At the guard station, you need to be on the authorized list for your company, and you need photo ID or you don't get in.

      It was pretty safe and pretty secure.

      I have also been inside one of the data centers of a large, well-known investment bank. It was far less secure than the colo data center where I worked. For starters, all I had to do to get in was be in the company of a senior sysadmin who worked there and had 24 x 7 building access. He signed me in at the door to the building (which was not a dedicated data center; it was a regional headquarters in which the data center was housed), and I didn't even have to show ID. And it was on a *weekend* when the place was deserted. There were no security checks after that. None. He just swiped his badge at the computer room door and took me in for a tour. I never should have been allowed in there at all, let alone with no one even checking to see who I was or anything.

      Granted, this regional headquarters was not in the United States, nor is it a US bank, so they may have different regulations, but I'd be surprised indeed if there were any regulations stating that a bank cannot use a colo facility. I also used to work for an American bank, in its main data center. The security was a lot better than what I just described above, but not as good (not even close!) as the security at the colo data center where I worked. The network connections to that place also went by not one, not two, but *five* different carriers. It wou

  7. Diverse routing by psyconaut · · Score: 3, Informative

    When ordering DS1s from a telco, you generally have to specify diverse routing to get them nailed to a different CO!

    -psy

  8. wait by austad · · Score: 3, Informative

    So, just a second...

    Both your T-1's go to the same ISP? Why are you running BGP then? You aren't gaining anything from this except for added complexity. If you're going to continue with this setup, drop the BGP and bond the T-1's together.

    The only reason you would want to run BGP is if you had separate links to different ISPs. This is the best way to do it when going for added redundancy. Then if one ISP has a problem, your routes only get propagated out the other link. Keep in mind that you will probably have to play around with as-path prepending and some other things to balance your traffic properly when you do this. And keep in mind that if your total bandwidth exceeds that of one T-1, when one of them does fail, you are going to saturate the other one. If you make sure you get enough bandwidth to prevent this from happening, you won't need to play around with balancing the traffic so much either.
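
The saturation point above is simple arithmetic, sketched here. A T-1 carries 1.544 Mbit/s; the peak-load figures are made-up examples, and the function name is just an illustration.

```python
T1_MBPS = 1.544  # line rate of a single T-1


def survives_single_failure(link_capacities_mbps, peak_load_mbps):
    """True if the peak load still fits after losing the largest single link."""
    remaining = sum(link_capacities_mbps) - max(link_capacities_mbps)
    return peak_load_mbps <= remaining


links = [T1_MBPS, T1_MBPS]
print(survives_single_failure(links, 2.0))  # peak above one T-1: the survivor saturates
print(survives_single_failure(links, 1.2))  # peak fits on one T-1: failover is clean
```

In other words, two T-1's only count as redundant if normal peak traffic would fit on one of them.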

    There are a couple of companies out there that make BGP load balancing devices that will look at the load on each of your links, and make modifications accordingly. I've never used one, and have no idea how well they work. F5 I think makes one, and there was another I looked at awhile back that cost $8k or so, but I forgot the name of it now.

    But, bottom line, BGP over 2 links to the same ISP is pointless unless you have a separate path to another ISP somewhere.

    --
    Need Free Juniper/NetScreen Support? JuniperForum
  9. The system at one of my previous places of employ by YankeeInExile · · Score: 3, Interesting

    We had four T1s -- two from MFS and two from Bell. Of the four T1s, two (one MFS and one Bell) went to one NSP in Santa Clara, and the other two went to a different vendor in Oakland.

    We even had physical plant diversity -- the Bell loops came from cable that ran along Stevens Creek Blvd, and the MFS fiber came up from the street that ran behind us. Outside of the building burning down, we were bulletproof.

    Ran three years without a single minute of downtime.

    My crowning glory in network design. Never again did I work for an employer who was willing to put their money where their mouth was for reliability.

    --
    How does the Slashdot Effect happen given that no slashdotters ever RTFA?
  10. Wait a second... by AlphaOne · · Score: 3, Informative

    Now, the only true single point of failure is the physical cabling in the street, but in CA that doesn't get damaged very often.

    Let me get this straight... you're complaining that a PDU went down at your provider yet you're perfectly happy that you're running both circuits over the same cable under the street? In California?

    Cables are cut all the time. Stupid things like rain water seeping through insulation take down entire city blocks. A single earthquake can disable hundreds of square miles for weeks or months.

    On the other hand, you rarely hear of the type of failure you experienced. A well designed data center can take quite a lot of failure without a significant (or any) reduction in service level.

    Maybe your provider is different, but all the data centers I've ever dealt with have multi-path redundant power routing systems. If a PDU goes out, another one takes over. They constantly share the load yet can easily take it over if one or more fail.

    Add to that the standard AC-DC-AC power path and you've got a pretty rock-solid power distribution system.

    Unless you can completely eliminate your single point of failure, you're going to be at risk for down-time. In fact, even with a completely redundant infrastructure, things have a bad habit of conspiring against you anyway.

    --
    All opinions presented here aren't mine.
  11. Redun-what? by Tux2000 · · Score: 3, Funny

    The IT at "my" company seems to love single points of failure. Their motto seems to be "if there is a way to build a SPoF, do it". Recent examples:

    The "services office" (where IT, language service, human resources and so on work) is connected through a single line to the "main office" 10 km away. One day, an excavator cut that line. Result: No one could work for hours, because each and every device including all computers and all printers use DHCP to get an IP address. And the DHCP server (and the DNS server) is located in the main office. There was a dedicated print server, but it was not allowed to work as DHCP and DNS server.

    All servers in a remote office run on a single UPS. One day, yet another evil excavator cut the power line. All rooms went dark, the UPS switched to battery, all servers were running smoothly. The PBX had and still has no UPS, so only mobile phones still worked. The hotline of the local power authority told us it would take some hours to get the line fixed. So we needed to shut down the servers before the UPS battery was drained. But except for one or two servers, our IT supporter had no privileges to shut down the servers, so it had to be done from the main office. But neither the ethernet switches nor the router to the main office were connected to the UPS. We finally decided that the servers had had enough time to write their caches to the disks and simply disconnected them. And no, the UPS signal output was not connected to the servers. Now, it could signal a power outage and a low battery via ethernet -- if the switches were connected to a UPS.

    Did I mention that all servers in that remote office are connected to a single switch (out of three), using up to three ethernet lines?

    Did I mention various air conditioners that can not cope with the heat of the servers on a hot summer day?

    Did I mention that all remote office data lines (yes, one line per office) end in a single point in the main office?

    Did I mention that we have a single mail server (MX for the domain) at our provider for all incoming external mail which is regularly blacklisted and that our internal MX consults those blacklists to fight spam?

    (Hmm, I should really stop here or I won't finish until tomorrow.)

    Tux2000

    --
    Denken hilft.
  12. After some thought... by BrynM · · Score: 3, Informative
    I was thinking about your situation some more... Would it be too much to just move some of your redundancy off-site? If it's server availability to the internet that you need, it can be done with some work. Rent some space at a colo and put your stuff up there. Traceroute the connection from your current site to whatever colos you check out _and_ ask them about their upstream provider.
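
Comparing traceroutes by eye gets tedious, so here is a small sketch for spotting hops that two paths have in common. It assumes you have already copied the hop addresses out of the traceroute output by hand; the addresses below are placeholders from the documentation ranges, not real routers.

```python
def shared_hops(path_a, path_b):
    """Hops present in both paths, in path_a order, skipping unanswered hops ('*')."""
    seen_b = {hop for hop in path_b if hop != "*"}
    return [hop for hop in path_a if hop != "*" and hop in seen_b]


# Hypothetical hop lists toward two candidate colos.
to_colo_1 = ["192.0.2.1", "198.51.100.9", "*", "203.0.113.20"]
to_colo_2 = ["192.0.2.1", "198.51.100.9", "203.0.113.77"]

print(shared_hops(to_colo_1, to_colo_2))
```

Traceroute only shows routers at the IP layer, so a clean result here says nothing about shared conduits or shared power, which is why asking about the upstream provider still matters.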

    If it's feeding a customer service center or a bunch of bratty executives or something, well... you're fucked ;) never mind what I said.

    --
    US Democracy:The best person for the job (among These pre-selected choices...)
  13. Physical redundancy by crmartin · · Score: 3, Funny

    I did a bunch of Wall Street work some years ago; we had an experience with this. The system was set up with two high-bandwidth redundant paths, leased from two big providers. (MCI and someone else, I don't remember.)

    When WorldCom merged with MCI, then bought the other provider, no one thought much of it. Until a trenching machine trenched across one of the big trunks ... and we found out that the physically redundant lines had been consolidated into the same trunk.

  14. The one thing missing.... by Dark+Nexus · · Score: 4, Informative

    The BACKBONE. If your provider only uses one backbone, there's still a choke point. If the backbone goes down, for whatever reason (it can happen, and has happened), you've got the same effect as being redundant at your end but not at theirs... "theirs" is just further down the line.

    There are providers that have multiple backbones, from different providers. I worked for an ISP that at the time had 4 different backbone providers. While there, I saw one of the backbones fail and stay down for several days because the backbone provider dragged their feet in fixing it. Everything else kept working, though, and the only difference was that during absolute peak usage, servers were very slightly slower in responding due to the missing bandwidth.

    Being redundant between you and your provider isn't enough... ask if your provider's connection is redundant as well.

    --
    Dark Nexus
    "Sanity is calming, but madness is more interesting."
  15. TowerStream by shadowxtc · · Score: 4, Interesting

    If you live in any of the following areas...

    # Chicago, IL
    # New York, NY
    # Greater Boston, MA
    # Greater Providence, RI
    # Newport, RI
    # Westerly, RI

    TowerStream may be something to look into. I use them as our primary connection at the office - they are far cheaper than a traditional T1 ($350/mo for 512k, $500 for 1.5mbit, they can handle around 5GBit max I believe).

    True line-of-sight is not required, a reflected signal is usually sufficient. An external flat-panel antenna about 6 inches tall and wide is required, however. With ours set up on the roof, we get 0% packet loss, and have had no problems through heavy snow, rain or thunderstorms.

    I have occasionally had connection issues, where the wireless modem has needed to be power-cycled. I suspect, however, this is simply due to it overheating :).

  16. Joking and Seriously by 4of12 · · Score: 3, Interesting

    if you want to find out about "redundancy" find out what they do in the military.

    Cost is another matter....

    --
    "Provided by the management for your protection."
  17. The problem is the ILEC. by oneiros27 · · Score: 4, Informative

    No matter who you order from someone has to do the last mile (aka, local loop). Typically, that's the Incumbent Local Exchange Carrier (ILEC), which is normally one of the baby-bells, or whatever they've become since they've started merging back together.

    You might get a line from Sprint that goes through Chicago, and another from MCI that comes from Dallas, but when they get to your town, they hand it off to the ILEC, who runs the last mile.

    Even if it was hooked up to a different switch, or was terminated at a different CO, you still have redundancy problems -- odds are, the lines come into your building at a fixed point, which could be hit by a backhoe.

    I know of an ISP that was serviced directly by a CLEC (the city-run cable company pulled fibre to them, besides the copper run from the ILEC...) but they were run on the same poles, so it didn't matter.

    The only really redundant systems I know of didn't use wires for one of the components. Typically, they had lines pulled to two different places, through two different COs (in one case, in bordering states, that were on different power grids), and then connected the two with microwave. This way, the second leg completely avoided the ILEC.

    It's not cheap, but well, redundancy doesn't tend to be.

    In the long run, you have to look at what the costs are going to be, and what sort of losses it's going to prevent, and if the additional benefits are going to outweigh the cost.
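
That cost-versus-loss comparison can be written down as a one-line calculation. Everything in this sketch is a placeholder: the dollar figures are invented, and the 16 hours simply borrows the outage length from the original question.

```python
def net_benefit(annual_cost, outage_hours_avoided_per_year, loss_per_outage_hour):
    """Avoided loss minus what the redundancy costs; positive means it pays off."""
    avoided_loss = outage_hours_avoided_per_year * loss_per_outage_hour
    return avoided_loss - annual_cost


# Hypothetical: 12,000/year for the second path, one 16-hour outage avoided,
# and 1,000 lost per hour of downtime.
print(net_benefit(12000, 16, 1000))
```

The hard inputs are the hours avoided and the loss per hour; if you cannot estimate those, the sales quote alone will not tell you whether the redundancy is worth it.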

    Oh -- and typically, even if a CLEC (competitive local exchange carrier) has their own switch, the last mile is still typically handled through the ILEC, which puts you back in the same boat. Even with DSL, it doesn't matter if there are two different DSLAMs, if they're routed through the same CO or SLIC.

    --
    Build it, and they will come^Hplain.