Slashdot Mirror


ISP Recovers in 72 Hours After Leveling by Tornado

aldheorte writes "Amazing story of how an ISP in Jackson, TN, whose main facility was completely leveled by a tornado, recovered in 72 hours. The story is a great recounting of how they executed their disaster recovery plan, what they found they had left out of that plan, data recovery from destroyed hard drives, and perhaps the best argument ever for offsite backups. (Not affiliated with the ISP in question)"

78 of 258 comments

  1. Heh by B3ryllium · · Score: 4, Funny

    Hopefully no one was hurt when the trailer park got levelled.

  2. It will be a damn shame... by Anonymous Coward · · Score: 3, Funny

    when Munchkins overrun the web now that this ISP got relocated by the twister.

  3. That's fricking awesome by Anonymous Coward · · Score: 5, Funny

    "So, ah, your ISP here.. what's your uptime for the last year?"

    "99.18% for our service, and 96.2% for our building."

  4. Poor tech support by dswensen · · Score: 5, Funny

    And I'm sure every minute of those 72 hours was characterized by irate phone calls to tech support.

    "Are you guys down again? You're down more than you're up! I'm going to find another service... etc..."

    "Ma'am our facilities have been entirely leveled by a tornado, we'll be back up in 72 hours."

    "72 HOURS?! I have photos of my grandchildren I have to mail! Worst ISP ever! Let me speak to your supervisor!"

    "Ma'am our supervisor was also leveled by the tornado."

    *click*

    Not that I work tech support for an ISP and am bitter...

    1. Re:Poor tech support by koa · · Score: 4, Insightful

      Actually.. I ran a technical support department for a small ISP for a couple years.

      It's amazing how accurate you are in regard to customer viewpoint on downtime.

      After having done it myself, I actually have MUCH more respect for technical support engineers/supervisors, because within reason most "downtime" is fixed even before the customer knows about it (i.e. small blips in service).

      And the majority of people who purchase an ISP's services have absolutely no idea what it takes to respond to an outage.

      --
      ....move along....nothing to see here....
    2. Re:Poor tech support by dswensen · · Score: 2, Interesting

      I'm sorry to hear you've run into such rude tech support. Around here, we're polite enough up until the customer starts copping a serious attitude.

      That said, we get dozens of calls a day accusing us (not asking politely, as characterized by your post) of having downtime when, in fact, the problem is on the client's side. I have outright been called a liar when I say our ISP is not "down." When there actually is an outage (which is rare, but happens), it's much worse.

      Then, we are "always" down, and have had "dozens" of outages in the past week, and etc. etc. (usually this is a customer running Win 95 with an antiquated HSP modem who lives in the sticks and has a 400-foot phone cord going from his computer to the phone jack in the barn, but... nevermind. We're "always down.")

      So yes, when you have hundreds of callers a day telling you you are ruining their business and costing them thousands of dollars and raping their grandchildren, patience sometimes runs a little thin. Because so many customers open with "are you guys down AGAIN?" rather than describing their problem, sometimes techs can get a little terse.

      Nonetheless, if any of the techs here spoke to a customer the way you're characterizing it, he would pretty much be fired on the spot.

  5. Users need their porn! by Trigun · · Score: 2, Insightful

    Now that that's out of the way, it never ceases to amaze me how many companies have little to no severe disaster recovery plans, and how a little bit of ingenuity(sp?) can go a long way in a company.
    Times of crisis and how one deals with them are the mark of successful businesses/employees/people. I don't think that we could recover so quickly should a disaster of that size hit my job, but it'd be fun to try.

  6. Nice work! by Tebriel · · Score: 4, Insightful

    This is what happens when people make intelligent plans and then modify them as they see other plans work or fail. I'm glad to see that this was a work in progress rather than some arcane plan in a binder somewhere that no one ever looked at.

    --
    The Blaster Master Fighting for Truth, Justice, and Evil Pie since 1979
    1. Re:Nice work! by blackp · · Score: 3, Insightful

      One of the problems with a plan in a binder somewhere, is that the tornado would have probably taken out the binder as well.

    2. Re:Nice work! by Mooncaller · · Score: 2, Interesting
      Right on! I have been involved with the design of at least 5 disaster recovery plans. The first one was while I was in the Air Force. I guess I was pretty lucky, learning at the age of 19 that preparing for successful disaster recovery is a continuous process. The main output of a disaster recovery development process is those binders. I guess that's why so many people confuse disaster recovery with the binders. But they are only the result of a process, just like a piece of software is created as part of a process. And like software, the binders need to be tested and reviewed regularly.

      Some people learn by reading.

      Some people learn by observing.

      And some people have to piss on the electric fence for themselves.

  7. Wow by JCMay · · Score: 2

    That is some very well thought out planning. Big props to those guys!

  8. Elephant Insurance by Bob+Vila's+Hammer · · Score: 5, Funny

    When your business gets pelted with the equivalent force of 100,000 elephants, you better have a friggin contingency plan.

    --


    --"The perfect example of the man of action is the suicide." - William Carlos Williams
  9. one comment and it's gone . . mirror by GlassUser · · Score: 2, Redundant

    Twisters, hurricanes, floods (oh my)

    SEPTEMBER 03, 2003 ( CIO ) - The evening of Sunday, May 4, 2003, at Aeneas Internet and Telephone began as any previous Sunday evening had. The Jackson, Tenn.-based company that serves about 10,000 Internet and 2,500 telephone customers was closed for the weekend, awaiting the return of its 17 employees the next morning. Just before midnight, however, all hell broke loose. An F-4 category twister touched down just outside of town, then tore through Jackson's downtown area, leveling houses, historical sites and municipal buildings alike. The tornado ripped straight through Aeneas's one-story building, leaving only a pile of rubble.
    Meanwhile, Aeneas CIO and Operations Manager Josh Hart, who'd heard about multiple tornadoes in the area that day, was home, 52 miles away in Martin, Tenn., huddling in his bathroom with his family. As soon as he was able, he flipped on the TV for news footage of the devastation. What he saw looked like "a war zone," bricks and concrete everywhere and piles upon piles of rubble.

    At 2 a.m., with those images in the background, Hart's cell phone rang--it was Aeneas Network Administrator Jason Warren calling from what he likened to Ground Zero to report that everything in Jackson was lost. Another call came in from CEO Jonathan Harlan.

    "I'm listening to [Warren] tell me what it's like, and he says, 'It doesn't even look like there was an office here,'" remembers Hart, 25. "The tornado destroyed our computers, our desks, everything. I couldn't believe what he was telling me."

    Aeneas lost nearly $1 million in hardware and software that night, and an estimated 72 hours of downtime. But just as Aeneas in Virgil's Aeneid endured the worst the gods had to offer, so too did this Aeneas. This one, however, was wise enough to have created a contingency plan--one that minimized the damage and kept the company afloat during its darkest hour.

    The company is not alone. After a nationwide scramble to prepare for high-impact, low-probability events similar to the attacks of Sept. 11, CIOs have since realized that their organizations are far more likely to succumb to another type of event--one that has a high probability of occurring and, curiously enough, is probably simpler to predict: the weather. For example, in June, while the Atlantic seaboard was bracing for the start of hurricane season, Arizona was busy battling forest fires. And in Harris County, Texas, in 2001, a tropical storm and resulting flood taught one IT executive the importance of flexibility.

    Both Aeneas's Hart and Steven W. Jennings, Harris County's executive director of central technology, share their experiences here in an effort to provide best practices and battle-tested secrets about which preparations work best. According to Carol Kelly, vice president of government strategies for Meta Group, these are lessons from which everyone can learn. "When disaster strikes, you want to be ready with a plan of action and an approach of how to deal," she says. "You might be ready for the next terrorist attack, but if you're not ready for the next nor'easter, your plans won't amount to much."

    Big plans for a small company

    Aeneas launched its contingency plan when it was founded in 1996; since then, CIO Hart has enhanced the strategy gradually almost every year. In early 2002, as the ISP neared 10,000 Internet customers, he and his network administrator, Warren, thought up the company's most comprehensive approach yet. While they determined that the likelihood of a terrorist attack on the western Tennessee town of Jackson, population 59,600, was slim to none, they concluded that because of the municipality's location in the central U.S.'s infamous Tornado Alley, the plan should respond to the next most likely cause of disaster--twisters. What ensued was a three-pronged plan that hinged upon colocation, distribution and backups.

    First, by employing Border Gateway Protocol (BGP) programming on a high-class circuit shared with an ISP 90 miles

  10. Fire... by Shut+the+fuck+up! · · Score: 5, Insightful

    ...is a good enough argument for off site backups. If you don't have them, your backup plan is not enough.

    1. Re:Fire... by Stargoat · · Score: 2, Insightful

      Everyone should have off-site backups. It's not very expensive (>100 dollars for tapes). It's not very hard (drive tapes to site). It's not difficult to get the backups if you need them (drive to site with tapes). It just makes sense.

      --
      Hoist Number One and Number Six.
    2. Re:Fire... by Zathrus · · Score: 5, Insightful

      Everyone should have off-site backups. It's not very expensive (>100 dollars for tapes)

      Er, for how much data? For your personal computer, maybe (but the tape drive will cost you considerably more than that $100), but I don't think you're going to back up a few hundred gigs of business data on ~$100 of tapes. And I suspect you meant <100 rather than >100... although if the latter then you're almost certainly correct!

      It's not very hard (drive tapes to site). It's not difficult to get the backups if you need them (drive to site with tapes)

      If your offsite backup is within convenient driving distance then odds are it's not far enough offsite. A flood, tornado, hurricane, earthquake, or other large scale natural disaster could conceivably destroy both your onsite and offsite backups if they're within a few miles. The flipside is that the further the distance, the more the inconvenience on an ongoing basis and the more likely you are to stop doing backups.

      There's far more to be considered here, but I'm not the DR expert (my wife is... seriously). It does make sense to have offsite backups, but you have to have some sense about those too.

    3. Re:Fire... by Zathrus · · Score: 2, Interesting

      If a vault is destroyed in the same disaster, then there are probably more important things to worry about.

      If the vault is destroyed, then you're probably right. But it doesn't take that to render the data unusable -- if the bank gets hit, the vault may survive but the keys may be destroyed (yeah, I'm sure they can get more made or have a locksmith come in, but that will take time). Or the vault is inaccessible for some amount of time due to damage. Even if the data is good, having it unavailable does you no good at all.

      The data backup services are good, as is just going a bit further afield for a safe deposit box or other repository. As you say, if the data is important you do what it takes.

  11. And then gets slashdotted by Anonymous Coward · · Score: 2, Funny

    Hah, they can recover from a tornado. That's no biggie. How 'bout a SLASHDOTTING, then!

    1. Re:And then gets slashdotted by cindik · · Score: 4, Interesting

      That's actually interesting - how many sites have contingency plans for the /. effect? How many businesses? It's not just /., but just about any media can refer people to a real business site. For small companies, this could bring them down for some time. Imagine the "Bruce Almighty" effect, only with some business with a small-to-medium capacity connection, bombarded just because someone used http://www.slashdotme.com/ or spam@.me.into.oblivion.org in their movie. The fact that so many sites are taken down by the /. effect causes me to believe that few sites and those who run them are truly prepared.

    2. Re:And then gets slashdotted by Fishstick · · Score: 4, Informative

      that's computerworld receiving the /.ing

      the isp is here

      picture of the aftermath here

      --

      There is much cruelty in the universe, John.
      Yeah, we seem to have the tour map.

    3. Re:And then gets slashdotted by ceije · · Score: 3, Informative


      I think a lot of sites already have contingency plans for sudden traffic increases, and if not, they begin to think about them very seriously once they get a large spike in traffic that causes disruption of service. Even with traffic spike contingency plans, the level you establish as the maximum amount of traffic that you need to be able to sustain, and what amount of latency or down time is acceptable to business, can be and often is debated ad nauseam. It costs a lot of money to maintain readiness for, say, double or triple normal site traffic for a large site, and you have to make a business case for balancing that cost with the cost of an outage due to increased traffic.

      There are several things you can do to quickly add the capability to handle additional load, and most of them rely on forethought when establishing contracts with your colocation facilities and software/hardware vendors. For instance, most large colo facilities allow you to reserve additional bandwidth capability. You may pay more for that privilege, but that's part of the cost of preparedness. Also, you may purchase or lease additional hardware, have it set up and ready to install in a short amount of time, but not use it on a regular basis because of high licensing costs.

      Licensing costs for database software can be enormous, but in the event of a large spike in traffic, turning on an additional 20 or 30 cpus on a large database server could save the company a lot of money in lost revenues. Especially if your database software vendor specifically allows this in your contract. If the contract doesn't allow this, you may end up paying a lot more in licensing fees than you would have made in revenue during the outage.

      My main point here is that planning for extra traffic is a big cost-benefit balancing act, and it requires a lot of forethought. Most large software, hardware and service providers allow for emergency clauses in contractual agreements, but it's often up to the customer to specifically call those out.
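
The balancing act described here can be put into rough numbers. The following is only a sketch: the field names and every figure in the example are hypothetical, chosen to illustrate the comparison between a yearly standby cost and the expected yearly cost of spike-related outages.

```python
from dataclasses import dataclass


@dataclass
class SpikePlan:
    """Hypothetical yearly inputs for a traffic-spike business case."""
    standby_cost_per_year: float   # reserved bandwidth, idle hardware, extra licenses
    spikes_per_year: float         # expected number of damaging spikes
    outage_hours_per_spike: float  # downtime per spike if nothing is held in reserve
    revenue_lost_per_hour: float   # revenue lost while the site is unreachable


def expected_outage_cost(plan: SpikePlan) -> float:
    """Expected yearly revenue lost to spikes when no standby capacity exists."""
    return plan.spikes_per_year * plan.outage_hours_per_spike * plan.revenue_lost_per_hour


def standby_is_worth_it(plan: SpikePlan) -> bool:
    """True when the expected outage cost exceeds the cost of staying prepared."""
    return expected_outage_cost(plan) > plan.standby_cost_per_year


if __name__ == "__main__":
    # Made-up figures, for illustration only.
    plan = SpikePlan(
        standby_cost_per_year=120_000.0,
        spikes_per_year=2.0,
        outage_hours_per_spike=6.0,
        revenue_lost_per_hour=15_000.0,
    )
    print(expected_outage_cost(plan), standby_is_worth_it(plan))
```

A real business case would also weigh reputation damage and contractual penalties, which this sketch leaves out.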

      But then again, it's like insurance. You hope you don't need it, but you're glad you have it when you do. And you have to pay for it even if you don't need it.

      Also, when you plan for traffic spike, you need to consider the source of the traffic. Denial of service attacks are often easy to mitigate with common network practices, and it's just a matter of preparing for those. But real, human-driven traffic is much different, less predictable, and actually capable of generating revenue.

      Understanding your company's site infrastructure, software architecture and day-to-day traffic patterns is very important when it comes to handling real traffic spikes. When a real spike happens, network operators, developers and database admins (among others), will probably need to jump into action, looking for and attempting to mitigate bottlenecks as they appear. This can be a difficult task, and there's nothing worse than knowing what the problem is and not being able to do anything effective to combat it in a reasonable amount of time.

      Real traffic doesn't just come from other sites, it can also be driven by other forms of communication, such as television, print and other media... even word of mouth (although I haven't seen an example of this). A large, syndicated national television news program that runs during primetime can generate a lot more traffic than most web sites, and those spikes seem to grow on orders of magnitude as the duration and repetition of air time increases. A fifteen minute segment that is marginally compelling might be enough to swamp all but the largest and most prepared sites. The silver lining of the television spike is that it declines very quickly after the segment ends.

      A spike from multiple media sources, for instance print, web, and television, could be very difficult to handle, both in magnitude and duration. Although, duration isn't often a problem, because even the most prepared sites will succumb under a huge spike and

  12. Because of a tornado... by Doesn't_Comment_Code · · Score: 3, Funny

    A Tornado huh?

    Well that's what you casemodders get for installing twenty overpowered cooling fans in every one of your 1000 servers!

    --

    Slashdot Syndrome: the sudden, extreme urge to correct someone in order to validate one's self.
  13. Tornad'oh! by AtariAmarok · · Score: 4, Funny

    Let the OZ jokes flow:

    "Bring me the router of the wicked switch of the Qwest!"

    Although, I am starting to wonder. Has anyone checked to see if this ISP has a record of resisting RIAA subpoenas? Perhaps the RIAA levelled it after acquiring cloudbuster equipment.

    --
    Don't blame Durga. I voted for Centauri.
  14. Compare and contrast... by ptomblin · · Score: 4, Interesting

    A couple of friends of mine were badly burned because the web hosting company they were using lost all their data (customer and their own) in one humungous crash, and didn't have any backups. They didn't even have a spare copy of their customer database, so they couldn't even contact their customers to tell them what was going on. Nor could they tell what customers they had and how much service they'd paid for, etc.

    --
    The next Cmdr Taco duplicate will be ready soon, but subscribers can beat the rush and see it early!
    1. Re:Compare and contrast... by FattMattP · · Score: 2, Insightful
      A couple of friends of mine were badly burned because the web hosting company they were using lost all their data
      It sounds like your friends got badly burned because they didn't back up their data, not because of their ISP. Always back up your data. That goes doubly so if your data is stored on someone else's computer.
      --
      Prevent email address forgery. Publish SPF records for y
  15. 72 Hours to recover from tornado obliteration . . by palutke · · Score: 2, Funny

    . . . how long will it take the article's host to recover from the slashdot effect?

    --
    'I ain't a liar, baby, and I ain't proud I just want what I'm not allowed.' -- Violent Femmes, 36-24-36
  16. Re:Well... by stratjakt · · Score: 3, Informative

    Those businesses should realize they need a backup/disaster plan as well, if they absolutely could not withstand a day of downtime.

    Perhaps having the sites mirrored on two colos in two locations, and routing to the other one when the first goes offline.

    --
    I don't need no instructions to know how to rock!!!!
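
The two-colo idea above comes down to "send traffic to the first location that still answers". Here is a minimal sketch of that selection logic only. The host names and the health check are hypothetical, and in practice the switch would normally be made at the DNS or routing layer, not in application code.

```python
from typing import Callable, Optional, Sequence


def pick_mirror(mirrors: Sequence[str], is_healthy: Callable[[str], bool]) -> Optional[str]:
    """Return the first mirror that passes the health check, or None if all fail."""
    for mirror in mirrors:
        if is_healthy(mirror):
            return mirror
    return None


if __name__ == "__main__":
    # Hypothetical colo names; the primary is listed first.
    mirrors = ["colo-east.example.net", "colo-west.example.net"]
    down = {"colo-east.example.net"}  # pretend the primary site was just lost
    print(pick_mirror(mirrors, lambda host: host not in down))
```

Passing the health check in as a function keeps the sketch testable without a network. A real check might attempt a TCP connection or an HTTP request with a short timeout.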
  17. Before someone else says it... by wo1verin3 · · Score: 5, Funny

    No, in Russia Tornado does not own you. Neither does ISP. It is not, step 1) tornado step 2) ??? step 3) ISP recovers. There is not a beowulf cluster of these, and the tornado doesn't run Linux.

    1. Re:Before someone else says it... by Trigun · · Score: 5, Funny

      the tornado doesn't run Linux.

      No, it runs .NET. There's a lot of huffing and puffing, nobody knows too much about it, and in the end your business is in shambles and half your IT staff is no longer.

      -3 Stupid.

  18. Re:Welcome ! by cK-Gunslinger · · Score: 2, Funny

    I, for one, welcome our new twister overlords.

    +1 Informative?!?

    Does that mean that some moderator actually believes that we have, indeed, been conquered by twisters?

  19. so... by 2MuchC0ffeeMan · · Score: 2, Insightful

    let me get this straight, all the houses around the isp have no power, no phone... but they still need to get online?

    --
    Runnin' On Empty .... I'm Still Alive
    1. Re: so... by LostCluster · · Score: 2, Informative

      This ISP was also a dialtone provider...

    2. Re: so... by snake_dad · · Score: 2, Insightful

      Yes, of course you are right. We all know that ISPs only have customers immediately next to the company building. Damn those CAT5 cable length limitations...

      --
      karma capped .sig seeking available Slashdot poster for long-term relationship.
  20. Cool, but could be better by MicroBerto · · Score: 4, Insightful
    While that's awesome, I still think that small businesses and big ones should both have offsite tape backups. Even if this means the owner brings back and forth a case of tapes to his home once a week or so. That alone would have saved much of this trouble.

    Then I've seen the other end of the spectrum - a 6 Billion dollar corporation's world HQ IT center... wow. They have disaster recovery sessions and planning like I never would have imagined. Very cool facility, but it has to be like that. Some day if they get burned, it's all over.

    --
    Berto
  21. Re:Amazing is an innapropriate adjective by HardCase · · Score: 5, Funny
    I realize that slashdot is mostly populated by high-school educated "IT people", who give a shit about logs and backups and think plugging a PC and monitor into a powerbar is "computer science". To these people, the prospect of plugging in a bunch of computers and restoring backup tapes is exhillirating and exciting. The highlight of their lives.

    But, as a programmer, I just dont care.



    When I was a sophomore, working on my electrical engineering degree, I worked for a small, network-centric company that employed what seemed to be an abnormal number of snooty programmers and technical writers. Maybe it wasn't so abnormal.



    Me: "Hi, IT support."
    Stratjakt: "Hey, I know you're just a high-school educated 'IT person', but you need to get one of your cable monkeys up here and find out why I can't see the network!"
    Me: "OK, but let's check a couple of things quickly before I dispatch a technician. It may save some time."
    Stratjakt: "Hey, I'm a programmer! I just don't care!"
    Me: "I understand...I realize that my mundane existence doesn't have the exhilaration and excitedness of the thrilling, edge-of-your-seat world of a computer programmer, but there are just a few simple things that we could do to resolve this problem that will be faster than you waiting for a technician."
    Stratjakt: "I just don't care."
    Me: "No problem, I'll dispatch a technician."


    An hour later...


    Technician: "Stratjakt is all fixed up. I plugged his network cable back into the jack."

  22. Truly stunning by dbarclay10 · · Score: 5, Insightful

    What amazes me isn't that these people were able to restore service to their customers in 72 hours. They used standard systems administration techniques. BGP was specifically mentioned.

    No, what amazes me is that this is news. The IT industry is so full of idiots and morons and MCSEs that taking basic precautions earns you a six-figure salary and news coverage. These folks didn't even have off-site backups, it was luck that they were able to resume business operations (ie: billing) so soon.

    Moral of the story? When automobile manufacturers start getting press coverage for doing a great job because unlike their competition, they install brakes in their vehicles, you know that the top-tier IT managers and executives have switched industries.

    --

    Barclay family motto:
    Aut agere aut mori.
    (Either action or death.)
    1. Re:Truly stunning by HardCase · · Score: 5, Interesting
      No, what amazes me is that this is news. The IT industry is so full of idiots and morons and MCSEs that taking basic precautions earns you a six-figure salary and news coverage. These folks didn't even have off-site backups, it was luck that they were able to resume business operations (ie: billing) so soon.


      I agree, although maybe not so vehemently. For the IT managers who need a clue, the article is evidence that a sound disaster recovery plan works. Obviously, in the case of the ISP, the plan wasn't completely sound, but the other, possibly more important, point of the article is that the ISP's management recognized that their recovery plan was incomplete. Based on the lessons they learned, they made changes.


      I work for a large (~20,000 employees) company, with about 10,000 employees at one site. The IT department (actually the entire company as well) has a disaster recovery plan in place. But beyond having a plan, we also have drills. As an example, we are in the flight path of the local airport (possibly not the best place in the world for a manufacturing site). What happens if a plane crashes smack in the middle of the plant? Hopefully we'll never know for sure, but the drills that we've run showed strong and weak points of the disaster plan. The strong points were emphasized, the weak points were revised and the disaster plan continues as a work in progress.


      Specifics aside, and maybe this is just stating the obvious, but considering a disaster recovery plan to be a continuously evolving procedure could be one of its strongest points.


      -h-

  23. 72 hours thats pretty bad by silas_moeckel · · Score: 2, Insightful

    OK I just may be jaded I work in a sector that thinks 5 minutes is earth shattering amounts of downtime. 72 hours would have me everybody that works for me and some C level guys fired at the companies I work for. First things first what did they do wrong backups stored on site this is page 2 of a disaster recovery howto backup need to be stored onsite and remote, they also need to be verified as functional (yes I am that manager that insists that servers be restored and checked for functionality on the backup hardware during a work window) From the story it wasn't even client data as much as it was their billing DB and other office information. When will people learn that information makes a lot of businesses and needs to be protected a nominal cost to do proper backups and house them remotely even if it's in a bank vault a few towns over preferably the other coast. Satellite uplinks can provide decent amounts of bandwidth in a pinch though the latency is horrid.

    --
    No sir I dont like it.
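
The point above about verifying that backups are functional can be partly automated. This is a sketch under stated assumptions, not a description of anyone's actual process: it assumes the backup is a plain directory tree, and the function names are hypothetical. It records a SHA-256 checksum per file at backup time, then reports which files are missing or altered after a test restore.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> Dict[str, str]:
    """Map each file's path (relative to root) to its checksum at backup time."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_restore(manifest: Dict[str, str], restored_root: Path) -> List[str]:
    """Return the relative paths that are missing or differ in the restored tree."""
    problems = []
    for rel_path, expected in manifest.items():
        candidate = restored_root / rel_path
        if not candidate.is_file() or sha256_of(candidate) != expected:
            problems.append(rel_path)
    return problems
```

An empty list from `verify_restore` only shows the bytes came back intact. It does not replace booting the restored servers and checking that they work, which is the test the comment actually insists on.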
    1. Re:72 hours thats pretty bad by MachineShedFred · · Score: 2, Informative

      Yup... definitely a manager concerned about the minutes, rather than the details.

      Details like it not being one box or even one rack that went down, but ALL RACKS, ALL WIRES, ALL ELECTRICITY, ALL WALLS, FLOORS, AND CEILINGS.

      Also too busy to bother with details like punctuation or a proper paragraph from the look of it...

      --
      Slashdot still doesn't support Unicode after it was added to the HTML standard in 1997.
    2. Re:72 hours thats pretty bad by Xerithane · · Score: 2, Insightful

      I think I speak for everybody when I say, "Uh, what?"

      --
      Dacels Jewelers can't be trusted.
    3. Re:72 hours thats pretty bad by BenV666 · · Score: 3, Funny
      yes I am that manager
      So that's why your post is such a lovely formatted and readable text ;)
  24. New BOFH Excuse... by EvilTwinSkippy · · Score: 2, Funny

    Our ISP was leveled in a Tornado.

    --
    "Learning is not compulsory... neither is survival."
    --Dr.W.Edwards Deming
  25. Re:Amazing is an innapropriate adjective by Mr+Krinkle · · Score: 3, Interesting

    Wrong on SOOOOOO many levels.
    Let me start with this line:
    "I realize that slashdot is mostly populated by high-school educated "IT people", who give a shit about logs and backups"
    You claim to be a programmer. I have been a programmer and am now a Sys Admin, and in both roles the BEST way to troubleshoot was from the logs. Unless you are the supreme programmer whose code never needs debugging and whose users never mispunch something causing an error, a log file will let you see and know what has happened.

    Now for this line:
    "and restoring backup tapes is exhillirating and exciting."
    I have restored from tape backup. We had a "programmer", BS from Virginia Tech, Masters from UMass, who was certain he knew exactly what he was doing when he blew away an entire production database. (Actually he was a really good guy who just made a simple mistake.) Fortunately we had tapes to restore from. But if ANYONE thinks that a restore is "exhillirating" (yes, I left your typo/mistake in there) then they are just strange. That was one of the most tedious and boring things I have had to do. But we had been tedious in backing EVERYTHING up, so production was not severely impacted.

    Now for where you directly insult everyone:
    "I fully expect the PHBs and army of cable monkeys to get the network up and running in our new location."
    So as a systems admin do I become a cable monkey? Or am I a PHB? Either way I would be VERY needed if a disaster strikes, just as I am needed every day. As for the elitist attitude and your lack of knowledge and concern for the backend of systems, I am glad you do not work anywhere near me, as I hate IT personnel that have to call me to run Windows Update on their system when the latest worm comes around, or to show them how to NOT click ignore when Norton tells them they have a virus.
    In short, Please show some respect for your coworkers and realize that these guys were prepared and did what their plan stated they could do.
    If not don't be alarmed if somehow your account gets disabled and everything blown away and surprisingly they won't have backups, cause you "just don't care" for them.

    --
    I am 31337 or something.
  26. Re:However... by MachineShedFred · · Score: 5, Funny

    I, for one, welcome our new Tornado-beating ISP overlords.

    --
    Slashdot still doesn't support Unicode after it was added to the HTML standard in 1997.
  27. But... by macshune · · Score: 5, Funny

    Can they recover from the slashdot effect???

    The slashdot effect differs from a tornado in a few subtle ways:

    1) You can't see it coming (unless you pay money to be a subscriber)

    2) It doesn't hurt anything, except for webservers, the occasional OC line lit up like New Year's Eve, spammers, and the odd *IAA executive.

    3) A tornado doesn't typically smell like armpits, cheetos, empty 64oz soda cups, burning plastic, your parent's basement and/or too much cologne for that first date.

    4) It travels at the speed of light, a lot quicker than a tornado.

    5) Does not require specific atmospheric conditions to be present...just a link on the front page.

    Anything else?

    1. Re:But... by BMonger · · Score: 5, Interesting

      Hmmm... what if a website admin did become a subscriber. Could they theoretically take the RSS feed to know when a new post was made, pull the article text, scan it for their domain and if their domain was linked to just have a script auto-block referers from slashdot for like 24 hours or so? Somebody less lazy than me might look into that. Then you could sell it for like $100! It'd be like paying the mob not to beat you up! But only if somebody affiliated with slashdot wrote it I guess.
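
      A minimal sketch of that idea, for anyone less lazy. It assumes a plain RSS 2.0 feed with un-namespaced <item> elements (the real feed format may differ), and the sample feed, example.com and the function names are all made up for illustration. It only decides when to block; wiring it into a web server is left out.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta

BLOCK_WINDOW = timedelta(hours=24)  # the "like 24 hours or so" from the comment


def feed_mentions_domain(rss_xml: str, domain: str) -> bool:
    """Return True if the text of any <item> in the feed mentions the domain."""
    root = ET.fromstring(rss_xml)
    for item in root.iter("item"):
        text = " ".join(item.itertext())
        if domain.lower() in text.lower():
            return True
    return False


def block_until(rss_xml: str, domain: str, now: datetime):
    """Return the time until which referers should be blocked, or None."""
    if feed_mentions_domain(rss_xml, domain):
        return now + BLOCK_WINDOW
    return None


def should_block(referer, blocked_until, now, source="slashdot.org") -> bool:
    """Block a request only inside the window and only if the Referer header matches."""
    if blocked_until is None or now >= blocked_until:
        return False
    return source in (referer or "").lower()


# Hypothetical feed contents, not taken from any real feed.
SAMPLE_FEED = """<rss version="2.0"><channel>
<item><title>ISP Recovers in 72 Hours</title>
<description>Story at http://www.example.com/recovery</description></item>
</channel></rss>"""

now = datetime(2003, 6, 1, 12, 0)
until = block_until(SAMPLE_FEED, "example.com", now)
print(until)
print(should_block("http://slashdot.org/article", until, now))
```

      Anyone who strips or fakes the Referer header walks right past this, so it is a speed bump at best.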

    2. Re:But... by BMonger · · Score: 4, Funny

      Ohhh! Or even better yet! Have your site auto-post the article's text in a slashdot comment plus block the slashdot referer header for 24 hours! (patent pending)

    3. Re:But... by afidel · · Score: 2, Funny

      Well, I think this would be an appropriate signal to use as a slashdot alarm for your datacenter.

      --
      There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
  28. Off topic, I know..... by BlabberMouth · · Score: 2, Interesting

    but isn't the new moderation system leading to the first few good posts on any topic all getting modded up to 5 while the rest get ignored?

  29. Re:Amazing is an innapropriate adjective by Effexor · · Score: 3, Funny

    You're a VB programmer, aren't you?

    --

    As the air to a bird or the sea to a fish, so is contempt to the contemptible -W.B.

  30. Re:Amazing is an innapropriate adjective by venom600 · · Score: 2, Insightful

    Wow! This is exactly the reason that systems administrators generally dislike most members of their development group. Your attitude does not do very much to endear us 'cable monkeys' and 'PHBs' to you.

    "IT people", who give a shit about logs and backups and think plugging a PC and monitor into a powerbar is "computer science"

    If you think this is all that is involved in running a remotely large and reliable network, you are sadly mistaken my friend. A lot of thought, planning and testing goes into most corporate network infrastructures.....kinda like software development.

    "Computer Science" is a very broad term that encompasses much more than just 'programming'.

  31. make sure off-site is far enough away by DiveX · · Score: 3, Insightful

    Many companies in the World Trade Center thought that off-site backup meant the other building.

    --
    Cave, wreck, and deep diver.
    1. Re:make sure off-site is far enough away by hawkbug · · Score: 3, Interesting

      Exactly, that statement is very true - I had a buddy who worked for a company there in tower 2. He worked offsite in Iowa, and one day couldn't VPN in to continue his programming. Turned on the news, and you know the rest. The problem was, he had all his Java source on their servers. Sure, they backed it up daily and had an offsite backup in the other tower... The bad news was he lost all his work, and a lot of coworkers. The good news is that the company survived, and simply contracted him on for another 2 years to complete the project. He had to start from scratch, but gets paid more as a result. I'm sure insurance covered the company's losses.

  32. Re:Amazing is an innapropriate adjective by HardCase · · Score: 2, Insightful
    Actually, the hour delay was because of lazy people who kick their network cables out of the wall, then insist that a technician hold their hand to plug it in. It doesn't take an hour to find the problem...in fact, if you listened to the nice help desk man, he would have asked you to look for the end of the plug lying on the carpet. Instead, you wasted 15 minutes of his time explaining to him that you're a programmer who just doesn't care.


    What takes an hour is that the technician has to take care of the other 20 people who can't be bothered to plug a cable back into the wall on their own.


    Oh, and, of course, the tech also has to take care of real work - like fixing the programmer's machine after he installs the latest Webshots and Gator software.


    Me: "It took our technician an hour to get all of the malware off of Stratjakt's computer that he downloaded from the Internet."


    CTO: "Didn't he read the email that I sent out every month for the last six months telling the employees not to install non-work-related software?"


    Me: "Well, I asked him about that...he said that he was a programmer and just doesn't care."


    CTO: "He's fired."


    Oh, and, incidentally, when your self-administering software becomes proficient enough to keep your big foot from wrapping around the network cable and yanking it out of the wall, then I'd say you really had something worthwhile. At this point, though, I have my doubts.

  33. I live in Jackson.... by Daniel+Wood · · Score: 5, Interesting

    I am also a former Aeneas customer.
    Unless Aeneas has made some major changes they are quite certainly the worst ISP I have ever worked with. Aeneas has contracts with the Jackson-Madison County School System to provide internet service district wide. The quality of such service is, bar none, the worst I have experienced.
    I did some volunteer work at a local elementary school helping teachers work out any lingering computing problems they had (viruses, printer drivers, misconfigured IP settings, file transfer to a new computer, etc). The internet service I experienced while I was there led me to believe I was on a 128k ISDN line. Not until I went to the server room did I realize that I was, in fact, on a T1. Now this is during the middle of summer; maybe four other persons were in the building, three of whom were in the same room as myself. The service was also intermittent, having several dead periods while I was working. Needless to say, I remained unimpressed by said experience.

    When I was an Aeneas dialup customer, in 1998, the service provided by Aeneas was also subpar. The dialup speeds were averaging 21.6kbps, whereas when I switched to U.S. Internet (now owned by Earthlink) my dialup speeds were always above 26.4kbps (except on Mother's Day). There were frequent disconnections, and they had a limit of 150hrs/month.

    I'm not surprised how easy it is to restore subpar service. All they had to do was tie together the strings that are their backbone.

    1. Re:I live in Jackson.... by Artifex · · Score: 2, Informative
      When I was an Aeneas dialup customer, in 1998, the service provided by Aeneas was also subpar. The dialup speeds were averaging 21.6kbps, whereas when I switched to U.S. Internet (now owned by Earthlink) my dialup speeds were always above 26.4kbps (except on Mother's Day). There were frequent disconnections, and they had a limit of 150hrs/month.


      Have you never learned what line quality means? Not just from you to your local POP, but beyond the local loop, on the trunks that go across town (or further) to the ISP's POP?

      21.6 is an interim speed that, when seen in conjunction with v.34, v.90, or the other modern standards that go beyond 28K, all technicians know means "well, it's trying, but the lines are crap." Connections that lousy are also prone to disconnects. Nobody deliberately locks their modems down to that speed to be jerks or to save bandwidth - if they were that cheap, they'd put you on old-style 28K modems, which are practically free. The U.S. Internet POP was probably in a different part of town from the Aeneas POP, so you went over different trunks to get there. (since Earthlink took over, they probably dumped their local lines also, and it's probably Sprint's dialup network that is serving you locally)

      Anyone getting weird speeds like that should be bitching to the local telco, not just to the local ISP, though they should have worked with you to isolate the problem to bad trunks. The fact that you're not getting 33 or better connection right now means the local part of the loop is still crappy.

      Oh, I do agree with your complaint about being limited by 150 hours a month. Still, that's not a lousy service issue, that was a contractual agreement you signed up for, right?

      As far as the school's T1 being slow goes, did you attempt any troubleshooting, or just blame it on the ISP? What did the router logs say? Were all channels 1-24 up? Were you getting frequent bounces? What were the CRC errors like? Did you arrange for a circuit test? What were the BERT results? More importantly... did you try throughput while directly connected to the router, since a lot of schools have really pathetic wiring systems because they're installed by volunteers who don't design and install networks for a living?
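
      To put numbers on the "channels 1-24" question: a T1 carries 24 DS0 channels at 64 kbps each, so the usable rate scales with how many channels are actually up. A rough sketch (the function name is mine, and this ignores framing and protocol overhead):

```python
DS0_KBPS = 64      # one DS0 channel carries 64 kbps
T1_CHANNELS = 24   # a full T1 has 24 DS0 channels


def t1_capacity_kbps(channels_up: int) -> int:
    """Payload capacity in kbps for a T1 with the given number of channels up."""
    if not 0 <= channels_up <= T1_CHANNELS:
        raise ValueError("a T1 has between 0 and 24 channels")
    return channels_up * DS0_KBPS


print(t1_capacity_kbps(24))  # full T1: 1536 kbps
print(t1_capacity_kbps(2))   # two channels: 128 kbps
```

      Two channels up works out to the same 128k as a dual-channel ISDN line, which is one way (only a guess, without the router logs) that a T1 could feel like ISDN.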

      - No, I never worked for Aeneas, but I've done everything from dialup to customer network engineering for a global Tier-1 provider, and I have learned from hard, hard experience to be cynical about complaints without supporting evidence.
      --
      Get off my launchpad!
  34. Re:Amazing is an innapropriate adjective by sloth+jr · · Score: 2, Insightful
    IT is about handling the shit storm that happens when the software that YOU write fucks up in the colossal way that it does.

    Keep up the good work.

    sloth jr

  35. What about practicing your disaster recovery? by sllim · · Score: 2, Insightful

    The company I work for practices disaster recovery once a year on all our major systems.

    In the article the writer was talking about how much work it was to migrate the T1 connections, and how they hadn't foreseen that. That is exactly the sort of thing that a practice disaster recovery uncovers.

    If you want the model from the place I work it is simple enough:

    1. Run the disaster recovery during a 24 hour period
    2. Pat yourself on the back for what worked.
    3. Ignore what doesn't work.
    4. Repeat next year.

    Of course next year gets a new step:
    3.5 Act surprised that stuff didn't work.

  36. Re:Amazing is an innapropriate adjective by wuice · · Score: 3, Interesting

    Yep, that's the way it works. I don't crawl around on the floor plugging shit in and getting dirty.

    ...

    They're just added bureaucracy for the computer world, and I work to replace them each and every day with more sophisticated self-administrating softwares.

    If you don't know how to crawl around on the floor plugging shit in and getting dirty, you do not have the perspective necessary to write software to replace the people who do. The best programmers are not arrogantly disconnected from the people in the trenches, especially if they're working on software directed towards their field. A good programmer needs at least to know what people commonly need support about in order to address it in future software. If your CTO is as out of touch and disconnected as you, I pity your fellow employees.

    You're also a poor team player, which is a liability to you and your career unless you work solo. You're also incredibly stuck up and elitist, which unfortunately probably actually helps your career. You're also way off base: you obviously consider yourself "above" the type of people who enjoyed this article, and your comments have been way more of an advertisement of yourself than anything to do with the issue. Why don't you drop out of this conversation and let the high school kids who spend all day plugging shit in enjoy it. Believe it or not, there are a lot more nerds in high schools than in high-paying programming positions. That being the case, this site should have more stories about them than you.

  37. 72 Hours is a little long.... by fuqqer · · Score: 3, Interesting

    72 hours seems way too long to be out of business. That's 3 days that the ISP is not pulling in dough. Unless the whole internet is crippled, I'd ditch an ISP that was out for three days. One of the main selling points for an ISP is connectivity in rain, snow, shine, OR rabid squirrels...

    The company (ISP/consulting/services hosting) I used to work for had a DR plan to be executed in 24 hours with 75% functionality. Offsite servers and backups of course...

    More impressive to me are the World Trade Center folks like American Express and other companies that had DR plans situated across the river. A lot of datacenters and information services were functional again within 18-24 hours. That's PPP PPP (prior planning prevents piss-poor performance).

    I write good sigs on my bathroom wall...but this is not a real sig.

    1. Re:72 Hours is a little long.... by payslee · · Score: 2, Interesting
      My Dad worked in the IT department at one of those banks, across the street from the WTC. I found it interesting that, according to him, the year-2000 bug scare turned out to be a big help when the real disaster struck. Of course, their systems were orders of magnitude more complex than this ISP's, but then they had that much more redundancy built in to everything.

      Prior to 2000, they built an entirely new system and ran it in parallel with the current one for six months. Every transaction went through both systems, with the results compared to ensure compliance. They had run so many data recovery scenarios that even having to abandon their headquarters did not mean that service was interrupted for more than a minute amount of time.

      So the article has a good point when it says you may not know what disaster will hit, but a good plan has flexibility built in. Total system failure can happen in oh so many ways, these days.

      --
      Doing my part to piss off the religious right.
  38. How about by phorm · · Score: 2, Interesting

    1) Implement good disaster-recovery plan
    2) ??? (aka mad-scramble to initiate plan)
    3) Profit (or at least don't go under)


    This must have been a pretty in-depth recovery plan though. I mean, even with backups and a redundant connection elsewhere... I think that for myself, processing the fact that my office had just been bowled over by wind-on-steroids would faze me for a little while (office...tornado...holy...shit...must...recover...data)

    Now they're up and running, but what of their old office? It must be very interesting to have to deal with the stage of "step over rubble, salvage what we can" and the general amazement at nature's fury.

    I'm in the process of configuring several of my servers to offload to a remote master. If the town gets levelled we're toast, but if an individual location bites it, then at least critical data (accounting records, home dirs, etc) is saved. This will still be a big bite out of the business.

    Whether insurance covers natural disasters such as tornadoes would be a big question. A lot of insurance companies don't cover "acts of God", etc.

  39. Re:Amazing is an innapropriate adjective by Valence_99 · · Score: 2, Funny

    Oh what a sad day it was when I (being a cable monkey) was asked by the supreme programmer to get his computer back up. When I told him that his HD was dead, he looked at me with shock, as he explained that the last month's worth of his so valuable work was on his disk. I asked him if he backed it up anywhere. He said no. He then asked me if we backed it up. I said no, we don't do that for local drives. We sent the drive off to see if anything could be recovered. Nope, big waste of time. Almost like his own little tornado in his PC. Hope it doesn't happen to you.

    --
    I'm only human!
  40. Re:An ISP in tornado country by sexylicious · · Score: 2, Informative

    Some places in tornado country can't have basements. This is due to the soil having extra clay, the water table being a couple feet below the surface, or annual flooding.

  41. tape backups? by Musashi+Miyamoto · · Score: 2, Interesting

    From the article, it looks as if the only thing they had to restore from tape/disk was their customer database, so that they could send out the next month's bills. So, the 72 hours was basically putting in new hardware and turning it on. They probably lost all their users' web sites and other "expendable" data.

    How about talking about disaster recovery for a REAL company with tens to hundreds of terabytes of data sitting on disk? The kind of data that you cannot lose and must have back on-line asap?

    This article is like congratulating them for putting up detour signs when a road is destroyed, or rerouting power when a power line goes down.

    Just about everything that was destroyed was non-unique, manufactured items that could be recreated and repurchased. The only exception was the user data, which was pulled off of a nearly destroyed drive by a data recovery company. (Lucky for them!)

    I would like to hear more about companies that lose tons of difficult to replace, unique items, such as TBs of user data, prototype designs, business records, etc.

    I would bet that if a company were to permanently lose these types of things, they would nearly go out of business.

  42. Been there, done that, Northridge Quake by Tsu+Dho+Nimh · · Score: 5, Interesting
    I was playing minute-person at a "disaster recovery" meeting (the first one) where high-level suits were figuring out what to do in case of a disaster at their multi-state bank. Their core assumptions were initially as follows:
    • They would all survive whatever it was. (I was looking out the window, and seeing jetliners coming in for a landing ... a few feet too low and the meeting would have been over).
    • All critical equipment would survive in repairable condition.
    • Public services would not be affected over a wide area or for a long time.
    • Critical personnel would be available as needed, as would the transportation to get them there.
    • The disaster plan only needed to be distributed to managers, who would instruct people what to do to recover.

    That was on a Monday. The next Monday was the Northridge quake.

    • One critical person woke up with his armoire on top of him, and a 40-foot chasm between him and the freeway.
    • One of their buildings was so badly damaged that they were banned from entering ... and there was mission-critical info on those desktop PCs. Had it not been a holiday, the casualty toll would have been horrendous.
    • The building with their backups was on the same power grid as the one with no power and the generators could only power the computers, not the AC they also needed.
    • None of the buildings had food or water for the staff who had to sleep over, nor did they have working toilets or even cots to nap on.
    • One of the local competitors was back in business Tuesday morning, because their disaster plan worked. They rolled up the trailers, swapped some cables and were going again.

    They came into the next meeting a couple of weeks after the quake with a whole new perspective on disaster planning and training:

    • Anyone who survives knows what the disaster plan is and copies of it are all over the place.
    • Critical equipment is redundant and "offsite" backups are out of the quake zone.
    • They have generators and fuel enough to last a couple of weeks for the critical equipment and its support, plus survival supplies for the critical staff. This is rotated regularly to keep it from going stale.
    • They cross-trained like mad.
    • They started testing the plan regularly.
    1. Re:Been there, done that, Northridge Quake by Zachary+Kessin · · Score: 3, Interesting

      Well, a solid disaster plan would (if you are big enough to afford it) have a second location far away. If you had a location in California and a second, say, in Boston, you would be OK. Of course that costs a lot of money, and many small to mid-sized firms could not afford it in the first place.

      But one thing with disaster recovery is you need to figure out what is and is not a disaster you should worry about. I live in Jerusalem; terrorism is something very real here but mostly hits soft targets. On the other hand, major blizzards are a non-issue. In Boston we worried about Nor'easters and occasionally a hurricane. If you live in Utica NY you probably don't have to worry too much about terrorism. Fire can happen anywhere.

      I don't know how you figure out what is or is not a probable event in your location. I suppose you talk to the insurance folks; they have spent a lot of time figuring this out.

      The other question is how much recovery can you afford? If your disaster recovery plan puts your company into chapter 11, it was not a very good plan.

      I like saying "Utica"

      --
      Erlang Developer and podcaster
  43. Re:Amazing is an innapropriate adjective by Sternyz · · Score: 2, Funny


    5 minutes later -

    HR: "Hi, Stratjakt? This is Mindy in Human Resources, We've outsourced the programming department to a company in Bangalore. Your replacement, Raj, will be calling you today to discuss transferring over all your existing projects. Thanks for all your hard work-"

  44. Re:My ISP's disaster recovery plan by trybywrench · · Score: 2, Interesting

    My friend works for what was UUNET in Richardson TX. His datacenter is on two separate power feeds and has two or three massive generators with 30 days of fuel. When I asked him why 30 days he said that if the datacenter doesn't have power for >30 days then society is crumbling and Internet access/web sites are pretty low on the overall priority list.

    --
    I came to the datacenter drunk with a fake ID, don't you want to be just like me?
  45. Redneck tornado jokes... by johnwyles · · Score: 2, Funny

    Kind of offtopic but maybe funny if you haven't heard them 495,954 times...

    You might be a redneck if:

    You've been on TV more than 5 times describing the sound of a tornado

    A tornado hits your neighborhood and does $100,000 worth of improvement.

    --
    [[ the only 15 letter word that is spelled without repeating a letter is uncopyrightable: it may soon be, however. ]]
  46. I wonder by austad · · Score: 2

    I wonder how long it would have taken them if they already had a redundant datacenter that everything was replicated to. In the financial industry, 72 hours passes and the feds come in and shut you down. 72 hours may be acceptable for an ISP, but not for a bank or services like Western Union.

    --
    Need Free Juniper/NetScreen Support? JuniperForum
  47. Hmmm... by EverDense · · Score: 2, Funny

    You should have posted a link to the ISP's website.
    Then we could've kicked a dog while it was down.

    --
    http://jesus.everdense.com/
  48. Not good enough by vasqzr · · Score: 3, Insightful


    When you go to a DRP seminar, they make the claim that the majority of businesses that are knocked out for longer than 48 hours go out of business within 1 year.

  49. From the article... by n7ytd · · Score: 3, Funny
    Miraculously, the vendor discovered a recent copy of the customer records database on all four computers and was able to recover all of the customer data and return it to Aeneas, delaying printing of its May bills only minimally.

    This was from a magazine for managers, after all. Now there's some good news that pointy-haired bosses can understand!

  50. Nice name for data recovery company... by jtheory · · Score: 2, Funny

    Did anyone else read "Kroll OnTrack" as "Troll OnKrack"?

    Wait, did anyone else even read the article?
    Oh, never mind.

    --
    There are only 10 types of people: those who understand decimal, those who don't, and, uh, 8 other types I forget.
  51. Young! by holzp · · Score: 2, Interesting

    did anybody else notice these lines:

    Meanwhile, Aeneas CIO and Operations Manager Josh Hart..

    'It doesn't even look like there was an office here,'" remembers Hart, 25.

    Aeneas launched its contingency plan when it was founded in 1996; since then, CIO Hart has enhanced the strategy gradually almost every year.

    Seems to have gone unnoticed that this guy founded the company at 18...before the dot com boom!

  52. Re:My ISP's disaster recovery plan by afidel · · Score: 3, Interesting

    30 days may be a bit much, but as I found out one day, 48 hours comes close to being too little in some situations. We had a massive generator capable of running most of our 4 story suburban office building for a couple days including the datacenter, AC for the datacenter, lights, and desktops. It would not run AC for the rest of the building or the elevator. At the ~35% load we placed on it and its 500 gallon tank, the engineer from Caterpillar said it should run for around 48 hours. Well, we called our fuel supplier to get some offroad diesel delivered the next morning: no can do, they no longer stock it!?!? WHAT! Then we tried every other listed company in the area; none of them could get to us the next day with fuel. We ended up getting a fuel company out to deliver 300 gallons from Detroit to our offices in Akron, Ohio, paying a $500 delivery charge and 70 cents a mile. After that we made sure to get a contract with a fuel company that guaranteed 24 hour delivery of offroad diesel =)
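
    For the curious, a back-of-the-envelope version of those numbers. It assumes the burn rate stays constant at the engineer's estimate (500 gallons over roughly 48 hours); the variable and function names are just for illustration.

```python
TANK_GALLONS = 500       # tank size from the story
EST_RUNTIME_HOURS = 48   # engineer's estimate at ~35% load
DELIVERY_GALLONS = 300   # emergency delivery


def burn_rate_gph(tank_gallons: float, runtime_hours: float) -> float:
    """Implied fuel burn in gallons per hour, assuming a constant load."""
    return tank_gallons / runtime_hours


def extra_runtime_hours(delivery_gallons: float, rate_gph: float) -> float:
    """Additional hours of runtime a delivery buys at the given burn rate."""
    return delivery_gallons / rate_gph


rate = burn_rate_gph(TANK_GALLONS, EST_RUNTIME_HOURS)
extra = extra_runtime_hours(DELIVERY_GALLONS, rate)
print(round(rate, 1), "gal/hr,", round(extra, 1), "extra hours")
```

    So the 300 gallon delivery bought a bit more than one extra day at that load, which is why the guaranteed delivery contract mattered.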

    --
    There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
  53. Re:Amazing is an innapropriate adjective by Artifex · · Score: 2, Insightful
    When I told him that his HD was dead, he looked at me with shock, as he explained that the last month's worth of his so valuable work was on his disk. I asked him if he backed it up anywhere. He said no. He then asked me if we backed it up. I said no, we don't do that for local drives.


    This is really sad, and the company could have fired him for being incompetent. He basically destroyed their intellectual property through negligence, wasting all the money they invested in his project, which was almost certainly more than just his salary for that time period.

    If a truck driver gets a load and forgets to check his own tie-downs, and as a result loses the load before reaching his destination, whose fault is it?

    Besides, as supreme programmer, he should be motivated to work sometimes from home in the middle of the night, and have backups there :)
    --
    Get off my launchpad!