Disaster Recovery?
M. Grochmal asks: "A three-alarm fire at Southern Maine Technical College burned through the Computer Technology and Technical Graphics departments. We have salvaged most of what we can, but cannot return into the building until the asbestos risk decreases. The hard part now is rebuilding the networks in another building. The schedules have been rearranged, many of the department students and faculty are volunteering to relocate salvageable computers, as well as install/configure the new computers that will be arriving in the next day or so. On top of that, we have to rebuild the Netware servers, restore from backups, and get them networked again. I was wondering how other Slashdot readers were able to recuperate from unforeseen damage to their work (and learning) environments. You can read about the fire here and see what the schedule is. Wish us luck."
Just make sure you get your NDS stuff restored within three days. After that, have fun rebuilding your tree! Unless, of course, you followed guidelines and had offsite replicas of each partition!
Uhh, you would have a DR plan BEFORE the place burns down, one developed around the business or school needs: tape backups and another site to restore to. Perhaps the information was even mirrored to it via a SAN. Everyone keeps their tapes offsite, right?
That's all I've been doing since the 9/11 incident, and I think it has something to do with that. I work in the Boston WTC, and everyone is a bit paranoid about data loss since we had offices that got toasted in NYC.
I do wish you luck. I'm an admin in the Engineering Computer Center at the University of Houston. Fortunately the College of Engineering was one of the 10 buildings that escaped flood damage during Tropical Storm Allison last summer.
Even though our Telecomm Department was able to pull enough equipment out of the Telecomm Engineering lab to get the network sort of back up, we were without full connectivity for almost a month. It took about 4 days to get the electricity back on to our undamaged building and we didn't have phone for about 2 weeks. There are a few buildings on campus that are still unusable.
Best of luck. It sounds like your situation is going to be more tedious than difficult, though.
utter rubbish
Nothing is better than a recent, working and complete backup ... but a few days ago I saw an advertisement from a firm (DriveSavers) that specialises in data recovery for destroyed hard disks; maybe they can help!
...
Besides, I suppose it would be best to see the positive side of that incident; I'm sure rebuilding the network will be a good experience! Anyway, good luck to you.
Life sucks.
I was visiting some friends at your campus just this past December; sorry to hear about your loss.
Sadly, I can't give you any suggestions on how to better recover from your current situation -- seems like what can be done now is being done. It seems there's not been much of a response to this as yet, so I'll go out on a limb and offer some ideas that may sound obvious, but forest and trees and all that.
I'm reading between the lines, but I suspect that prior thoughts of backups and disaster recovery were shot down by the PHBs as being too expensive or time consuming. Here's your chance!
You now have a rare opportunity where proposals for FUTURE disaster recovery would actually be listened to!
First off, document what you are doing now! Write it down in a notebook, carry around a pocket tape recorder, use a PDA, hire some students who will answer a phone so that when something comes to mind, you can just dial a phone and get it recorded; whatever, but document what it is actually costing to recover! And not just the hardware/software expenses either! Increased calls to the help desk. Impact on faculty and students' schedules. Reconstructing the network topology.
Anything you can think of, now, document it! If, upon later review, some things are questionable, you can omit them then. But if, during that later review, the thought is: "Gee this took more than we had thought it would, too bad we didn't keep track..." Get the picture?
So, now you'll have some kind of baseline as to what the actual recovery costs were, in this case. With that, you can now make a strong business case to implement a solid disaster recovery plan. Include server configs, backups, inventory of hardware and software... in short you've got a list of what you actually had to do to recover from this disaster; use that to identify what you'd need to do again.
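For what it's worth, the "document everything" idea can be as simple as a running log that gets totalled per category later. Here is a minimal sketch in Python; the `RecoveryEntry` fields, the category names, and every number in the sample log are made up for illustration and do not come from the actual recovery.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class RecoveryEntry:
    """One logged recovery activity (all field names are illustrative)."""
    category: str     # e.g. "hardware", "help desk", "network topology"
    description: str
    hours: float      # staff or volunteer time spent
    dollars: float    # direct expense, 0.0 if none


def summarize(entries):
    """Total hours and dollars per category."""
    totals = defaultdict(lambda: {"hours": 0.0, "dollars": 0.0})
    for e in entries:
        totals[e.category]["hours"] += e.hours
        totals[e.category]["dollars"] += e.dollars
    return dict(totals)


# Made-up sample entries, only to show the shape of the log.
log = [
    RecoveryEntry("hardware", "replacement lab PCs", 0.0, 12000.0),
    RecoveryEntry("help desk", "extra calls, week 1", 30.0, 0.0),
    RecoveryEntry("network topology", "re-document wiring", 16.0, 0.0),
    RecoveryEntry("help desk", "extra calls, week 2", 12.5, 0.0),
]

for category, t in sorted(summarize(log).items()):
    print(f"{category:18} {t['hours']:6.1f} h  ${t['dollars']:10.2f}")
```

Logging hours separately from dollars keeps the indirect costs (help desk load, schedule disruption) visible instead of letting them vanish behind the hardware invoice, which is the part a business case usually misses.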
Other ideas off the top of my head: Get a fire suppression system. Split some of the equipment (e.g. labs) across multiple buildings so that if one burns down, there's some infrastructure that is still usable. You'll have a working system that you can refer to while rebuilding the destroyed system, too.
Disaster Recovery Resources - it contains a lot of useful articles about disaster recovery.
I wish you luck!
First off, servers belong in a nice server room, not in a closet near the lab. A closet may be OK for your home network, but for a network at a college or company, a proper server room is a must. Also, if you can, have the server room in one building and the labs in others. This way your lab may go up in smoke and your servers will be fine, or your servers may get damaged and your clients will be fine.
When building a server room, make sure it has raised floors (about 1 foot above the rest of the building's floor), conveyance trays, redundant air conditioning, FM200 fire suppression, TSM or some other backup solution, possibly offsite mirroring of servers, NO WINDOWS (the glass kind, not the OS kind), and UPSes. If possible, make it a hardened, one-floor building with the chillers located inside (storms can't rip a chiller off the ground if it is inside), generator backup, and some bathrooms, food storage, and maybe even a shower facility in case admins must pull an all-nighter. This may sound silly for a school, but that depends on how important your data is.
We used to have servers serving the labs all over campus, but now they are all centrally located in the data center. Management is easier, but then we have more to lose if our data center is hit. That's why we have halon fire suppression (until the new center is built, which will use FM200) and a disaster recovery plan including a hotsite. Having all of the servers centrally located also helps with running backups, either via a networked TSM-type solution (Tivoli software, IBM hardware) or individual tapes (not recommended, but better than nothing).
Gorkman
Now might be a good time to take a good look at what you wanted to get rid of in your old network.
Since everything is destroyed for the most part, use this as an opportunity to get rid of those pesky NT 3.51, Novell Netware, and VAX machines that have been cluttering up the computer room.
Ditch that legacy shit and start anew with the insurance check. (Presuming the machines were insured.)
Conformity is the jailer of freedom and enemy of growth. -JFK
Is this a student run network or something?
Looking through the schedule, I see you've got random students crimping cable: "Before I (you notice I have dropped the We...), will allow you to crimp a connector, I will expect you to have read the pages at the above web site that describe "How-to"."
Nothing wrong with doing your own connectors if you have the proper test equipment to check it with. But having students who have never crimped before, doing so? Seems like a good way to learn how to break Ethernet, especially if you don't test.
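To make the "test it" point concrete, here is a rough sketch of a wiremap check in Python. It assumes straight-through cables wired to T568B on both ends; the schedule doesn't say which standard the college uses, so that is my assumption, and `wiremap_faults` is just a name I made up. It only compares the conductor order recorded at each end, so it is no substitute for a real cable tester.

```python
# T568B pin order for an RJ-45 plug, pins 1 through 8.
T568B = [
    "white-orange", "orange", "white-green", "blue",
    "white-blue", "green", "white-brown", "brown",
]


def wiremap_faults(end_a, end_b):
    """Return (pin, colour_at_a, colour_at_b) for every mismatched pin.

    For a straight-through cable both ends should carry the same
    conductor on the same pin, so an empty list means the map is clean.
    """
    if len(end_a) != 8 or len(end_b) != 8:
        raise ValueError("each end must list exactly 8 conductors")
    return [
        (pin, a, b)
        for pin, (a, b) in enumerate(zip(end_a, end_b), start=1)
        if a != b
    ]


# Example miswire: blue and green swapped on pins 4 and 6 at one end.
bad_end = list(T568B)
bad_end[3], bad_end[5] = bad_end[5], bad_end[3]

print(wiremap_faults(T568B, T568B))    # clean cable
print(wiremap_faults(T568B, bad_end))  # reports pins 4 and 6
```

A cable like `bad_end` can still show link on some gear, which is exactly why an eyeball check alone isn't enough for first-time crimpers.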
As well, you have students installing the servers after taking a course: "Seniors, who have taken the Network System Management and Network Engineering courses should take advantage of this opportunity to build NetWare servers."
I assume you will be wiping the drives and installing them with your own secure setup after?
Your situation is the kind of thing I'd volunteer a weekend to help with, if you were local. I just have to wonder about what your network is going to look like after it is set up.
Contrary to much popular belief, a good data recovery contingency (off-site back-ups, etc...) is only half of a sound DRP. When it comes to recovering from a cataclysmic disaster of this nature - the second, and equally critical component of a well thought out DRP is an all-inclusive BCP (Business Continuation Plan)...
Without this vital aspect, companies such as Deutsche Bank (who were ravaged by the WTC disaster on 9/11), would have been down for days/weeks while attempting to relocate, rebuild and restore their data center operations...
I, for instance, work at a rather large, international Fortune 500 company and we have BCP strategies that include a complete off-site location. This facility houses fail-over systems for all business-critical processes, including a 1.2 terabyte, mirrored SAP database that can go online at a few minutes' notice, and a phone bank and workstations for our 50+ CSRs (customer service reps) and our global helpdesk. What's more, we perform non-production drills twice a year to validate the systems' health and improve upon our strategies...
This is obviously a bit late for you, but I would suggest reading up on the matter a bit more thoroughly prior to redesigning your future systems and developing your next DRP...
Beer is proof that God loves us and wants us to be happy. -- Benjamin Franklin
I agree. The IT department I work for has an excellent Disaster Recovery program. If the data center I am currently in were to burn to the ground, we could have everything operational as if nothing had happened within 36 hours.
:-)
As for recovering without a plan: if there was anything that you always wished you could go back to the beginning and change, now's the time.
Pseudocode is code to demonstrate a concept, not designed to be run. Like certain M$ software.
Most of the system administrators would love to be able to consolidate systems down to a few supported platforms, break services up so we're not supporting 'mainframe-esque' systems running 20+ applications. Unfortunately, the system admins are not what drives the university. Research dollars are. And to get the research dollars, you have to have faculty, and you have to keep them happy, which means installing undocumented software at some faculty member's whim, or keeping a Wang still running so that they can do their word processing. [It does, however, keep pizza warm, so it's not all bad].
Yes, some systems can be consolidated down, upgraded, or otherwise be made more space efficient, but you need to maintain the same OS and similar hardware, or you're looking at significantly increasing your workload due to instability, installation headaches, etc.
Now, there may be some systems that just can't be recovered, but it's not the system admin's job to decide that-- it's management's. The system admin can give advice, but they don't run the university, and if management decides that it's in the best interest to restore 17 year old mainframes that suck down $300k/yr in maintenance contracts and cooling costs, and occupy 1/2 of the space in the machine room, it's their decision. You can either do what they tell you to, or find a new job.
Now, if your management repeatedly doesn't listen to you, and continues to do what you warn them against, you'd probably be happier finding a new job. One of the nice benefits of university jobs is that you can carry over TIAA-CREF to most schools.
Personally, I would recommend first recovering every system possible, and once that's done, and you have everything back and operational again, work on migrating out machines that are harder to maintain and recover. Do not worry about getting rid of systems unless they just can't be recovered. Don't worry about anything else until after the systems are recovered.
PS. Don't put machine rooms in basements. Sewer pipes breaking above the machine room is bad.
Build it, and they will come^Hplain.