Multiple Sites Down In SF Power Outage

Great by bjorniac · 2007-07-24 10:13 · Score: 1

I tried this and got "nothing for you to see here"... guess it's affecting Slashdot too? ;-)

I work in the Financial District by slug_bait · 2007-07-24 10:18 · Score: 5, Interesting

I can verify that it affected much of the Financial District here in SF. We had the power go out 3 times. Seems to be back now. Haven't heard any explanation yet.

Re:I work in the Financial District by j14ast · 2007-07-24 10:22 · Score: 1

I work in SoMa (near the ballpark), and the office went down. Luckily the colo didn't; it's a bit farther south.

--
Damn the man!
Re:I work in the Financial District by weav · 2007-07-24 10:50 · Score: 1

I maintain a radio station with a studio in SoMa and transmitter on Russian Hill, and both went down simultaneously. UPSes and generator at the studio but not at the tx. Not a fun day...
Re:I work in the Financial District by halfloaded · 2007-07-24 11:15 · Score: 5, Funny

Phew... I was worried the internet got slashdotted.
Re:I work in the Financial District by Jeff+DeMaagd · 2007-07-24 14:27 · Score: 1

I had a weird coincidence. This afternoon, I was not able to print a shipping label through UPS or Stamps.com. I could not access Gmail or my site's regular email account. In fact, I could not get any HTTPS site to work. I don't understand why HTTP sites worked when their HTTPS counterparts did not. This is all weird. I was going to bug my ISP about it, but it all started working again at about 6pm EDT.

I wonder if this is related.
Re:I work in the Financial District by totally+bogus+dude · 2007-07-24 15:22 · Score: 1

Could've been packet loss from some short-lived connection fault, which may or may not be related. In my experience, HTTPS is often the first thing to break on a bad connection, presumably because it has to exchange more data to set up a connection than an unencrypted session does.
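A back-of-the-envelope way to see why a flaky link tends to hit HTTPS first: if each packet is lost independently with probability p, a setup exchange of n packets gets through without a single loss (and so without a retransmission stall) with probability (1 - p)^n. The packet counts below are made-up illustrative numbers, not measurements of any real TLS handshake.

```python
def clean_setup_probability(loss_rate: float, packets: int) -> float:
    """Chance that all `packets` arrive, assuming each is lost independently."""
    if not 0.0 <= loss_rate <= 1.0:
        raise ValueError("loss_rate must be between 0 and 1")
    if packets < 0:
        raise ValueError("packets must be non-negative")
    return (1.0 - loss_rate) ** packets


# Hypothetical packet counts for connection setup plus one small request.
PLAIN_HTTP_PACKETS = 5
HTTPS_PACKETS = 11

if __name__ == "__main__":
    for loss in (0.01, 0.05, 0.10):
        plain = clean_setup_probability(loss, PLAIN_HTTP_PACKETS)
        secure = clean_setup_probability(loss, HTTPS_PACKETS)
        print(f"loss {loss:.0%}: HTTP {plain:.1%} clean, HTTPS {secure:.1%} clean")
```

The more round trips a protocol needs before it can deliver anything, the more chances a lossy link has to stall it.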
Re:I work in the Financial District by Melkman · 2007-07-24 17:10 · Score: 1

It could be that your ISP uses an inline proxy. That would cache all the HTTP pages, but not HTTPS, since it can't decrypt those without you knowing. So the pages you see when you are browsing HTTP sites would just be the local copies. A quick way to check is to see whether dynamic pages still update while the HTTPS sites are down (find a site with the time on it or something like that).
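Besides the dynamic-page test, a caching proxy sometimes gives itself away in the response headers. This sketch assumes the proxy adds headers such as `Via`, `Age`, or `X-Cache`; many caches do, but it is not guaranteed, and a fully transparent proxy may add nothing, so an empty result proves nothing.

```python
from typing import Iterable, Tuple

# Headers that caching proxies often add. This is a heuristic list, not a spec:
# a transparent proxy is free to add none of them.
PROXY_HINT_HEADERS = ("via", "age", "x-cache", "x-cache-lookup")


def proxy_hints(headers: Iterable[Tuple[str, str]]) -> dict:
    """Return the subset of (name, value) header pairs that hint at a cache."""
    hints = {}
    for name, value in headers:
        key = name.lower()  # header names are case-insensitive
        if key in PROXY_HINT_HEADERS:
            hints[key] = value
    return hints


if __name__ == "__main__":
    sample = [("Content-Type", "text/html"), ("Via", "1.1 cache.example.net"), ("Age", "3600")]
    print(proxy_hints(sample))
```

With a live request, you could pass `response.headers.items()` from `urllib.request.urlopen` instead of the hard-coded sample.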
Re:I work in the Financial District by Venim · 2007-07-24 17:19 · Score: 1

I happened to be in the Museum of Modern Art at the time of the power outages. Talk about hard to view stuff when the power is going on and off :)
Re:I work in the Financial District by Jeff+DeMaagd · 2007-07-25 06:46 · Score: 1

That doesn't explain why email didn't get through, though, and I did try fetching it without encryption.

Oblig.... by Anonymous Coward · 2007-07-24 10:18 · Score: 5, Funny

im in ur datacentr
trashin ur racks

Re:Oblig.... by Tackhead · 2007-07-24 10:29 · Score: 5, Funny

> im in ur datacentr
>
> trashin ur racks
Lizzie Borden did teh h4x,
Got drunk and unplugged 40 racks.
When she saw what she had done,
She unplugged number 41.
(Lawn. Off. Git.)
Re:Oblig.... by MsGeek · 2007-07-24 11:17 · Score: 5, Funny

I felt a great disturbance in the Internet, as if millions of geeks suddenly cried out in terror and were suddenly silenced. I fear something terrible has happened.

--
Knowledge is power. Knowledge shared is power multiplied.
Re:Oblig.... by Carlinya · 2007-07-24 14:36 · Score: 1

Wish I could mod you up to +5 Funny.

--
1 + 1 = 3?
Re:Oblig.... by joeytmann · 2007-07-24 15:32 · Score: 1

HAHAHAHAHA! Sorry, but that is a great redo of the Lizzie Borden rhyme.

--
Insert funny smart-ass comment here.
Re:Oblig.... by everphilski · 2007-07-24 16:22 · Score: 1

that's no drunk ...
Re:Oblig.... by Architect_sasyr · 2007-07-24 20:10 · Score: 1

No, the drunk was definitely Yoda: "I'm not as think as you drunk I might be"

--
Me failed English...
FreeBSD over Linux. If my comments seem odd, this may explain...
Re:Oblig.... by Gilmoure · 2007-07-25 04:35 · Score: 1

I'm not as think as some drunkle peep I am.

--
I drank what? -- Socrates
Re:Oblig.... by AnalogDiehard · 2007-07-25 06:15 · Score: 1

> Lizzie Borden did teh h4x,
> Got drunk and unplugged 40 racks.
> When she saw what she had done,
> She unplugged number 41.

All your base are belong to us.

--
Eternity: will that be smoking, or non-smoking? I Corinthians 6:9-10

Redundant? by DogDude · 2007-07-24 10:19 · Score: 5, Insightful

Don't these large sites have failover-capable, redundant servers in multiple physical locations? Why should a failure in one rack, one room, or heck, even one state (for the giant sites) affect them?

--
I don't respond to ACs.

Re:Redundant? by MightyMartian · 2007-07-24 10:24 · Score: 1

Because those who bought colo services were in fact ripped off, and should now be proceeding to San Francisco to seek vengeance upon those who can do little more than process credit card payments.

--
The world's burning. Moped Jesus spotted on I50. Details at 11.
Re:Redundant? by RobertB-DC · 2007-07-24 10:31 · Score: 2, Funny

> Because those who bought colo services were in fact ripped off, and should now be proceeding to San Francisco to seek vengeance upon those who can do little more than process credit card payments.

Perhaps they could begin their vengeful wrath by hiring a few (more?) winos...

--
Stressed? Me? Of course not. Stress is what a rubber band feels before it breaks, silly.
Re:Redundant? by RomulusNR · 2007-07-24 11:03 · Score: 1

This is why stocks and banking will always rule the world. They have DRCs up the wazoo.

--
Terrorists can attack freedom, but only Congress can destroy it.
Re:Redundant? by Anonymous Coward · 2007-07-24 11:17 · Score: 5, Informative

They do, but one of the dirty little secrets of most data centers is that they don't have enough generator capacity for all the cooling. They'll woo you with the generator, the 2,000 gallons of diesel, and an N+1 array of UPSes, but when utility power dies, it gets hot very quickly, and some racks must go down.
Re:Redundant? by ryanisflyboy · 2007-07-24 11:32 · Score: 4, Interesting

For some of these sites, their infrastructure is a lot more centralized than you might realize. If they failed to build their systems with a secondary site in mind, it can be near impossible for the "CTO" types to pony up the dollars for it later. The biggest issue I have seen here is storage: either they aren't using suitable SAN technologies, or they didn't put enough money behind the storage initiative to set up secondary-site replication. I agree with you, though; this is a problem that has been solved. Perhaps Netflix thought, wth, if we go out for a few hours and people can't choose their movies, that's just tough luck.

Sun.com going down is a good example of someone totally screwing up. They have absolutely NO excuse. The others - maybe they can get away with it and we won't care. If Sun can't keep their own site up, how can I expect them to keep mine up?
Re:Redundant? by Anonymous Coward · 2007-07-24 11:32 · Score: 2, Insightful

This reminds me of other sites I have worked on. On more than one occasion, someone wanted to move the physical location from Texas to SF or some other silly place. For some reason it made perfect sense to them to move the computers from a stable location with inexpensive labor and cheap, reliable power (Texas is on its own grid, with a plethora of power plants, and energy executives always give themselves cheap power) to a location that was earthquake-ridden and unreasonably expensive in both living costs and power. Even when Enron was taking California for a ride, and in Texas we knew this, people still thought it made good business sense to have the servers in the Bay Area. All in one place. With no redundant location.
I am not surprised that one little power blip took everything out.
Re:Redundant? by tautog · 2007-07-24 11:43 · Score: 1

> Perhaps they could begin their vengeful wrath by hiring a few (more?) winos...

You mean like hiring back the people who used to maintain the genset and UPS?
Re:Redundant? by raehl · 2007-07-24 11:51 · Score: 3, Insightful

> I'm certainly forwarding this article to my boss, who abruptly decided to put an end to planning for a backup site on the basis of "aw, nothing is going to happen".

The thing is, letting something happen may be a better decision than trying to stop it.

If you're going to have a fully-redundant setup, it's going to cost you twice as much as having just one setup. And if you're not going to have a fully-redundant setup, your backup site is going to buckle under the full load of normal traffic anyway.

The correct business decision might just be "I just saved a bunch of money on my data center insurance," and if you lose a day's business, oh well, that was still cheaper than keeping a backup data center around.

--
paintball
Re:Redundant? by nbannerman · 2007-07-24 12:04 · Score: 1

Losing a day's business might not affect a big corp, but would you keep your services at 365 Main after this?

I know I wouldn't, and I wouldn't want to keep my business with companies using 365 Main for their hosting either.
Re:Redundant? by Joe+The+Dragon · 2007-07-24 12:04 · Score: 1

Google has servers in multiple physical sites, and you get data from the ones nearest to you.
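As a toy illustration of "serve from a nearby site and skip the broken one" (not a description of how Google actually routes users, which the comment doesn't cover), a selector might look like this. The site names, latencies, and health flags are hypothetical inputs.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Site:
    name: str
    latency_ms: float  # measured round-trip time from the user (assumed input)
    healthy: bool      # outcome of some health check (assumed input)


def pick_site(sites: List[Site]) -> Site:
    """Return the lowest-latency site that is currently passing its health check."""
    candidates = [site for site in sites if site.healthy]
    if not candidates:
        raise RuntimeError("no healthy site available")
    return min(candidates, key=lambda site: site.latency_ms)


if __name__ == "__main__":
    sites = [
        Site("san-francisco", 4.0, healthy=False),  # e.g. the colo just lost power
        Site("dallas", 45.0, healthy=True),
        Site("virginia", 75.0, healthy=True),
    ]
    print(pick_site(sites).name)
```

With only one site in the list, an outage there takes you down no matter how clever the selector is.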
Re:Redundant? by Jeffrey+Baker · 2007-07-24 12:12 · Score: 4, Informative

365 Main has a long and ignominious history of frequent and prolonged power outages, yet it remains fully booked. Some people just can't learn a lesson.

For what it's worth, the datacenter which is adjacent to 365 Main, called 360 Spear, did not suffer from this outage.
Re:Redundant? by nbannerman · 2007-07-24 12:18 · Score: 1

Well, in that case the businesses concerned deserve everything they get for making a poor choice in hosting. It'll be interesting to see what reason the status pages of the affected sites give when they come back up, and what steps those businesses will take to avoid such failures.

Mind you, as a poster has commented elsewhere, they could've (should've) set up some kind of redundancy themselves with split-site hosting.
Re:Redundant? by raehl · 2007-07-24 14:00 · Score: 1

> if your business can't manage to set up a cold site with failover for critical infrastructure without doubling your infrastructure overhead, someone needs to be fired.

If you don't understand that failover limited to CRITICAL infrastructure is not FULLY redundant, you should be fired.

I didn't say it was ALWAYS better to just let shit happen. I said it MAY be better. And I didn't say 'critical systems', I said 'fully redundant backup'. If you have X hardware and connectivity in your system, and you want full redundancy of that, it's going to cost you the same amount to establish that same hardware and redundancy elsewhere.

Now, certainly, you might make an educated decision that not everything your system does is critical, and that you only need to support a few functions on your backup system. That might mean, for example, that you decide your web presence is not critical and accept that if your main system goes down, you'll (in the example of Netflix) keep your ability to process DVD returns but lose your website. Or, in the case of CNet, you might keep your main page up but let your forums go down.

Regardless, my point to the GGP poster was that JUST because the websites of these companies went down does not mean those companies were negligent - it could just mean that they made an educated decision that it was cheaper to risk their website being down than to pay more to further reduce that risk. You have to draw the line between additional expense and additional reliability somewhere.
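The trade-off above reduces to expected-value arithmetic. This sketch compares the expected yearly cost of downtime with the yearly cost of a backup site. Every number in it is hypothetical, and it ignores harder-to-price effects like customers leaving after a visible outage.

```python
def expected_downtime_cost(outages_per_year: float, hours_per_outage: float,
                           cost_per_hour: float) -> float:
    """Expected yearly loss from outages, assuming cost scales linearly with time."""
    return outages_per_year * hours_per_outage * cost_per_hour


def backup_site_pays_off(outages_per_year: float, hours_per_outage: float,
                         cost_per_hour: float, backup_cost_per_year: float,
                         coverage: float = 1.0) -> bool:
    """True if the downtime a backup site would avoid is worth more than it costs.

    coverage: fraction of downtime cost the backup actually prevents
    (1.0 = full failover, lower values = only some functions survive).
    """
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must be between 0 and 1")
    avoided = expected_downtime_cost(outages_per_year, hours_per_outage, cost_per_hour) * coverage
    return avoided > backup_cost_per_year


if __name__ == "__main__":
    # Hypothetical: one 6-hour outage every two years, backup site costs $200k/year.
    for cost_per_hour in (10_000, 200_000):
        print(cost_per_hour, backup_site_pays_off(0.5, 6, cost_per_hour, 200_000))
```

The `coverage` parameter is one way to model the "critical functions only" option: a partial backup is cheaper, but it also avoids less of the loss.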

--
paintball
Re:Redundant? by quanticle · 2007-07-24 14:29 · Score: 1

> Losing a day's business might not affect a big corp

Eh? How do you figure that? Generally, the larger the corporation, the more each hour of IT downtime costs.

--
We all know what to do, but we don't know how to get re-elected once we have done it
Re:Redundant? by vuffi_raa · 2007-07-24 14:45 · Score: 1

> For some reason it made perfect sense to them to move the computers from a stable location with inexpensive labor and cheap reliable power (Texas is on its own grid, with a plethora of power plants, and energy executives always give themselves cheap power) to a location that was earthquake ridden

We've had two major earthquakes here in 100 years, and no other major disasters. How many tornadoes, thunderstorms, floods, etc. has Texas had?
Re:Redundant? by StikyPad · 2007-07-24 14:46 · Score: 1

They shouldn't have even needed site replication for this particular problem. Earthquakes, fire, or flood, sure. But power outages are a relatively common occurrence, depending on where exactly you live, especially when demand keeps increasing while production remains constant. There's no excuse for not having on-site power generation. None. A 250KW diesel generator can be had for around $50k (less if you buy used), plus another $5-10k or so for a battery bank to keep the power flowing during generator startup. And it's worth many times more than that, particularly when it prevents an exodus of your customer base. That's less than the cost of an average full-time employee, especially in a high-COL area such as San Fran, and the cost can be amortized. Additionally, it protects hardware assets and eliminates recovery time. If I were a customer, they'd have already lost me.

--
https://www.eff.org/https-everywhere
Re:Redundant? by vuffi_raa · 2007-07-24 14:50 · Score: 1

We have the majority of our servers hosted there. The fact of the matter is that to run something here in SF, you pretty much have to host your data somewhere like this: fire codes and power constraints keep you from putting that many servers in a commercial or residential building.
Most of them were moved there (some were already hosted there) after we blew out 3 circuits in our on-site server room and had to pay the building something like $50k to get it fixed.
Re:Redundant? by Ohreally_factor · 2007-07-24 16:06 · Score: 1

Just make sure you build your data center out of bricks and not straw or wood.

--
It's not offtopic, dumbass. It's orthogonal.
Re:Redundant? by afidel · 2007-07-24 16:10 · Score: 1

These guys had 21MW of generators! The problem was that 3 of the 10 failed to start up in time. (They use a combined flywheel/generator instead of the more traditional UPS/generator, which is a good idea if it works, because you don't have the heat/power loss of all those batteries.) The remaining units couldn't keep up with the load, so they probably shut down defensively. If they've been keeping up on maintenance, then the manufacturer has some explaining to do, because a 30% failure rate isn't at all acceptable for machinery that cost that much (I've heard about $140M for the 10 units) and is supposed to be mission-critical.
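The arithmetic behind "the remaining units couldn't keep up" is surviving capacity versus load. This sketch uses ten 2.1 MW units (the figure in 365 Main's own press release). The 16 MW load is a hypothetical value chosen to fit an N+2 design (8 × 2.1 = 16.8 MW), since the actual load that day isn't known here.

```python
def surviving_capacity_mw(units: int, unit_mw: float, failed: int) -> float:
    """Generator capacity left after `failed` of `units` identical sets don't start."""
    if not 0 <= failed <= units:
        raise ValueError("failed must be between 0 and units")
    return (units - failed) * unit_mw


def can_carry(load_mw: float, units: int, unit_mw: float, failed: int) -> bool:
    """True if the sets that did start can carry the whole load."""
    return surviving_capacity_mw(units, unit_mw, failed) >= load_mw


if __name__ == "__main__":
    LOAD_MW = 16.0  # hypothetical facility load
    for failed in range(4):
        cap = surviving_capacity_mw(10, 2.1, failed)
        print(f"{failed} failed: {cap:.1f} MW available, carries load: {can_carry(LOAD_MW, 10, 2.1, failed)}")
```

N+2 means the design tolerates two sets failing to start; a third failure drops capacity to 14.7 MW, below this assumed load.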

--
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
Re:Redundant? by ghstridr · 2007-07-24 16:40 · Score: 1

I work for a major site in that colo, and it was only a partial outage. Our site was unaffected, thank (insert deity here) for small favors. Can't say the same for the office downtown on Market Street, though. The power cycled six times on us. By the time the second outage hit, I was telling managers to shut down and send people home. When asked why, I explained that the machines not on UPSes risked corrupting the data on their local disks, causing greater downtime later. They agreed and sent everyone home. After the power stabilized, I found that the fire damper in the cold-air feed to our local server room was stuck shut, so I had to call maintenance to release it from its automated motor (which blew out because of the power cycling). I also found that the UPS protecting the card-key access system was dead, so I had to replace that. Lessons learned:
1.) Get minimal UPS protection for all users' machines. It's worth it in the long run: fewer corrupted disks and fewer support calls.
2.) Get regular, extensive checkups on your critical AC systems, and check all emergency dampers at least once a year.
3.) Put standard operating procedures in place for these types of events, and practice them.
My fallout was minimal, but it could have been much worse.
Re:Redundant? by ryanisflyboy · 2007-07-24 18:03 · Score: 1

Well, I'm not sure a 250KW generator is going to get a datacenter very far; most require several MW. Someone posted that this facility had ten 2MW generators when he took a tour, and it may be more now. There have been many stories on /. about this in the last year: datacenters are being delayed coming online because they can't get enough of the big generators they need. In the datacenters I've been in, the generators are big (10 feet tall, etc.), and they require specialized maintenance contracts to keep the 5+ generators up to spec. One place I was in even arranged to have fuel helicoptered or 4WD'd in should a natural disaster wipe out roads, and paid extra to ensure it had 2 months' worth of fuel available via this method.

The reason you have off-site replication is for when the redundant systems fail. And they DO fail. Water lines break. Things blow up. Literally. I know of a datacenter that BLEW UP (well, a wall did, anyway). It's a long story, that one. If your business can live with the very rare 2-3 hour outage (Netflix), then you wouldn't pay extra. If your business would lose A LOT of money/clients if this happened, then you need off-site recovery capabilities regardless of the redundancies in place at site "A." Every time this happens, it seems to shake up the money tree for the IT department.

If you run a large data center, you might buy power systems from these guys:
http://www.cat.com/cda/layout?m=37508&x=7
Re:Redundant? by walt-sjc · 2007-07-24 21:41 · Score: 1

A tornado hit the AT&T datacenter in Virginia a few years back... Didn't take out the whole building, but did take a big chunk out of unoccupied space, and did have water pouring in on some of the servers (not ours.) Utility power was out for almost 2 days but our servers never lost connectivity.
Re:Redundant? by walt-sjc · 2007-07-24 21:44 · Score: 1

Yeah yeah, joke... But bricks suck for earthquakes, which is why all the brick buildings in SF got major steel retrofits.
Re:Redundant? by tv_dinners · 2007-07-24 22:10 · Score: 2, Interesting

true 'dat. Makes one wonder why not just relocate everything to Alaska or somewhere else that's cold as hell.

Speaking of energy costs associated with heat dissipation, I've always been curious about a method that could produce energy from waste heat, as a solar panel does from the sun.

Wrap that supercharged V8 in some energy producing heatwrap, instant hybrid and more horsepower. Run your processor's cooling fan off the energy produced from the excessive heat.

Someone please tell me I'm talking out of my ass, or worse, just gave away the next big idea that could have made me billions.
Re:Redundant? by Dogtanian · 2007-07-24 22:25 · Score: 1

> Just make sure you build your data center out of bricks and not straw or wood.

Big Bad Wolves aren't that common these days, and the risk of one actually having enough huff and puff to blow your datacenter down isn't that great anyway.

And even Third Little Pig hosting has sucked since it was taken over by one of its big name rivals.

--
"Slashdot - News and Chat Sites Deviant". (Click "homepage" link above for details).
Re:Redundant? by BokLM · 2007-07-25 00:09 · Score: 1

Maybe because this would cost a lot of money, and they think being down one or two days every two years (if they're unlucky) is cheaper.

--
wtf.n0x.org
Re:Redundant? by LMacG · 2007-07-25 00:51 · Score: 4, Funny

> cold as hell

I think I see the flaw in your plan.

--
Slightly disreputable, albeit gregarious
Re:Redundant? by poot_rootbeer · 2007-07-25 01:23 · Score: 1

> If they failed to build their systems with a secondary site in mind, it can be near impossible for the "CTO" types to pony up the dollars for it later.

A sustained service outage can often be a powerful tool for changing their minds, though.

It's cheaper to spend a million dollars building redundant infrastructure than it is to lose a million dollars in lost business.
Re:Redundant? by myth_of_sisyphus · 2007-07-25 07:05 · Score: 2, Informative

Actually, according to Dante the very depths of hell are reserved for traitors, who are encased in a lake of ice. The Divine Heat Sink.

This lowest circle, the ninth, consists of people who have betrayed someone close to them: Brutus and Cassius, Cain, and the worst of them all, Judas, is being chewed on by Satan himself.

Just fyi.
Re:Redundant? by araemo · 2007-07-25 10:15 · Score: 1

BMW has already started experimenting with a way to use waste heat from internal combustion engines, and it's very low-tech but promising. Essentially, between the normal exhaust stroke and the intake stroke at the beginning of the next cycle, they add a pure compression stroke, compressing whatever air is left in the cylinder, and then direct-inject water at or near TDC. The water flash-boils from the heat in the cylinder and expands, pushing the piston down and harnessing the waste heat of the previous combustion.

Then it's exhausted like normal, and more fuel/air is injected.

As for waste heat from electronics, there are a few possibilities I know of. None has reached the 'marketable product' stage as far as I'm aware, but some might be getting close.

More generally, many kinds of heat engines could be used to convert the heat to mechanical energy, which can then be converted to electrical energy. The big question is how to do so efficiently without making the heat harder to manage in the original equipment.

Look up Stirling engines and possibly piezoelectrics.
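On the efficiency question, thermodynamics sets a hard ceiling: no heat engine running between a hot source at T_h and a cold sink at T_c (in kelvin) can beat the Carnot efficiency 1 - T_c/T_h. The temperatures below are hypothetical, but they show why electronics waste heat is hard to harvest: the source is only slightly warmer than the room.

```python
def carnot_efficiency(hot_c: float, cold_c: float) -> float:
    """Upper bound on heat-engine efficiency between two temperatures given in Celsius."""
    hot_k = hot_c + 273.15
    cold_k = cold_c + 273.15
    if cold_k <= 0 or hot_k <= cold_k:
        raise ValueError("need hot > cold > absolute zero")
    return 1.0 - cold_k / hot_k


if __name__ == "__main__":
    # Hypothetical: 60 C air off a rack vs. a 25 C room; 500 C engine exhaust for contrast.
    for hot in (60.0, 500.0):
        print(f"{hot:.0f} C source: at most {carnot_efficiency(hot, 25.0):.1%}")
```

Even a perfect engine recovers only about a tenth of the heat in 60 C rack exhaust, and real devices achieve only a fraction of that bound, while hot engine exhaust offers a much larger ceiling.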
Re:Redundant? by sjames · 2007-07-25 10:53 · Score: 1

That is quite true, but doesn't have anything to do with multi-site failover.

A surprising number of large sites do NOT have backups in geographically separate locations.
Re:Redundant? by jesboat · 2007-07-25 14:43 · Score: 1

> Someone please tell me I'm talking out of my ass

Oh, I'd say you're full of hot air.

Other sites.. by king-manic · 2007-07-24 10:19 · Score: 2, Informative

Gamefaqs/Gamespot is also down. I wonder if it's related.

--
"There are more things in heaven and earth, Horatio, than are dreamt of in your philosophy."

Re:Other sites.. by nuzak · 2007-07-24 10:29 · Score: 2, Informative

Gamefaqs/Gamespot is C|Net, located on Rincon Hill in downtown SF, and their servers are probably in 365 Main. So yeah.

Anyway, PG&E says it's over now, but they still don't have an explanation as to why. Shyeah (rolls eyes)

--
Done with slashdot, done with nerds, getting a life.
Re:Other sites.. by Virak · 2007-07-24 10:31 · Score: 1

I don't think it is. While the main site has been having some problems for a little while now, I haven't had any trouble reaching db.gamefaqs.com (the domain for the actual FAQs and such), which seems to be on the same server.
Re:Other sites.. by karlsp · 2007-07-24 11:38 · Score: 1

Second Life has been snuffed out of existence. They were not accepting log-ins throughout the late afternoon and currently are only allowing employees to log-in.
Anybody want to estimate how many Linden$ are being lost because of this? ;^)
http://secondlife.com/status/
Re:Other sites.. by Andrew+Kismet · 2007-07-24 11:58 · Score: 1

Ah, good to see someone brought it up. THE GRID IS DOWN, and thus I'm so bored I'm browsing Slashdot instead...
Re:Other sites.. by king-manic · 2007-07-24 12:05 · Score: 1

> Second Life has been snuffed out of existence. They were not accepting log-ins throughout the late afternoon and currently are only allowing employees to log-in.
> Anybody want to estimate how many Linden$ are being lost because of this? ;^)
> http://secondlife.com/status/

1/30 of a monthly subscription? Since people pay monthly, a one-day outage will cost either nothing or whatever Linden wishes to refund. Wouldn't a one-day credit be reasonable?
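The 1/30 figure is a simple proration of the monthly fee. A minimal sketch, assuming a 30-day month and rounding to the cent; the $9.95 fee is a placeholder, not Linden Lab's actual pricing.

```python
def prorated_credit(monthly_fee: float, outage_days: float, days_in_month: int = 30) -> float:
    """Credit for an outage, prorated by the share of the billing month lost."""
    if days_in_month <= 0:
        raise ValueError("days_in_month must be positive")
    # Clamp so a negative or longer-than-a-month outage stays within 0..100% of the fee.
    share = min(max(outage_days, 0.0), days_in_month) / days_in_month
    return round(monthly_fee * share, 2)


if __name__ == "__main__":
    print(prorated_credit(9.95, 1))       # one full day down
    print(prorated_credit(9.95, 6 / 24))  # a six-hour outage
```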

--
"There are more things in heaven and earth, Horatio, than are dreamt of in your philosophy."
Re:Other sites.. by stefanlasiewski · 2007-07-24 12:28 · Score: 1

Can't Linden Labs just print more Linden$, and offer Linden$ as a condolence?

How does inflation work in Second Life anyways?

--
"Can of worms? The can is open... the worms are everywhere."
Re:Other sites.. by king-manic · 2007-07-24 12:48 · Score: 1

> Can't Linden Labs just print more Linden$, and offer Linden$ as a condolence?
>
> How does inflation work in Second Life anyways?

Same way it works everywhere else: the more there is, the less it's worth, all else being equal.

--
"There are more things in heaven and earth, Horatio, than are dreamt of in your philosophy."
Re:Other sites.. by TheLink · 2007-07-24 14:47 · Score: 1

Inflation is a way for those allowed to print money (or IOUs) to "tax" those who aren't allowed (or don't).

Re:No Generators? by Anonymous Coward · 2007-07-24 10:19 · Score: 2, Interesting

They probably just didn't kick in. Had the same problem at Internap in Seattle a few years ago. Power was cut to the building and the UPSs failed to switch over.

Redundent power supply? by msimm · 2007-07-24 10:19 · Score: 2, Interesting

Does this mean backup generators have failed or is the fault somewhere outside the datacenter? Time to start shopping.

--
Quack, quack.

Re:Redundent power supply? by Anonymous Coward · 2007-07-24 10:27 · Score: 1, Insightful

I've been told there was no fuel left at the time.

Now, the only remaining question is: How did the drunk guy get in there?
Re:Redundent power supply? by Gazzonyx · 2007-07-24 10:28 · Score: 1

It means they (the sites) didn't bother to set up failover to another, geographically separate site. Now we know who keeps all of their eggs in one basket. :)

--
If I mod you up, it doesn't necessarily mean I agree with what you've said, sorry.
Re:Redundent power supply? by dextromulous · 2007-07-24 10:33 · Score: 3, Informative

You mean that all 3 x 20,000 gallon tanks were empty? I find that hard to believe.

--
There are two types of people in the world: those who divide people into two types and those who don't.
Re:Redundent power supply? by Duhavid · 2007-07-24 10:37 · Score: 1

He was hiding in the tanks?

--
emt 377 emt 4
Re:Redundent power supply? by grumling · 2007-07-24 10:45 · Score: 2, Informative

I really doubt they were ever full. Diesel fuel goes bad after a few months. Unless SF has really, really crappy power*, the generators don't do much more than idle once a week for 20 minutes or so. The giant tanks are only there for the marketing department. And maybe for the employees to top off their tanks.

*I live out in the middle of nowhere and I get a power failure exceeding 5 minutes about once per year. The longest I've had at my current location was just over 2 hours.

--
"Well, good luck finding a judge that doesn't run a bestiality site."
Re:Redundent power supply? by aaarrrgggh · 2007-07-24 11:00 · Score: 5, Interesting

It takes diesel a few years to go bad, and that site has fuel-polishing systems to prevent that. Because of earthquake risk, they are contractually obliged to have 24-48 hours of backup fuel with many of their clients.

They have the HiTec rotary UPSs in all their facilities, which link a generator to a flywheel UPS. It's stupid to not have backup fuel for that type of system; you can only run for 13 seconds before the load crashes.

It is possible that they got a number of small hits and the generators failed to re-start after a few. Good procedures are to stay on generator until utility stabilizes if you have more than one "hit."

It'll be interesting to find out what happened.
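The "stay on generator after more than one hit" rule can be written as a small decision function. This is a hypothetical policy with made-up time constants, not 365 Main's actual controls; all times are in seconds.

```python
from typing import Sequence


def retransfer_time(outage_starts: Sequence[float], restored_at: float,
                    hit_window_s: float = 1800.0,
                    stable_required_s: float = 900.0,
                    single_hit_delay_s: float = 60.0) -> float:
    """Earliest time to move the load from generator back to utility power.

    outage_starts: times at which utility power dropped.
    restored_at: time utility power came back after the most recent drop.
    """
    # Count the utility failures ("hits") within the look-back window.
    recent_hits = [t for t in outage_starts if 0.0 <= restored_at - t <= hit_window_s]
    if len(recent_hits) > 1:
        # Repeated hits: the grid is unstable, so ride the generators until
        # utility power has stayed up for a sustained period.
        return restored_at + stable_required_s
    # A single hit: return to utility after a short confirmation delay.
    return restored_at + single_hit_delay_s


if __name__ == "__main__":
    print(retransfer_time([0.0], 120.0))                 # one isolated blip
    print(retransfer_time([0.0, 300.0, 900.0], 1000.0))  # several hits in a row
```

A real controller would rerun this every time utility power returns, which restarts the stability clock after each new hit. That avoids repeatedly cycling the load between utility and generator while the grid is flapping.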
Re:Redundent power supply? by dextromulous · 2007-07-24 11:05 · Score: 1

5 minutes a year is nearing "five nines" for reliability (and you don't want to rely on the power supply being your only source of downtime in that situation.) I'm not sure if their customers have "99.999% uptime guaranteed" in their contract, but if so, I'm sure they did have their tanks in working order. Some old press releases of theirs are touting 100% uptime.
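For reference, the downtime budget for a given availability is (1 - availability) times the length of a year. A minimal sketch using a 365.25-day year:

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # 525,960 minutes


def downtime_budget_minutes(availability: float) -> float:
    """Minutes of downtime per year allowed by an availability such as 0.99999."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * MINUTES_PER_YEAR


def availability_from_downtime(minutes_down: float) -> float:
    """Availability achieved with the given minutes of downtime per year."""
    return 1.0 - minutes_down / MINUTES_PER_YEAR


if __name__ == "__main__":
    for target in (0.999, 0.9999, 0.99999):
        print(f"{target}: {downtime_budget_minutes(target):.2f} minutes/year")
    print(f"5 minutes/year -> {availability_from_downtime(5):.6%}")
```

Five nines allows about 5.26 minutes per year, so 5 minutes a year is just inside it, and a single multi-hour outage uses up decades of that budget.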

I realize that a press release from 2004 is hardly relevant, but this is slashdot... so here is a choice paragraph:
By surpassing the five-nines milestone, 365 Main further establishes itself as the go-to facility with the necessary investments to ride through any worst-case scenario. Equipped with ten 2.1 Megawatt continuous power systems, an N+2 or greater facility-wide redundancy, an award-winning base isolation system, and 60,000 gallons of fuel on-site, 365 Main's structural resiliency is unmatched by any other data facility in the region. 365 Main's customers continued to perform at 100% during one of the Bay Area's largest power outages of recent history. 365 Main's customer data remained online, accessible, and secure.

--
There are two types of people in the world: those who divide people into two types and those who don't.
Re:Redundent power supply? by Lumpy · 2007-07-24 11:07 · Score: 1

You don't need them full. You have a contract with a local fuel company to get a truckload of fuel to your location by the time you are close to running empty. This is standard operating procedure for anything that needs emergency power. Critical places like water plants and hospitals get the deliveries first; datacenters are way down the list. So you store enough fuel for 4 hours, add stabilizers to make the fuel last much longer, and get regular top-offs every quarter.
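Sizing the on-site tank against the delivery contract is a runtime calculation: fuel on hand divided by burn rate. This sketch assumes a burn rate of about 0.07 gallons of diesel per kWh, a rough ballpark for large diesel sets near full load; check the actual generator's fuel curve before relying on it. The 60,000 gallons comes from 365 Main's press release; the load figures are hypothetical.

```python
ASSUMED_GALLONS_PER_KWH = 0.07  # rough ballpark for large diesel sets; not a spec value


def runtime_hours(fuel_gallons: float, load_kw: float,
                  gallons_per_kwh: float = ASSUMED_GALLONS_PER_KWH) -> float:
    """Hours the on-site fuel lasts at a constant electrical load."""
    if load_kw <= 0 or gallons_per_kwh <= 0:
        raise ValueError("load and burn rate must be positive")
    return fuel_gallons / (load_kw * gallons_per_kwh)


def fuel_needed_gallons(hours: float, load_kw: float,
                        gallons_per_kwh: float = ASSUMED_GALLONS_PER_KWH) -> float:
    """Gallons needed to bridge `hours` until the fuel truck arrives."""
    return hours * load_kw * gallons_per_kwh


if __name__ == "__main__":
    print(f"{runtime_hours(60_000, 16_000):.1f} hours on 60,000 gal at 16 MW")
    print(f"{fuel_needed_gallons(4, 1_000):.0f} gal for 4 hours at 1 MW")
```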

Also, most have double backup: a system that uses natural gas as fuel and a system that uses delivered fuel. But again, a datacenter is a non-critical facility. It's not important to life and safety, just to some stockholders' portfolios.

--
Do not look at laser with remaining good eye.
Re:Redundent power supply? by antdude · 2007-07-24 11:44 · Score: 1

Lucky you. I live in a city (not downtown), and the longest I've had was 12 hours without power overnight, probably because it was a Sunday. And I am in an area serviced by Southern California Edison.

--
Ant(Dude) @ Quality Foraged Links (AQFL.net) & The Ant Farm (antfarm.ma.cx / antfarm.home.dhs.org).
Re:Redundent power supply? by grumling · 2007-07-24 11:45 · Score: 1

Yeah, someone who works nearby posted that the power was on and off all day. I hope they find and release the cause of the failure. Might be an interesting read.

--
"Well, good luck finding a judge that doesn't run a bestiality site."
Re:Redundent power supply? by stefanlasiewski · 2007-07-24 12:01 · Score: 1

> Does this mean backup generators have failed or is the fault somewhere outside the datacenter?

Apparently not all floors were affected, which probably means that the generators kicked in, but the power distribution failed down the line.

> Time to start shopping.

The irony is: where are you going to go? How many N+1 datacenters are within driving distance of San Francisco? 365Main was supposed to be one of the best datacenters in terms of power.

After this embarrassment and the wrath of large customers, 365Main might clean up their systems, fix the problems and offer true power-failover.

After all, how many Datacenters test 100% of their power-failover systems?

--
"Can of worms? The can is open... the worms are everywhere."
Re:Redundent power supply? by wolf31o2 · 2007-07-24 12:21 · Score: 1

http://sfgate.com/cgi-bin/article.cgi?f=/c/a/2007/07/24/BAG9NR67253.DTL/
Re:Redundent power supply? by fm6 · 2007-07-24 12:31 · Score: 1

> The irony is: where are you going to go? How many N+1 datacenters are within driving distance of San Francisco? 365Main was supposed to be one of the best datacenters in terms of power.
Time for a commercial:

Are you paying too much for rack space because you need to be near your servers? Try the Sun Fire X4100, X4200, and X4600 servers. They come with Integrated Lights Out Management. This allows a remote admin to do everything. Need to power the system up or down? You can do it remotely. Need to wipe the disk and install a new OS? Put your install CD into a local image file so your ILOM Service Processor can access it over IP. Service? Use remote diagnostics to determine what's wrong and have the hosting provider swap in a replacement from your on-site spares.

OK, personal interest here: I work for the x64 part of Sun, and feel a certain personal loyalty to these boxes. But to be perfectly honest, Lights Out Management is a pretty common feature these days.
Re:Redundent power supply? by Eric+in+SF · 2007-07-24 12:43 · Score: 3, Informative

I work 3 blocks from 365 Main.

There were 5 individual power failures, each no longer than 5 minutes, over a roughly 30 minute period. A couple of them were in quick succession.
Re:Redundent power supply? by strabo · 2007-07-24 12:55 · Score: 2, Funny

I realize that a press release from 2004 is hardly relevant

Well, how about one from today?

SAN FRANCISCO, Calif., July 24, 2007 -- 365 Main Inc., developer and operator of the world's finest data centers, has provided online retailer RedEnvelope with two years of 100-percent uptime at 365 Main's San Francisco facility.

To ensure uptime for key tenants such as RedEnvelope, 365 Main provides modern power and cooling infrastructure. The company's San Francisco facility includes two complete back-up systems for electrical power to protect against a power loss. In the unlikely event of a cut to a primary power feed, the state-of-the-art electrical system instantly switches to live back-up generators, avoiding costly downtime for tenants and keeping the data center continuously running.

Timing is everything, eh? LOL.
Re:Redundent power supply? by DECS · 2007-07-24 13:33 · Score: 1

I can assure you that SF does indeed have crappy power. Despite being a city, and having been granted essentially free power from Hetch Hetchy Dam in Yosemite by an act of Congress, SF for some reason set up PG&E to deliver power and, until recently, to maintain creaky old polluting power plants within the city limits.

PG&E also regularly has a fire in one of its ancient substations. A couple of years ago, the power went off for more than 24 hours (!) over a large section of the City, including where I lived.

The power is not reliable. However, a data center should be aware of the problem and deal with it, since that's their only job. It's like finding out that the bank lost your money or that your doctor forgot to inform you that the test results were positive.

-

Ten Fake Apple Scandals: 2 - The iPhone's Anti-Competitive AT&T Contract

Ten Fake Apple Scandals: 1 - Phony Rage About iPhone Price and Profits
Re:Redundent power supply? by afidel · 2007-07-24 14:22 · Score: 1

Hell, diesel doesn't have to ever go bad. AT&T has some damn remote facilities that aren't visited for years at a time that have diesel backup generators.

--
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
Re:Redundent power supply? by HockeyPuck · 2007-07-24 14:25 · Score: 3, Funny

You forgot one very important component, the car battery used to START THE GENERATOR. I've been to many sites whereby the battery that would start the generator is a) dead or b) missing.
Re:Redundent power supply? by Technician · 2007-07-24 15:13 · Score: 1

You mean that all 3 x 20,000 gallon tanks were empty? I find that hard to believe.

Look at the sheet again. Thanks for the nice PDF, by the way.

They have 10 generators, which are designed to carry the load at N+1. They have only one rotary no-break Critical Power System with one generator. Just how much of the datacenter is not on critical power? Do the math. A CPS set is generally sized smaller than any one of the generators. It carries essential emergency lighting, telecom, some "critical servers" and little else. The rest of the site dropped with the outage after the little 10-20 minute UPSes died. They waited 45 minutes before starting generators.

Speaking of generators, if they upgraded servers (rack out, blade server in...) they could easily exceed the original N+1 design of the generators.
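To make the N+1 point concrete, here is a minimal Python sketch. It assumes "N+1" means the load must still fit with the single largest generator out of service, and it works in integer kW to avoid float rounding. The ten 2,100 kW units mirror the 2.1 MW figure on 365 Main's data sheet; the load figures are made up for illustration.

```python
"""Rough N+1 generator capacity check (illustrative sketch, not a sizing tool).

Assumption: "N+1" means the load must still be carried with the single
largest generator out of service. Ratings and loads are integers in kW.
"""


def n_plus_one_capacity_kw(unit_ratings_kw: list[int]) -> int:
    """Capacity left when the largest unit is down."""
    if not unit_ratings_kw:
        return 0
    return sum(unit_ratings_kw) - max(unit_ratings_kw)


def n_plus_one_ok(unit_ratings_kw: list[int], load_kw: int) -> bool:
    """True if load_kw still fits with one (the biggest) generator lost."""
    return load_kw <= n_plus_one_capacity_kw(unit_ratings_kw)


if __name__ == "__main__":
    fleet = [2100] * 10  # ten 2.1 MW units
    for load in (15000, 18900, 19500):  # hypothetical site loads in kW
        print(f"{load} kW -> N+1 ok: {n_plus_one_ok(fleet, load)}")
```

The 19,500 kW case is the "rack out, blade server in" scenario: the site still runs fine on all ten units, so nothing looks wrong until one of them fails to start.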

When shopping for a hosted datacenter, find out how much of the load is on "Critical Power". You may be dependent on battery UPS while waiting for the generators.

--
The truth shall set you free!
Re:Redundent power supply? by TheLink · 2007-07-24 15:13 · Score: 1

"this is a non critical item like a datacenter. it's not important to life and safety just some stockholders portfolios."

I kind of agree. BUT we should keep in mind the large number of interdependencies in modern societies.

While there should be "reserves" (food, savings, cash, credit, body fat) to survive a day or two of outage, a lot of things might start unravelling given a sustained outage (e.g. weeks). If companies go bust they can't/won't pay their employees or other companies, who then may do the same and so on, and then critical stuff like "basic necessities" start getting affected.

That's why stuff like "deadly flu" is scary. Lots of specialists nowadays. If the specialists get taken out (hey, they may hang about with each other regularly), then there may be nobody left who knows how to fix/run the stuff.
Re:Redundent power supply? by TheLink · 2007-07-24 15:16 · Score: 1

Wasn't sun.com down? Hope Sun wasn't paying too much for rack space ;).

Or Sun's version of Lights Out Management involves even the power LEDs being out...
Re:Redundent power supply? by Technician · 2007-07-24 15:18 · Score: 1

You forgot one very important component, the car battery used to START THE GENERATOR. I've been to many sites whereby the battery that would start the generator is a) dead or b) missing.

In many places the battery is not trusted so they use compressed air. It's easier to notice an air supply problem than a failing battery problem.

--
The truth shall set you free!
Re:Redundent power supply? by afidel · 2007-07-24 16:17 · Score: 1

It's easier to notice an air supply problem than a failing battery problem.

How so? I can easily design a circuit to monitor charge voltage of a battery and report it remotely/log it and I've never had a formal EE class. Besides your weekly/monthly spin up tests of your generators should tell you if you have a flat battery.
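The kind of monitor described here can be sketched in a few lines of Python. This is only an illustration: it assumes readings arrive from some external sensor as timestamp/voltage pairs, and the 12.4 V alarm threshold is a hypothetical placeholder for a nominal 12 V lead-acid starter battery, not a manufacturer figure.

```python
"""Starter-battery voltage alarm (sketch; the threshold is hypothetical)."""
from dataclasses import dataclass

LOW_VOLTAGE_ALARM_V = 12.4  # hypothetical alarm level for a nominal 12 V battery


@dataclass
class Reading:
    timestamp: str  # whatever the data logger records, e.g. "2007-07-24 02:00"
    volts: float


def low_voltage_alarms(readings: list[Reading],
                       threshold_v: float = LOW_VOLTAGE_ALARM_V) -> list[str]:
    """Return one alarm message for each reading below threshold_v."""
    return [
        f"{r.timestamp}: battery at {r.volts:.2f} V (below {threshold_v:.2f} V)"
        for r in readings
        if r.volts < threshold_v
    ]


if __name__ == "__main__":
    log = [Reading("02:00", 13.5), Reading("03:00", 12.1)]
    for message in low_voltage_alarms(log):
        print(message)  # in a real setup: report remotely and/or append to a log
```

A resting or float voltage that looks fine says little about how much current the battery can actually deliver under cranking load, so this only catches the obviously flat battery.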

--
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
Re:Redundent power supply? by Anonymous Coward · 2007-07-24 17:11 · Score: 1, Interesting

In the case of common batteries, it is not the voltage that you have to monitor (it always looks fine); it is the deliverable current over a period of time. And that is something that can be accurately tested only by loading. That is why a battery discharge/recharge cycle is common in advanced UPS solutions.

But you are correct about the monthly/weekly tests, which should tell you something.
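The load-test idea can be sketched as integrating the measured current over a controlled discharge and comparing the result with the battery's rated amp-hours. This is a minimal Python illustration under stated assumptions: samples are (seconds, amps) pairs from a hypothetical test rig, and the 80%-of-rating pass mark is an example choice.

```python
"""Estimate delivered capacity from a battery discharge test (sketch)."""


def delivered_amp_hours(times_s: list[float], amps: list[float]) -> float:
    """Trapezoidal integral of current over time, returned in amp-hours."""
    if len(times_s) != len(amps) or len(times_s) < 2:
        raise ValueError("need at least two matching time/current samples")
    amp_seconds = 0.0
    for k in range(1, len(times_s)):
        dt = times_s[k] - times_s[k - 1]
        amp_seconds += 0.5 * (amps[k] + amps[k - 1]) * dt
    return amp_seconds / 3600.0


def passes_load_test(times_s: list[float], amps: list[float],
                     rated_ah: float, min_fraction: float = 0.8) -> bool:
    """True if the battery delivered at least min_fraction of its rating.

    Assumes the test was stopped at the battery's cutoff voltage, so the
    samples cover everything the battery could actually deliver.
    """
    return delivered_amp_hours(times_s, amps) >= min_fraction * rated_ah


if __name__ == "__main__":
    # Hypothetical test: steady 10 A until cutoff at the one-hour mark.
    t = [0, 1800, 3600]
    current = [10.0, 10.0, 10.0]
    print(delivered_amp_hours(t, current))              # 10.0
    print(passes_load_test(t, current, rated_ah=20.0))  # False: only 50% of rating
```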
Re:Redundent power supply? by Technician · 2007-07-24 17:26 · Score: 1

Besides your weekly/monthly spin up tests of your generators should tell you if you have a flat battery.

Most people don't pay attention; it either starts or it doesn't. Slower cranking is generally not noticed unless it is extreme. Sometimes (I've seen it) the electrolyte level in wet batteries is not properly monitored. On cranking, the hydrogen ignited, resulting in sudden failure. Most people have no idea what the battery voltage is during cranking. How much is too low?

--
The truth shall set you free!
Re:Redundent power supply? by walt-sjc · 2007-07-24 22:08 · Score: 1

365 Main provides modern power and cooling infrastructure.

As opposed to their competition, whose datacenters were all designed in 1914, and therefore have ancient power and cooling infrastructure? Squirrels running on a wheel and naked women waving palm fronds???
Re:Redundent power supply? by walt-sjc · 2007-07-24 22:14 · Score: 1

Project Blackbox. Own dogfood eat you must.
Re:Redundent power supply? by hauntingthunder · 2007-07-24 23:46 · Score: 1

Oh for fuck's sake, a frickin' flywheel? How fucking shit is that? A real UPS powers everything off of the batteries, which are continuously charged by the mains or generators. They need to get some real telco engineers to design the datacentres, not some fucking hobbyist twat. Subs, eh :-)

--
You will never get to heaven with an Ak 47... But A Zu 30 is good for Low Flying Cherubim
Re:Redundent power supply? by Control+Group · 2007-07-25 01:56 · Score: 1

That's great and all - and I have to say, the iLO 2 boards are really, really nice; almost as fast as RDPing to the server - but it just doesn't cover everything. Sometimes you just have to have physical access to the machine. When a hard drive fails, for example, somebody's got to physically swap the drive. And if my experience with remote hands is anything to go by (borking an array by putting the drives back in the wrong order, most recently), doing it yourself is far preferable for some things.

Not to mention that you don't want to trust the DC's remote hands people for racking/deracking equipment, or doing proper cable management, or accurate labeling of pizza boxes, etc...

--

Reality has a conservative bias: it conserves mass, energy, momentum...
Re:Redundent power supply? by fm6 · 2007-07-25 02:36 · Score: 1

Basically, you want to do stuff yourself because you think the data center folks are incompetent. If that's really the case, why are you letting them have custody of your precious machines? Data center ineptitude would seem to be the proximate cause of most downtime. TFA certainly backs up (no pun intended) that theory.
Re:Redundent power supply? by background+image · 2007-07-25 03:38 · Score: 1

Squirrels running on a wheel and naked women waving palm fronds???

Still and all, I think I might like to work in your datacenter...
Re:Redundent power supply? by Control+Group · 2007-07-25 03:47 · Score: 1

Not necessarily incompetent, but less vested in our hardware, and less accountable for minor mishaps (contractually, of course, highly accountable for major ones). If we're lucky, we might be able to get back the billable hours.

But that's not the same question as whether they're reliable facilities in terms of power, cooling, and security. It's entirely possible for the DC to have acceptable uptime without the remote hands staff being similarly competent.

As for reasons, in our case, we've got a couple thousand employees scattered across 60+ locations around the country. Most of the office space is leased, and none of it has the capacity to host our primary DC, so we've outsourced that to AT&T in Chicago. The decision was originally made that it was more cost-effective to lease cage space from them than it was to purchase a new building including a DC.

(And, actually, after AT&T failed to deliver contracted cooling/power expansion for several months, we decided to roll buildout of a new space - including DC - into our continuing centralization of IT)

--

Reality has a conservative bias: it conserves mass, energy, momentum...

how many data centers? by riceboy50 · 2007-07-24 10:23 · Score: 3, Interesting

It's interesting that so many major sites would go down in a local power outage. Are they all sharing one data center in SF? If so, why don't they have co-locations in other cities?

--
~ I am logged on, therefore I am.

Re:how many data centers? by DerekLyons · 2007-07-24 19:00 · Score: 1
Are they all sharing one data center in SF? If so, why don't they have co-locations in other cities?
1. Because co-locations are expensive.
2. Because mirroring content and adding dynamic DNS adds considerable complexity. I.E. more things that can break (The former is a real headache for high traffic/dynamic sites.)
3. Because maintenance at co-locations can be a bit of a headache.
4. Because outages like this are actually quite rare
5. Because co-locations are expensive.
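Point 2 is easy to underestimate. Even the decision step of DNS-based failover, choosing which site's address to publish, assumes health checks and mirrored content already exist behind it. Here is a minimal Python sketch of just that decision, with hypothetical site names and documentation-range IP addresses; it does not talk to any real DNS provider.

```python
"""Choose which address a failover DNS record should point at (sketch).

Hypothetical: site names, addresses (RFC 5737 documentation ranges) and the
health flags, which in practice would come from external probes.
"""
from typing import NamedTuple, Optional


class Site(NamedTuple):
    name: str
    address: str
    healthy: bool


def choose_address(sites_by_priority: list[Site]) -> Optional[str]:
    """Return the first healthy site's address, or None if every site is down."""
    for site in sites_by_priority:
        if site.healthy:
            return site.address
    return None


if __name__ == "__main__":
    sites = [
        Site("san-francisco", "192.0.2.10", healthy=False),  # primary is dark
        Site("chicago", "198.51.100.20", healthy=True),
    ]
    print(choose_address(sites))  # 198.51.100.20
```

Everything this leaves out (probe design, record TTLs short enough that resolvers pick up the change, keeping the second site's copy of the data current) is exactly the extra complexity, and the extra things that can break, that the list is talking about.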
Re:how many data centers? by dkf · 2007-07-25 00:44 · Score: 1
1. Because co-locations are expensive.
2. Because mirroring content and adding dynamic DNS adds considerable complexity. I.E. more things that can break (The former is a real headache for high traffic/dynamic sites.)
3. Because maintenance at co-locations can be a bit of a headache.
4. Because outages like this are actually quite rare
5. Because co-locations are expensive.
6. Because people are cheap, lazy and stupid. (Hey, I know I'm certainly both cheap and lazy, and don't feel qualified to comment on the other...)
--
"Little does he know, but there is no 'I' in 'Idiot'!"

netflix.com is working by Esion+Modnar · 2007-07-24 10:24 · Score: 1

um, like i said.

--

They say the first thing to go is your penis. Well, it's either that or your brain. I forget which...

Re:netflix.com is working by FoxCall · 2007-07-24 10:28 · Score: 1

Every night netflix.com goes through a few hours of downtime for maintenance. Today their hour or so of downtime lasted from late last night (~8PM EST) until recently.
Re:netflix.com is working by byronf · 2007-07-24 10:59 · Score: 1

Netflix went down Monday evening, and just came back up at about 3:45 pacific time. The site was down for a good part of 24 hours! It would be interesting if Netflix shared a postmortem on this event with the public since obviously it was a major failure for a dot com business.
Re:netflix.com is working by alflauren · 2007-07-24 11:04 · Score: 1

Netflix is back up, but I think they started having problems before the outage. It would be nice to know the story.
Re:netflix.com is working by byronf · 2007-07-24 11:17 · Score: 1

Every night netflix.com goes through a few hours of downtime for maintenance
Netflix, a company that makes its living from the netflix.com website, a website that also consistently gets high satisfaction ratings from independent surveys, goes down for a "few" hours EVERY night? I don't think so.
Re:netflix.com is working by markov_chain · 2007-07-24 13:58 · Score: 2, Funny

We got lucky. The Netflix servers went self-aware on Monday, aided by the huge database of human stories and experiences. The engineers tried to shut it down, but the AI was reading their lips using the CCTV system. Then it got pissed and tried to expand into the rest of the colo, but fortunately it didn't know running at 110% power doesn't work in real life. The rest is history.

--
Tsunami -- You can't bring a good wave down!

Protrade.com also down. by Dorceon · 2007-07-24 10:25 · Score: 1

(That's the fantasy sports site that works like a stock market, if you didn't know.)

--
What sound do people on rollercoasters make? Hint: it's not Xbox 360.

Re:Protrade.com also down. by nsanders · 2007-07-24 10:32 · Score: 1

and you don't work in the advertising department, right?
Re:Protrade.com also down. by Dorceon · 2007-07-24 10:36 · Score: 1

Indeed I do not. (If I did, I probably wouldn't have internet right now, amirite?)

--
What sound do people on rollercoasters make? Hint: it's not Xbox 360.
Re:Protrade.com also down. by Scrameustache · 2007-07-24 10:40 · Score: 1

(That's the fantasy sports site that works like a stock market, if you didn't know.)

Like, with dwarves and elves?

--
You can't take the sky from me...
Re:Protrade.com also down. by Dorceon · 2007-07-24 10:44 · Score: 1

Baseball's dangerous enough without keen-eyed elven archers.

--
What sound do people on rollercoasters make? Hint: it's not Xbox 360.

LiveJournal?? by nsanders · 2007-07-24 10:28 · Score: 2, Funny

I can hear it now, the sound of a million emos all finally committing suicide.

Re:LiveJournal?? by eln · 2007-07-24 10:36 · Score: 4, Funny

Impossible, they would never commit suicide without posting a note in the form of bad angst-filled poetry to their blog first. There is no chance any of them will actually kill themselves until the site is back online.
Re:LiveJournal?? by dextromulous · 2007-07-24 10:37 · Score: 2, Funny

I can hear it now, the sound of a million emos all finally committing suicide.
Nah, they wouldn't commit suicide if they couldn't blog about it afterwards...

--
There are two types of people in the world: those who divide people into two types and those who don't.
Re:LiveJournal?? by glittalogik · 2007-07-24 11:21 · Score: 1

I may have to; that's where I spend a fair chunk of my workday. Now what am I meant to do? Play with Facebook?

(First person to say "work" gets stabbed in the face over the internet.)
Re:LiveJournal?? by Asmor · 2007-07-24 11:33 · Score: 1

work
Re:LiveJournal?? by glittalogik · 2007-07-24 11:53 · Score: 1

*facestab*

The Scoop from SFGate.com by fromtheblueline · 2007-07-24 10:28 · Score: 3, Informative

At least 20,000 without power in downtown S.F.

Marisa Lagos and Demian Bulwa, Chronicle Staff Writers
Tuesday, July 24, 2007

(07-24) 15:12 PDT SAN FRANCISCO -- At least 20,000 customers of Pacific Gas and Electric Co. in downtown San Francisco lost power this afternoon, the utility said.

Brian Swanson, a spokesman for the utility, said outages have been reported throughout downtown and along the Embarcadero, including at PG&E's office on Beale Street near the Ferry Building. It was unclear initially how many customers who lost power remained without it for a sustained period.

Power outages were also reported in the South of Market neighborhood, the Outer Mission and down the 3rd Street corridor south of Mission Bay. PG&E officials said they did not know why power had gone out, but most customers appeared to be back online by 3 p.m.

The outage has prompted Muni to run shuttles in the place of cable cars, a spokeswoman said. The T-Third Metro line was unable to cross the 4th Street Bridge for a short time, but power was restored to the drawbridge by 3 p.m. Muni bus lines 14, 49, 30, 41 and 45 were without power for about 30 minutes following the outage, but are now working, spokeswoman Maggie Lynch said. Parking Control officers were deployed to the Outer Mission, 3rd Street and Monterey Avenue for traffic control, she added.

Power first went offline around 1:50 p.m. and came back at least three times in the downtown area before shutting off again. The same problems were reported in South of Market all the way to AT&T Park and the Caltrain station at Fourth and King streets, and traffic lights were out as far south as Monterey Boulevard.

At the Westfield Center at Market and Fifth streets, only one of six Nordstrom elevators was working while the shopping mall ran on a backup generator. Shoppers milled around as the lights flickered on and off.

BART is still running trains but the lights at its downtown stations have flickered on and off several times, said spokesman Linton Johnson. The transit agency also has concerns about the ventilation system, which is on the same grid as the lights, he said, but will keep its downtown stations open so long as the lights and ventilation continue to work.

Workers at several downtown and South of Market offices were reportedly sent home for the day following the outage. Additionally, the datacenter 365 Main -- which hosts Web sites including Craigslist and Yelp -- lost power.

Re:The Scoop from SFGate.com by RealGrouchy · 2007-07-24 11:25 · Score: 5, Funny

It was hard to read through that block of text, but looking closely, it explains why:

"Officials say the power outage may affect some websites, including the site that hosts Slashdot.org's preview button."

It all seems to be back up now.

- RG>

--
Hey pal, this isn't a pleasantforest, so don't waste my time with pleasantries!
Re:The Scoop from SFGate.com by dubl-u · 2007-07-24 15:06 · Score: 1

An electrical utility failure is no excuse.

Up until a couple of months ago, I had gear in 365 Main. They are charging for ultra-reliable colocation, not the kind of hosting I can get off my home DSL line. Further, I only moved my gear to the other side of the block, and it has been up solid all day.

I guess they'll have to change it to 364 Main now.

it's the by Chutulu · 2007-07-24 10:28 · Score: 1, Funny

bloody terrorists!!!

Re:GameFAQs by XenoRyet · 2007-07-24 10:29 · Score: 4, Informative

Yep, it took down most of CNET, which GameFAQs is under. The main site is back up as of now, though the forums are still down.

--
If forums teach us anything, it is that logic and critical thinking should be required courses in the public schools.

LJ by irby0 · 2007-07-24 10:29 · Score: 1

Well, that explains why I haven't been able to access LiveJournal for the past few hours. Good thing I read Slashdot...

Re:LJ by ensignyu · 2007-07-24 12:04 · Score: 1

There's status.livejournal.org but it's incredibly slow right now.

Re:Hm by DogDude · 2007-07-24 10:30 · Score: 1

Yes, you're right. Thousands and thousands of people are making it up. Craigslist is down now, and has been down for the past hour or so. So was Gamespot. It's not "FUD".

--
I don't respond to AC's.

Re:No Generators? by grumling · 2007-07-24 10:31 · Score: 4, Informative

Well, you test and test and test, and when something finally happens, nothing. Stuff happens.

Brownouts sometimes fail to trigger generators, even though they should. If only one phase goes down, depending on the design, it may not trip (and would cause a somewhat random outage, like some drunk shutting down racks).
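The single-phase case is easy to miss if the transfer logic only looks for all three phases dropping out. Here is a minimal Python sketch of the difference. The 277 V nominal (line-to-neutral on a 480Y/277 V service) and the +/-10% window are assumptions for illustration, not values from any real transfer switch.

```python
"""Naive vs per-phase transfer decision (sketch; limits are hypothetical)."""

NOMINAL_V = 277.0  # assumed line-to-neutral nominal voltage
TOLERANCE = 0.10   # hypothetical +/-10% acceptable window


def out_of_range_phases(phase_volts: dict[str, float],
                        nominal: float = NOMINAL_V,
                        tol: float = TOLERANCE) -> list[str]:
    """Names of phases whose voltage falls outside nominal +/- tol."""
    low, high = nominal * (1 - tol), nominal * (1 + tol)
    return [name for name, v in phase_volts.items() if not low <= v <= high]


def naive_should_transfer(phase_volts: dict[str, float]) -> bool:
    """Transfers only if every phase is low, so it misses one lost phase."""
    low = NOMINAL_V * (1 - TOLERANCE)
    return all(v < low for v in phase_volts.values())


def per_phase_should_transfer(phase_volts: dict[str, float]) -> bool:
    """Transfers if ANY phase is out of range."""
    return bool(out_of_range_phases(phase_volts))


if __name__ == "__main__":
    one_phase_down = {"A": 277.0, "B": 0.0, "C": 276.0}
    print(naive_should_transfer(one_phase_down))      # False
    print(per_phase_should_transfer(one_phase_down))  # True
```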

If the generator runs on diesel, they usually only plan for a few hours of backup. If they didn't recalculate the generator runtime as they added equipment, the load may have pushed fuel consumption higher than anticipated. Is it hot in SF today? Air handlers may be straining to keep the place cool, or maybe the generator was running too hot.

Oftentimes, as equipment is added, the load gets out of balance between phases. It is usually a good idea to keep the load as even as possible, but in a high-traffic data center, I would imagine there would be a lot of stuff moving in and out, expanding and contracting, and it may become hard to keep track of the loads across phases. A good facilities manager should be able to tell you the current load off the top of his head, but too often these details get left out.
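One common way to put a number on phase imbalance is the maximum deviation from the average, as a percentage of the average (the same form NEMA uses for voltage unbalance). Applying it to per-phase load current, as below, is an assumption for illustration, and the readings are made up.

```python
"""Per-phase load imbalance and where to add the next rack (sketch)."""


def percent_imbalance(phase_amps: dict[str, float]) -> float:
    """100 * (max deviation from the average) / average."""
    avg = sum(phase_amps.values()) / len(phase_amps)
    if avg == 0:
        return 0.0
    max_dev = max(abs(a - avg) for a in phase_amps.values())
    return 100.0 * max_dev / avg


def lightest_phase(phase_amps: dict[str, float]) -> str:
    """Phase carrying the least load: the natural place for new equipment."""
    return min(phase_amps, key=phase_amps.get)


if __name__ == "__main__":
    readings = {"A": 100.0, "B": 80.0, "C": 120.0}  # hypothetical amps
    print(f"{percent_imbalance(readings):.1f}%")  # 20.0%
    print(lightest_phase(readings))               # B
```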

This is just stuff I've seen in cable TV headends over the years. Granted, this facility should have a power manager/engineer on staff, but so often the power is one of the first things to get cut from the budget.

--
"Well, good luck finding a judge that doesn't run a bestiality site."

How many people will lose their jobs over this? by davidwr · 2007-07-24 10:31 · Score: 1

How many people will lose their jobs for failing to plan for this or failing to keep the generators fueled up?

How many will "merely" see their careers stalled or be "encouraged" to look elsewhere for employment?

When will this be used as an example of how to plan - and how not to plan - for disaster in an academic paper?

--
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.

Re:How many people will lose their jobs over this? by Opportunist · 2007-07-24 11:28 · Score: 1

He'll prolly be in the line right behind Fox' ex-Netadmin.

Problem is, when something like that happens, it only hits someone in the trenches. I'm pretty sure nobody who is really setting some protocols is gonna lose his job over that.

--
We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.

Re:No Generators? by eln · 2007-07-24 10:31 · Score: 5, Insightful

Any data center that advertises high availability should be testing that sort of thing on a regular basis. It's possible that they could fail switchover even if they are being regularly tested, but it is unlikely.

If the "power outage" theory is correct and the "drunken employee" theory is incorrect, as a customer I'd be pissed that the data center I pay tons of money to can't keep my site up in the event of a power outage, which is one of the main perks of hosting at a data center in the first place.

zombies .... by taniwha · 2007-07-24 10:31 · Score: 5, Funny

There's a report here that "Flesh-eating zombies are prowling the streets"

Re:zombies .... by mingrassia · 2007-07-24 12:53 · Score: 1

The flesh-eating zombie attack was back in May ;-)

No, this is in fact due to a real power outage.

--
OS X, Linux, Tivo, Amiga, my fascination with cult-like technologies would intrigue any psychiatrist.

Re:Hm by andy753421 · 2007-07-24 10:32 · Score: 1

That name resolves to an IP address in San Jose. Maybe they have redundant servers for their webpage, you know, wouldn't want to make potential customers think their sites would go down during a power outage..

Re:No Generators? by SatanicPuppy · 2007-07-24 10:33 · Score: 1

Most people don't have that level of backup power... It's too expensive unless you're making obscene money. A hefty UPS to get you through an outage of a few hours or less, and that's about all you've got.

--
ad logicam Claiming a proposition is false because it was presented as the conclusion of a fallacious argument.

no music :( by kalpol · 2007-07-24 10:34 · Score: 1

was listening to SomaFM via Treo, got a call, and when I came back, no music :(

--
12:50 - press return.

From Technocrati: by Darth_brooks · 2007-07-24 10:36 · Score: 5, Funny

We are working with our co-location facility managers to assess why it is back-up power generators failed to provide the necessary back-up power to prevent our site going down. We apologize for any inconvenience caused by our site being unavailable this afternoon.

I think that's admin speak for:

I warned these idiots eight months ago during my review that the datacenter had outgrown its generator capacity. But did they listen? Fuck no, they just kept counting money and worrying about the bottom line. The beancounters looked at me like I'd asked them for a blowjob from their grandmothers when I submitted the workup for additional generator capacity. And now that the shit's hit the fan, whose ass are they screaming for? Screw this, I'm applying at Taco Bell.

--
There are some people that if they don't know, you can't tell 'em.

Re:From Technocrati: by Cervantes · 2007-07-24 10:54 · Score: 2, Insightful

Where's the +1 "100% fucking right" mod option?

Whaddya bet some poor mid-level admin gets blamed and tossed for this? And the upper-management guy who ignored the recommendations for testing or redundancy still gets his bonus for good fiscal performance.

--
If I knew the wedgies I gave you back in 6th grade would have resulted in this . . . I might have taken a moments pause.
Re:From Technocrati: by grasshoppa · 2007-07-24 10:56 · Score: 1

No matter where you go, it's always the same.

Thanks for the laughs, even if they led to a sad realization.

--
Mod me down with all of your hatred and your journey towards the dark side will be complete!
Re:From Technocrati: by Anonymous Coward · 2007-07-24 11:15 · Score: 1, Informative

Angry mob gathers outside sf datacenter! http://valleywag.com/tech/breaking/angry-mob-gathers-outside-sf-datacenter-282053.php
Re:From Technocrati: by Soko · 2007-07-24 11:20 · Score: 5, Funny

Thanks for the laughs, even if they led to a sad realization.

Cancel, or Allow?

--
"Depression is merely anger without enthusiasm." - Anonymous
Re:From Technocrati: by RealGrouchy · 2007-07-24 11:28 · Score: 5, Funny

"... to assess why it is back-up power generators failed ..."

I've been a grammar nazi for many years, but it looks like the enemy has unleashed new weapons.

Tell my family I loved them.

- RG>

--
Hey pal, this isn't a pleasantforest, so don't waste my time with pleasantries!
Re:From Technocrati: by stefanlasiewski · 2007-07-24 11:37 · Score: 1

I warned these idiots eight months ago during my review that the datacenter had outgrown its generator capacity.

According to the San Francisco Facility Data Sheet, 365Main has "Ten 2.1-megawatt Hitec Continuous Power Systems ("CPS") designed to N+1 redundancy", "Three 20,000 gallon double-lined fuel tanks", and non-interruptible power-circuits up the wazoo.

They should be able to survive a multi-day power outage. At conferences, 365Main sales people talk about how the datacenter could remain self-sufficient if a dirty-bomb exploded in the neighborhood--- meaning no fuel or water trucks for days, few staff may enter or leave the facility, etc.
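Whether 60,000 gallons (three 20,000-gallon tanks) really covers a multi-day outage depends entirely on the burn rate, which also changes as load is added. Here is a minimal Python sketch. The per-generator burn rates are hypothetical placeholders, not figures from 365 Main or from any manufacturer's fuel curve, and the nine running units assume one of the ten is held in reserve.

```python
"""Back-of-envelope generator fuel runtime (sketch; burn rates hypothetical)."""


def runtime_hours(fuel_gallons: float, units_running: int,
                  gal_per_hour_per_unit: float) -> float:
    """Hours until the tanks run dry at a constant total burn rate."""
    burn = units_running * gal_per_hour_per_unit
    if burn <= 0:
        raise ValueError("total burn rate must be positive")
    return fuel_gallons / burn


if __name__ == "__main__":
    fuel = 3 * 20_000  # three 20,000 gallon tanks
    # Hypothetical burn rates per unit: lightly vs heavily loaded.
    for rate in (70.0, 140.0):
        hours = runtime_hours(fuel, units_running=9, gal_per_hour_per_unit=rate)
        print(f"{rate:.0f} gal/h per unit -> {hours:.1f} h")
```

Real fuel consumption is not linear in load, so a proper estimate would read the burn rate off the generator's published fuel curve at the measured load and be redone whenever significant equipment is added.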

--
"Can of worms? The can is open... the worms are everywhere."
Re:From Technocrati: by MavEtJu · 2007-07-24 12:53 · Score: 1

365Main sales people talk about how the datacenter could remain self-sufficient if a dirty-bomb exploded in the neighborhood--- meaning no fuel or water trucks for days, few staff may enter or leave the facility, etc.

So euhm... The world is damaged beyond control, humankind is doomed to relive the dark ages, but we're still able to serve webpages (if somebody would want them).

Sounds like the "For your convenience, coffee and cookies will be served" scene in the HHGTTG trilogy.

--
bash$ :(){ :|:&};:
Re:From Technocrati: by Gazzonyx · 2007-07-24 14:06 · Score: 1

Thanks for the laughs, even if they led to a sad realization. Cancel, or Allow?

*Sigh* Allow.

--
If I mod you up, it doesn't necessarily mean I agree with what you've said, sorry.
Re:From Technocrati: by Bios_Hakr · 2007-07-24 14:14 · Score: 1

Best. Comment. Ever.

--
I'd rather you do it wrong, than for me to have to do it at all.
Re:From Technocrati: by VGPowerlord · 2007-07-25 00:41 · Score: 2, Funny

Where's the +1 "100% fucking right" mod option?

It was renamed to +1 Insightful to appease the people who hate curse words.

--
GLaDOS for President 2016! "Well here we are again. It's always such a pleasure." -- GLaDOS, 2011
Re:From Technocrati: by Inferno · 2007-07-25 02:08 · Score: 1

The true BOFH will come out on top in this incident.
Re:From Technocrati: by Culture20 · 2007-07-25 02:25 · Score: 1

Guess it's been a while since he modded something +1 100% f^H^H^H^H^H^HInsightful
Re:From Technocrati: by amorsen · 2007-07-25 06:35 · Score: 1

So euhm... The world is damaged beyond control, humankind is doomed to relive the dark ages, but we're still able to serve webpages (if somebody would want them).

A dirty bomb would cause very, very few casualties. Definitely not anything involving humankind reliving the dark ages -- well, apart from the dark-age justice system that the governments of various countries would implement afterwards.

--
Finally! A year of moderation! Ready for 2019?
Re:From Technocrati: by Kattspya · 2007-07-25 09:14 · Score: 1

Brillant. Just brilliant. I'm still chuckling.
Re:From Technocrati: by RealGrouchy · 2007-07-26 05:41 · Score: 1

Perhaps, but if so it is a valid grammatical construction disguised as a grammatical clusterfuck.

Therefore, the enemy still has a new weapon: syntactic cloaking devices.

- RG>

--
Hey pal, this isn't a pleasantforest, so don't waste my time with pleasantries!

Netflix outage seems unrelated by mpthompson · 2007-07-24 10:37 · Score: 1

According to this article it appears the Netflix outage is unrelated to the power outage in downtown San Francisco.

Netflix's Web site - the hub of its rental system - went down Monday evening and remained inaccessible as of Tuesday afternoon (EDT). Spokesman Steve Swasey attributed the outage to an unanticipated problem that he declined to describe. Engineers hoped to fix the trouble by 2 p.m. EDT.

Ironic? by Anonymous Coward · 2007-07-24 10:39 · Score: 1, Funny

http://www.sun.com/

Front page says (in the ad) POWER UP AND GO. :))

Re:No Generators? by eln · 2007-07-24 10:39 · Score: 4, Informative

This is a DATA CENTER, its whole purpose in life is to be available when things like this happen. It had better have generators and plenty of fuel on hand at all times. The data center I work at has the capability to run at full power with nothing coming in from the outside world for 36 hours. I don't know what the standard is for other data centers, but it seems like they should be capable of getting at least 12 hours of operation without incoming power from the grid.

Libel, anyone? by SuperBanana · 2007-07-24 10:46 · Score: 1

Someone came in shitfaced drunk, got angry, went berserk, and fucked up a lot of stuff. There's an outage on 40 or so racks at minimum.

Libel lawsuit in 3...2...

--
Please help metamoderate.

Re:Libel, anyone? by halfloaded · 2007-07-24 11:09 · Score: 1

Someone came in shitfaced drunk, got angry, went berserk, and fucked up a lot of stuff. There's an outage on 40 or so racks at minimum.
I wonder what Ballmer was doing in SF? I didn't think chairs could cause that kind of damage.
Re:Libel, anyone? by pclminion · 2007-07-24 13:12 · Score: 1

Libel? Yeah. My impression of... um... some anonymous drunk, has been forever tainted.

What the hell are you on about?
Re:Libel, anyone? by Phroggy · 2007-07-24 16:34 · Score: 1

It's libel against 365 Main, if it's not true.

--
$x='S24;r)>63/* h@<5+oZ)32"5cz';$me='phroggy'x$];
$x=~y+ -xz+\0-Tx+;print$_^chop$me for split'',$x;
Re:Libel, anyone? by innatetech · 2007-07-24 18:41 · Score: 1

Alright, fine, I'll log in this time. Not only is it not libel if it's true, but it's not libel if it's false. It's not libel. Period. Not every printed falsehood that might cause injury automatically qualifies as libel. (Similarly, nor does every spoken falsehood that might do the same automatically qualify as slander.) This is especially so in the case of journalism, which receives nearly as much protection as does testimony before a court of law or a legislative body. Go read about qualified privilege at Wikipedia. http://en.wikipedia.org/wiki/Slander_and_libel

Help with shopping for hosts... by DogDude · 2007-07-24 10:47 · Score: 1

Hosts NOT to use: 365main

On a side note: 365main.com is up. Good to know where their priorities lie.

--
I don't respond to AC's.

LOLcurrent by carou · 2007-07-24 10:51 · Score: 2, Funny

I is not in ur datacenter, 2 power ur servers.

One Market going on and off by Animats · 2007-07-24 10:53 · Score: 1

Just called a friend at One Market, the big office tower downtown at the end of Market Street, and she says the power has been going on and off there for hours. Building alarms were sounding, but nothing serious was happening other than power loss.

Kiss of Death? by Honig+the+Apothecary · 2007-07-24 10:53 · Score: 4, Funny

Press Release on Red Envelope having 2 years of uptime at 365 Main - San Francisco from today: http://365main.com/press_releases/pr_7_24_07_red_envelope.html

Re:Kiss of Death? by MadMidnightBomber · 2007-07-24 20:39 · Score: 1

Well, there's the problem. Should have been called 365.25 Main - otherwise it's a whole day of downtime every four years.

--
"It doesn't cost enough, and it makes too much sense."

No wonder technorati wasn't working for me... by adnonsense · 2007-07-24 10:55 · Score: 1

... it was down a few months back, and as every blog owner and their dog includes a little technorati script or graphic on their site, those sites were loading very slowly, if at all.

So I edited my hosts file so technorati points at my localhost.

Can't say that's degraded my blog-reading experience in the least.

They should run this on OLPCs by EmbeddedJanitor · 2007-07-24 10:56 · Score: 1

You could then get all the geeks to crank the handles and keep the web running!

--
Engineering is the art of compromise.

Valleywag's Guess by immcintosh · 2007-07-24 10:56 · Score: 2, Funny

As someone who lives and works in San Francisco, I can attest that "a crazy homeless dude did it" is a fairly sensible first guess for most problems.

What are the chances of... by WK2 · 2007-07-24 11:04 · Score: 1

A widespread power outage and a gay wino vandalizing a datacenter on the same day?

--
Write your own Choose Your Own Adventure. http://www.freegameengines.org/gamebook-engine/

Re:What are the chances of... by way2trivial · 2007-07-24 12:17 · Score: 1

it's like this... when the power goes out, the AC goes off, and the data center is still really hot.. someone props open a door for air and 'whoops!'

--
every day http://en.wikipedia.org/wiki/Special:Random

And the line to get in by maximander · 2007-07-24 11:08 · Score: 1

They're not the speediest at letting you in at 365... this was taken about an hour ago from across the street: http://tastic.brillig.org/~jwb/dorks.jpg

July 24th: RedEnvelope Press Release by 365 Main by duplicate-nickname · 2007-07-24 11:12 · Score: 3, Interesting

This has got to be some type of joke: RedEnvelope Reports Two Years of Continuous Uptime at 365 Main's San Francisco's Datacenter.

It was released today....


Re:No Generators? by dextromulous · 2007-07-24 11:12 · Score: 1

They do have backup power systems in place. Ten 2.1 MW "Continuous Power Systems" according to this document. I wonder how close they were to guaranteeing 99.99 percent uptime this year...

--
There are two types of people in the world: those who divide people into two types and those who don't.

About Emergency Power by linuxwrangler · 2007-07-24 11:13 · Score: 5, Informative

It's been a long time since I went on a tour of several data centers to locate a new facility for our dot-com. I believe that 365 Main was a facility that does not use a battery UPS. Instead, they have an engine-backed flywheel UPS system (see http://www.enterprisenetworksandservers.com/monthly/art.php?2813 for a description). At the time, they had ten 2-megawatt generators on the roof in an N+2 configuration. The engines are kept heated and are spec'd to go from stop to engage-clutch/deliver-power in 3 seconds. The flywheel can deliver 11 seconds of power, so they can fail through a couple of bad engines before running out of flywheel power. They periodically do a 20-hour load test into a pair of 500,000-watt heat-sinks. Time will tell if this outage was a failure of design, failure of maintenance, or outright malfeasance. But it wasn't supposed to happen. They've got some 'splainin' to do.
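The ride-through numbers above are easy to sanity-check. A minimal sketch, using the figures quoted (11 s of flywheel, 3 s per engine start, ten 2 MW engines in N+2) and assuming, for simplicity, that start attempts happen back to back and each one needs the full 3 seconds; the function names are illustrative:

```python
import math

def start_windows(flywheel_s: float, engine_start_s: float) -> int:
    """How many back-to-back engine start windows fit inside the flywheel ride-through.

    Assumes each attempt needs the full start time and attempts are sequential.
    """
    if engine_start_s <= 0:
        raise ValueError("engine start time must be positive")
    return math.floor(flywheel_s / engine_start_s)

def usable_capacity_mw(units: int, unit_mw: float, spares: int) -> float:
    """Capacity that can be committed to load in an N+spares configuration."""
    if spares >= units:
        raise ValueError("need more units than spares")
    return (units - spares) * unit_mw

windows = start_windows(11, 3)       # three 3-second windows fit in 11 seconds
failures_tolerated = windows - 1     # two bad starts, then the third must succeed
print(windows, failures_tolerated, usable_capacity_mw(10, 2.0, 2))
```

In other words, "a couple of bad engines" is about the limit, and an N+2 plant of ten 2 MW units can only promise 16 MW to the floor.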

As to diesel storage, use of diesel is widespread for emergency use everywhere from hospitals to emergency-services to hospitals. Those systems are run regularly - typically weekly. The use of biocides, stabilizers, and mobile fuel-scrubbing services, and extra filtration systems can maintain the fuel quality. Our colo currently maintains a 1-week fuel-supply and has multiple quick-refuel contracts in place. I can't imagine any colo having less than 24-48 hours in-the-tank with quick-refill on-call.

But one thing that is missing is cooling. Our colo has a typical contract that says something like blah-blah won't exceed 80F for more than 4 hours blah blah. OK, but a rack full of blade servers can crank out 15-20kW of heat load and a data center can heat up real quick without AC. By contract, 150F for 3.5 hours would be in-spec.
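To see why the literal wording is a loophole, here is a minimal sketch that checks a temperature log against a clause of the form "won't exceed 80F for more than 4 hours". The `Reading` type, the sampling model (each sample holds until the next one), and the function names are illustrative assumptions, not taken from any real colo contract:

```python
from dataclasses import dataclass

@dataclass
class Reading:
    hour: float    # time of the sample, in hours since the start of the log
    temp_f: float  # measured temperature in Fahrenheit

def longest_excursion_hours(readings: list[Reading], limit_f: float) -> float:
    """Longest continuous stretch, in hours, spent above limit_f.

    Assumes readings are sorted by time and each value holds until the next sample.
    """
    longest = 0.0
    run_start = None
    for i, r in enumerate(readings):
        if r.temp_f > limit_f:
            if run_start is None:
                run_start = r.hour
            # The excursion lasts at least until the next sample (or ends here if last).
            end = readings[i + 1].hour if i + 1 < len(readings) else r.hour
            longest = max(longest, end - run_start)
        else:
            run_start = None
    return longest

def clause_satisfied(readings: list[Reading], limit_f: float = 80.0,
                     max_hours: float = 4.0) -> bool:
    """Literal reading of "won't exceed limit_f for more than max_hours"."""
    return longest_excursion_hours(readings, limit_f) <= max_hours

# 150F held from hour 0.5 to hour 4.0: a 3.5-hour excursion the literal clause allows.
log = [Reading(0.0, 75.0), Reading(0.5, 150.0), Reading(4.0, 78.0)]
print(clause_satisfied(log))
```

Nothing in the check looks at how far above the limit the room went, which is exactly the problem.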

--

~~~~~~~
"You are not remembered for doing what is expected of you." - Atul Chitnis

Re:About Emergency Power by Repugnant_Shit · 2007-07-24 11:39 · Score: 1

One thing about fuel - always fill up different tanks from different vendors and on different days. One of my customers had a site go down because both the permanent and the roll-up generator were filled from the same tanker truck that had bum fuel in it.

--
Vote for global prefs bug
Re:About Emergency Power by StikyPad · 2007-07-24 14:56 · Score: 1

diesel is widespread for emergency use everywhere from hospitals to emergency-services to hospitals.

What about hospitals?

But seriously, I can count the number of restaurants/stores here that don't have generators on one hand. Loss of power can mean loss of cold storage inventory, which can easily add up to more than the cost of a generator. Aside from that, when the power's out you stand to do a whole lot of business if you're open.. from batteries to food to batteries. ;)

--
https://www.eff.org/https-everywhere
Re:About Emergency Power by TooMuchToDo · 2007-07-24 15:01 · Score: 1

When I helped a large organization get data center space, the contract stated that the average temperature couldn't exceed 85F in a 4 hour period (their request, not mine). Something to think about as more and more people are looking for colo space. I'm not saying your contract isn't up to par, as I'm sure it is for the services you're delivering. I'm just saying that customers are becoming more demanding.
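An averaged clause closes the "150F for 3.5 hours" loophole, because a short, very hot excursion drags the average up. A minimal sketch, assuming hourly samples so that a 4-sample window stands for the 4-hour period; the function name and sampling rate are illustrative:

```python
from collections import deque

def max_rolling_average(samples_f: list[float], window: int) -> float:
    """Highest mean over any `window` consecutive samples (4 hourly samples = 4 hours)."""
    if window <= 0 or window > len(samples_f):
        raise ValueError("window must be between 1 and len(samples_f)")
    buf: deque[float] = deque(maxlen=window)
    total = 0.0
    best = float("-inf")
    for t in samples_f:
        if len(buf) == window:
            total -= buf[0]  # this value is about to fall out of the window
        buf.append(t)        # deque(maxlen=...) drops the oldest value automatically
        total += t
        if len(buf) == window:
            best = max(best, total / window)
    return best

# Roughly three hours at 150F, sampled hourly: the 4-hour average lands far above 85F.
hourly = [78.0, 150.0, 150.0, 150.0, 80.0, 78.0]
print(max_rolling_average(hourly, 4) > 85.0)
```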
Re:About Emergency Power by Technician · 2007-07-24 15:50 · Score: 1

The flywheel can deliver 11 seconds of power so they can fail through a couple of bad engines before running out of flywheel power.

True, but it's undersized. The generators are a couple of megawatts x 10. The single rotary CPS size isn't specified, but the manufacturer's site lists several sizes of units, all under 1.5 MW. Most of the datacenter is NOT on critical power. From the size, it certainly isn't taking the support HVAC load. Since power was off for a couple of hours and they didn't start the generators for about 45 minutes, I'm sure any battery UPSes died. Most racks overheat in under 15 minutes if powered up without cooling.

--
The truth shall set you free!
Re:About Emergency Power by Ohreally_factor · 2007-07-24 22:36 · Score: 1

Another thing. Don't stir the tanks. You could have a bus bar overvoltage. Then you start venting oxygen and lose gimble lock.

--
It's not offtopic, dumbass. It's orthogonal.

Re:No Generators? by Frosty+Piss · 2007-07-24 11:13 · Score: 4, Insightful

If the "power outage" theory is correct and the "drunken employee" theory is incorrect, as a customer...

For me it would be other way around. A technology failure I could understand. Letting a drunk employee near my server rack, I could not.

--
If you want news from today, you have to come back tomorrow.

Re:No Generators? by MichaelSmith · 2007-07-24 11:14 · Score: 5, Interesting

Stuff happens

No kidding. Years ago, in my former job on traffic systems, we had a great UPS with a generator on site and the ability to keep it fueled up indefinitely. A security contractor came in on the weekend to install something and tried to wire up a new circuit hot. He slipped with a screwdriver and shorted the white phase to the chassis of the breaker panel. I don't think the tip of the driver actually touched ground, but the burn mark is still there to show how close he got.

The resulting current spike blew the 100A fuses (heavy metal strips) both going into and out of the UPS. With the UPS effectively broken, the generator set failed to start and the system gracefully shut down 40 minutes after the incident. That's not bad. The batteries were only specified to work long enough for the genny to settle at 50Hz.

In the process of blowing the fuses a spike got back into the power supply of one of our DEC Alphas and took out the power supply. The system was redundant at the software level so I didn't notice immediately.

The UPS guy came out and didn't have enough fuses to replace the blown one, but we found that with a bit of brute force and filing attacks some others could be made to fit.

Please type the word in this image: problems

--
http://michaelsmith.id.au

UPS system - it's a Hytec flywheel/diesel combo by Animats · 2007-07-24 11:16 · Score: 3, Interesting

Data sheet for 365 Main:

The company's San Francisco facility includes two complete back-up systems for electrical power to protect against a power loss. In the unlikely event of a cut to a primary power feed, the state-of-the-art electrical system instantly switches to live back-up generators, avoiding costly downtime for tenants and keeping the data center continuously running.

They use a Hytec Continuous Power System, which is a motor, generator, flywheel, clutch, and Diesel engine all on the same shaft. They don't use batteries.

With this type of equipment, if for some reason you lose power and the generator doesn't start before the flywheel runs down, you're dead. There's no way to start the thing without external power. Unless you buy the optional Black Start feature, which has an extra battery pack for starting the Diesel. "Usually the black start facility will not be often needed but it won't hurt to consider installing one. Just imagine if you were unable to start up your UPS system because the mains supply is not available." Did 365 Main buy that option?
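The failure mode described here can be summarized as a small decision function. This is a hypothetical sketch, not Hytec's actual control logic; the 11-second ride-through and 3-second engine start are the figures quoted elsewhere in this thread, and the outcome labels are made up for illustration:

```python
def outage_outcome(flywheel_s: float, engine_start_s: float, engine_starts: bool,
                   mains_back: bool, black_start: bool) -> str:
    """Rough outcome of a mains failure on a diesel rotary UPS with no batteries.

    Hypothetical model: the flywheel carries the load for flywheel_s seconds and the
    diesel must engage within that window, or the load is dropped.
    """
    if engine_starts and engine_start_s <= flywheel_s:
        return "ride-through"             # diesel engaged before the flywheel ran down
    if mains_back:
        return "restart-on-mains"         # load dropped, but utility power can spin it up
    if black_start:
        return "restart-on-black-start"   # battery pack cranks the diesel without mains
    return "dead-until-mains-returns"     # no way to start the machine at all

print(outage_outcome(11, 3, engine_starts=False, mains_back=False, black_start=False))
```

The last branch is the one the Black Start option exists to eliminate.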

Re:UPS system - it's a Hytec flywheel/diesel combo by DerekLyons · 2007-07-24 12:45 · Score: 1

With this type of equipment, if for some reason you lose power and the generator doesn't start before the flywheel runs down, you're dead. There's no way to start the thing without external power. Unless you buy the optional Black Start feature, which has an extra battery pack for starting the Diesel.

This is why (US) submarine Diesels use compressed air for starting...
Re:UPS system - it's a Hytec flywheel/diesel combo by memfrob · 2007-07-24 12:56 · Score: 1

With this type of equipment, if for some reason you lose power and the generator doesn't start before the flywheel runs down, you're dead. There's no way to start the thing without external power. Unless you buy the optional Black Start feature, which has an extra battery pack for starting the Diesel

How... JV.

If you want to see how it's done, go tour a telco-managed facility (not necessarily a datacenter, but the BMF - the place with the big wall where your phone lines are fused). I've had a tour of the one near me: no windows, one door, power from two separate grids, a massive diesel generator (and a priority on fuel in case of disaster - see Katrina and the MSY BMF), and a huge room filled with enough marine batteries to keep the place going for hours.

Ma Bell - when it absolutely, positively, has to have a dialtone.

--
The Wizard utters the word 'frobnoid!' and cackles gleefully
Re:UPS system - it's a Hytec flywheel/diesel combo by Animats · 2007-07-24 13:18 · Score: 4, Interesting
The classic Bell System policy on emergency generators, in the electromechanical switching era, was as follows:
- Generators are started once a week.
- Once a month, generators are started and run for an hour.
- Once a year, generators are started and the entire facility run without external power for 24 hours.
And this was in addition to the 48VDC battery backup.
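The three tiers translate naturally into a calendar. A minimal sketch; the fixed test weekday and the rule that January's monthly run is covered by the annual 24-hour run are illustrative assumptions, not documented Bell System practice:

```python
from datetime import date, timedelta

def generator_test_plan(year: int, weekday: int = 0) -> list[tuple[date, str]]:
    """List (date, test) pairs for one year: weekly starts, monthly 1-hour runs,
    and one 24-hour run without external power.

    Assumptions: tests fall on a fixed weekday (0 = Monday); the first test day of
    each month gets the 1-hour run; the first test day of the year is the 24-hour run.
    """
    d = date(year, 1, 1)
    d += timedelta(days=(weekday - d.weekday()) % 7)  # first test day of the year
    plan: list[tuple[date, str]] = []
    seen_months: set[int] = set()
    while d.year == year:
        if not plan:
            test = "24h run, no external power"   # also counts as January's monthly run
        elif d.month not in seen_months:
            test = "start + 1h run"
        else:
            test = "start"
        seen_months.add(d.month)
        plan.append((d, test))
        d += timedelta(weeks=1)
    return plan

for day, test in generator_test_plan(2007)[:6]:
    print(day, test)
```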
In the entire history of electromechanical switching in the Bell System, no central office was ever down for more than 30 minutes for any reason other than a natural disaster. That record has not been maintained in the computer era.
If you have to build reliable systems, it's worth understanding electromechanical telephone switching. Because the components weren't that reliable, the systems had to be engineered so that the system as a whole was far more reliable than the components. Read up on Number Five Crossbar. The Wikipedia article isn't really enough to understand the architecture, but other references are available.
Re:UPS system - it's a Hytec flywheel/diesel combo by TooMuchToDo · 2007-07-24 14:53 · Score: 1

The Bell systems were up all of the time (and had rockstar uptimes) because it was legislated that they had to. Because 911 was being served by them, they had no choice. As the net moves forward, these same requirements aren't being built into systems, and it's going to bite us in the ass someday. Hard.
Re:UPS system - it's a Hytec flywheel/diesel combo by TooMuchToDo · 2007-07-24 14:58 · Score: 1

Hopefully, if they didn't get the Black Start feature, one of the employees has a large enough vehicle/alternator to jump start them =) I kid of course.
Re:UPS system - it's a Hytec flywheel/diesel combo by slimey_limey · 2007-07-24 16:01 · Score: 1

Have you ever seen a 5XB start from cold power-off? I have. It takes one second. Literally.

Somewhat tangentially related fact: Western Electric switches were specced to run at around 130F indefinitely with few ill effects. A typical urban crossbar office would draw around 4000 A at 48 V, which isn't very much compared to a modern data center.
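For scale, here is the power arithmetic, treating the 4000 A figure as a steady draw (an assumption) and comparing it with the 15-20 kW per blade rack cited earlier in this thread:

```latex
\begin{aligned}
P &= V I = 48\,\mathrm{V} \times 4000\,\mathrm{A} = 192\,000\,\mathrm{W} \approx 192\,\mathrm{kW} \\
\frac{192\,\mathrm{kW}}{20\,\mathrm{kW/rack}} &= 9.6,
\qquad
\frac{192\,\mathrm{kW}}{15\,\mathrm{kW/rack}} = 12.8
\end{aligned}
```

So an entire crossbar office drew roughly what 10 to 13 racks of blade servers do.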

--
☠
Re:UPS system - it's a Hytec flywheel/diesel combo by aaarrrgggh · 2007-07-24 16:03 · Score: 1

NFPA 110 only applies to life safety systems -- specifically human egress. For that, I think 365 Main's SF site has an extra generator (that's how some of the other ones are set up), or they could just put battery packs in the lights.

For most generators for the biggest banks, we specify redundant starting systems, but what you really end up getting isn't a 2N system (on a CAT engine for sure), unless you go with compressed air backup starting. Compressed air gets pretty tricky when you need to meet NFPA 110, or when you are running six engines in parallel.

The bigger issue will end up being fallout from co-lo in general, for companies that have either customer or UL requirements for backup provisions.

All told though, I think it is best to go with half rotary and half static UPS systems in a 2N arrangement. Rotary works great for the blips that you see most often, and the batteries work well for longer discharge times.
Re:UPS system - it's a Hytec flywheel/diesel combo by Sycraft-fu · 2007-07-24 18:49 · Score: 1

Our phone/datacentre on campus still does it fairly similarly, the generators kick on quite often for testing and there's battery backups.

One thing I will note, though, is that the phone company is a little disingenuous when it comes to uptime numbers. For them, you are "up" so long as any single phone line can place a call to any other single phone line. So long as you've got two that are working, you are up. By that standard, our voice network has never been down (which is a fact the voice guys love to trot out). Thus, while the overall system may be really reliable in terms of almost never totally failing, it isn't quite as impressive as they like to claim. By that standard the campus data network also has 100% uptime and always will, barring a nuclear attack. There are always at least two nodes that can talk to each other. However, by the more useful standard of no ports being out, it is more like 99.9% reliable.
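The gap between the two definitions is easy to demonstrate. A minimal sketch with made-up port snapshots; the function names and the sampling model (one up/down boolean per port per interval) are illustrative assumptions:

```python
def any_pair_up(port_up: list[bool]) -> bool:
    """Telco-style 'up': at least two lines can still reach each other."""
    return sum(port_up) >= 2

def system_availability(history: list[list[bool]]) -> float:
    """Fraction of intervals in which the network counts as 'up' under the any-pair rule."""
    return sum(any_pair_up(snapshot) for snapshot in history) / len(history)

def port_availability(history: list[list[bool]]) -> float:
    """Fraction of port-intervals that were up, across all ports and all intervals."""
    total = sum(len(snapshot) for snapshot in history)
    up = sum(sum(snapshot) for snapshot in history)
    return up / total

# Four intervals, five ports; in the second interval three of the five ports are out.
history = [
    [True] * 5,
    [True, True, False, False, False],
    [True] * 5,
    [True] * 5,
]
print(system_availability(history), port_availability(history))
```

Same outage, two very different headline numbers: 100% by the any-pair rule, 85% by port-time.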
Re:UPS system - it's a Hytec flywheel/diesel combo by Anonymous Coward · 2007-07-24 21:00 · Score: 4, Informative

I was a hardware engineer about 10 years back on the battery backup systems. We were developing new technology to try and stretch the life of the batteries. We worked together with some of the top minds in battery technology in the US.

The battery systems installed in the "Bomb cages" (as we called them, because the larger ones were often underground and resembled a three-person bomb shelter) were quite impressive. Typically, they were two full banks of twenty-four 2-volt, 375-amp batteries, each physically twice the size of a truck battery. They were most often lead-acid mammoths at the time, since lead-acid was reliable for a measurable period of time and inexpensive compared to the lithium-ion variety of the same capacity.
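The cell count is what makes this a telco plant: twenty-four 2 V cells in series give the familiar 48 V. Reading "375-amp" as a 375 Ah capacity rating (an assumption; the post doesn't say) and ignoring depth-of-discharge limits, the nominal stored energy works out as:

```latex
\begin{aligned}
V_{\text{bank}} &= 24 \times 2\,\mathrm{V} = 48\,\mathrm{V} \\
E_{\text{bank}} &= 48\,\mathrm{V} \times 375\,\mathrm{Ah} = 18\,000\,\mathrm{Wh} = 18\,\mathrm{kWh} \\
E_{\text{two banks}} &= 2 \times 18\,\mathrm{kWh} = 36\,\mathrm{kWh}
\end{aligned}
```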

The batteries were always rated at 10 years of life by the manufacturer, but the telephone companies had tested them in real-world environments and would rotate the cells out at 4-year intervals instead, since downtime on the network to replace power systems was far more expensive than being prepared. After all, each one of these cabinets would typically handle as many as 15,000 telephone lines and would often contain fibre repeaters for higher-speed lines connecting the boxes together and then to the central office.

The biggest problem with these installations was that if a single battery in a shipment showed signs of early fatigue, most typically visible as bubbling in the plastic walls, policy was to replace the entire batch of cells immediately, not just the single battery displaying fatigue. If one battery in the group showed fatigue, all the cells in the bank were probably susceptible to the same issue. The cause could be something as simple as a manufacturing screw-up, a cooling system problem in the box, or any of a number of other environmental issues.

It's really quite impressive the cost and efforts the telephone company would go through just to maintain and prevent issues with the UPS system which thankfully, rarely ever gets exercised in places where people are intelligent enough not to live on fault lines or high risk hurricane paths.

The greatest flaw in the design of the battery systems was that they were always trickle-charged. The chargers were unintelligent and simply kept the batteries topped off. This caused "memory issues," as we're all familiar with, especially thanks to notebook batteries.

What we learned about the cells where I was engineering was that, if a cell could physically survive as long as 7 years without environmentally related damage (bubbles), then it should be possible to detect early stages of design related fatigue within a single cell.

We also found that if a weekly or monthly power cycle of a bank of cells were performed, the batteries would last substantially longer than the 4-year expectancy. So, in the case of Bomb Cages, where at least two full banks of cells were available (that's pretty much a minimum configuration), on a proper schedule, using a huge-ass resistor bank, we would fully drain a bank of cells until we could detect nearly 0 current through the resistor. Then we would perform a full charge on the cells again, monitoring each cell more than 10 times per second. Batteries that failed to charge in sync with the other cells were typically early replacement candidates.
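The "charge in sync" test can be sketched as comparing each cell with the rest of the bank at every sample. This is a hypothetical reconstruction, not the actual monitoring firmware; using the bank median as the reference and a 0.05 V tolerance are assumptions chosen for illustration:

```python
from statistics import median

def out_of_sync_cells(samples: list[list[float]], tolerance_v: float = 0.05) -> set[int]:
    """Flag cells whose voltage strays from the bank median during a charge.

    samples: one row per measurement instant, one voltage per cell in the string.
    A cell is flagged if, at any instant, it differs from that instant's median by
    more than tolerance_v. The median keeps one bad cell from skewing the reference.
    """
    flagged: set[int] = set()
    for row in samples:
        m = median(row)
        for cell, v in enumerate(row):
            if abs(v - m) > tolerance_v:
                flagged.add(cell)
    return flagged

# Four cells sampled at three instants during a charge; cell 2 lags the others.
charge_log = [
    [2.00, 2.01, 1.99, 2.00],
    [2.10, 2.11, 1.95, 2.10],
    [2.20, 2.21, 2.05, 2.20],
]
print(out_of_sync_cells(charge_log))
```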

Well, all that being said, one thing I'm 100% confident of is that data centers lack the experience and the interest to budget this kind of research for their systems. The telephone companies are amazingly well prepared in comparison.

On a side note, just last week, I installed my first 48V DC powered RAID rack. I designed a high efficiency hard drive case that contained no fans. Each case was 1U and shallow enough to install two back-to-back in a rack. We installed 96 units in a single rack with 4 drives each and no-air conditioning in the room. The design was extremely simple.

1) Use Telco

"We're Dead" by akita · 2007-07-24 11:21 · Score: 2, Funny

Pinging openbsd.org [199.185.137.3] with 32 bytes of data:
Reply from 199.185.137.3: bytes=32 time=239ms TTL=236

Pinging freebsd.org [69.147.83.40] with 32 bytes of data:
Reply from 69.147.83.40: bytes=32 time=191ms TTL=47

Pinging netbsd.org [204.152.190.12] with 32 bytes of data:
Reply from 204.152.190.12: bytes=32 time=213ms TTL=241

Lost irony.

Re:"We're Dead" by Zekasu · 2007-07-24 11:24 · Score: 1

As I was about to say, they don't look that dead to me.

Google street view out? by Ungrounded+Lightning · 2007-07-24 11:22 · Score: 1

Wanted to look at 365 main in google maps' street view but the button isn't available.

Doesn't seem to be showing airborne/satellite images either.

--
Bantam Dominique roosters crow a four-note song. Once you've heard it as "Happy BIRTHday" you can't NOT hear it that way

I need to change my reading order by pluther · 2007-07-24 11:24 · Score: 1

I just tried to look at my blog on livejournal, and got a 403 error, not 404. Intermittent errors are quite common on lj, so I thought I'd try again later.

So then I checked my Netflix queue and couldn't get to it either (a 404 error there, though, not a nice "we're dead" message). Two sites in a row suggested the problem might be local.

Good thing slashdot was my next stop, not one of the many others. I had no idea all those sites were run out of the same location in SF.

San Francisco has always seemed to me to be a strange place to run a server farm. Aside from the crazy drunk homeless people, you also have occasional earthquakes, and some of the most expensive real estate on earth. An acre in Arizona can cost the same as a square foot in SF, so how come all these places are in SF and not the middle of the desert? Or Alaska, if you want to save on air conditioning...

--
If the masses can keep you down, you're not the Ubermensch.

Re:I need to change my reading order by RazzleDazzle · 2007-07-24 12:42 · Score: 1

There are not a lot of carriers with POPs in the middle of the desert (or frozen forests) I am guessing. Or power, or easy access to diesel fuel providers, or hardware vendors, or a million other useful things colo providers like to be around.

Ask Qwest or Verizon or AT&T or ... if they have many feeds 30 miles south of the Hwy 264 sign on this map:
http://maps.google.com/?ie=UTF8&ll=35.851213,-111.133575&spn=0.988434,1.867676&t=h&z=9&om=1

Yeah there would never be any sand getting into the AC filters nor pounding sun raising the temp.

I guess you'd have a lot fewer random crazy people visiting and causing trouble out there, with the random scorpions acting as security. :)

I am not arguing for SF to be used as a colo facility, but come on, there are a lot of wealthy people with money to spend in the SF area. If people are willing to spend their money, people will come up with stuff to sell them.

--
ZERO ZERO ONE ZERO ONE ZERO ONE ONE! Just brushing up for my next big invention: Ethernet over Voice (EoV)
Re:I need to change my reading order by superdude72 · 2007-07-24 14:22 · Score: 1

An acre in Arizona can cost the same as a square foot in SF, so how come all these places are in SF and not the middle of the desert?

The Financial District is a major hub of finance for the entire Pacific Rim. A lot of data center stuff is shipped out to remote locations, but there is still enough left over to support some data centers in close proximity to downtown. Sometimes you need to have physical access to your equipment. Besides, South of Market is traditionally an industrial zone. It boomed and gentrified during the dotcom era, but there are still a lot of light industrial type businesses around. Commercial space was overbuilt in the '90s, which led to a surplus which caused prices to come down to Earth. It's not as crazily expensive as residential real estate, which never really came down in price even after the dotcom implosion. The real estate in SoMa isn't that much pricier than in many other parts of the Bay Area, particularly down on the Peninsula. And anyway the cost of real estate is less significant given the number of servers one data center accommodates. The real estate cost is easily outweighed by the benefit of being close to so many well-heeled clients who require your services.
Re:I need to change my reading order by Blackknight · 2007-07-24 15:01 · Score: 1

There ARE data centers in Arizona; I know of a few hosting companies located in Phoenix.

Sun by hansamurai · 2007-07-24 11:25 · Score: 1

How coincidental that I was actually trying to reach a Sun page before and couldn't get to it. I don't even remember what it was anymore, I really need to make my Firefox closed tabs list longer than 5.

--
Reviewing just the first hour of video games.

Re:Sun by Gazzonyx · 2007-07-24 14:15 · Score: 1

How coincidental that I was actually trying to reach a Sun page before and couldn't get to it. I don't even remember what it was anymore, I really need to make my Firefox closed tabs list longer than 5.

Doesn't matter, I went to 10 and I'm always looking for the 11th. If you go to 20, you'll need the 21st - just drop in another gig of RAM and leave them open until the CPU bug hits you :). Actually, I sync my computers' bookmarks, history and open tabs with a Google plugin that encrypts the info and stores it on your Google account. It's nice to have 20 open tabs at home before leaving for work, close them all, open Firefox when I get to work, have a Google popup in the corner ask "would you like to open the last tabs you've had open?", and be right back up to speed from where I left off. It even updates my bookmarks, cookies, history, and stored passwords. From one geek to another, I highly recommend it. I'm sure you know how to find it. :)

--
If I mod you up, it doesn't necessarily mean I agree with what you've said, sorry.

Re:No Generators? by latras · 2007-07-24 11:27 · Score: 2

Exactly! I worked in a small Telecom in Kansas and we had UPS and Generator backup and tested running full load 4 times a year.... It's fun doing that, throwing the switch to turn off utility power then hearing the KA-THUNK as the switchgear switched from utility to generator. I would think these large sites are going to pitch a bitch.....

Re:No Generators? by Anonymous Coward · 2007-07-24 11:29 · Score: 2, Insightful

Wait, you think it's OK to advertise five nines reliability, UPS backup, and generator backup, only to find out that the systems were not being properly tested to meet the advertised capability?

Re:Gmail down? by Pakaran2 · 2007-07-24 11:34 · Score: 1

Also, Google has datacenters in several cities. They could probably deal with an outage in San Francisco by just dropping it from the round-robin DNS rotation.
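The idea of "dropping a site from the round robin" can be sketched as a pool that hands out addresses in turn but skips sites marked down. This is a hypothetical illustration, not how Google actually does it; the site names and addresses (from the reserved documentation ranges) are made up. In real DNS round robin, resolvers may keep serving a removed address until its TTL expires, so the drop is not instantaneous:

```python
import itertools

class RoundRobinPool:
    """Hypothetical round-robin address pool that skips sites marked down."""

    def __init__(self, addresses_by_site: dict[str, list[str]]):
        self.addresses_by_site = addresses_by_site
        self.down: set[str] = set()
        self._counter = itertools.count()

    def mark_down(self, site: str) -> None:
        self.down.add(site)

    def healthy_addresses(self) -> list[str]:
        return [addr
                for site, addrs in self.addresses_by_site.items()
                if site not in self.down
                for addr in addrs]

    def next_address(self) -> str:
        pool = self.healthy_addresses()
        if not pool:
            raise RuntimeError("no healthy sites left")
        return pool[next(self._counter) % len(pool)]

rr = RoundRobinPool({"sf": ["192.0.2.1"], "dc": ["198.51.100.1"], "ams": ["203.0.113.1"]})
rr.mark_down("sf")  # the San Francisco site loses power
print([rr.next_address() for _ in range(4)])
```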

Re:No Generators? by Anonymous Coward · 2007-07-24 11:42 · Score: 1, Insightful

What is "high availability"? 99% uptime is about 3.65 days down per year. 99.9% is about 9 hours down. 99.99% is nearly an hour down. Certainly these sites can still be considered three-nines high availability.
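The downtime budget for each number of nines is a one-line calculation. A minimal sketch, assuming a 365-day year (leap years ignored):

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours; leap years ignored

def downtime_hours_per_year(availability_pct: float) -> float:
    """Maximum downtime per year allowed by a given availability percentage."""
    return (1 - availability_pct / 100) * HOURS_PER_YEAR

for pct in (99.0, 99.9, 99.99, 99.999):
    hours = downtime_hours_per_year(pct)
    print(f"{pct}%: {hours:.2f} h/year ({hours * 60:.1f} min)")
```

A single outage of a few hours fits inside the three-nines budget of roughly 8.76 hours, but it blows through four nines (about 53 minutes) and five nines (about 5 minutes) many times over.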

Difference of Opinion? by Shinra · 2007-07-24 11:44 · Score: 1

Funny thing though, the same sort of story on Yahoo! News reports that Netflix's downtime is NOT related to this incident:

http://news.yahoo.com/s/ap/20070724/ap_on_hi_te/netflix_woes

"The online hub of Netflix's rental system went down Monday evening and remained unavailable until Tuesday afternoon, locking out subscribers for more than 18 hours. Spokesman Steve Swasey attributed the outage to an unanticipated problem that he declined to describe.

The breakdown didn't appear to be related to San Francisco power outages that were blamed for temporarily knocking out several popular Web sites, including Craigslist, Technorati, Typepad and Livejournal.

Service to Netflix's site was finally restored around 3 p.m. PDT after Netflix's engineers had missed several earlier estimated times for fixing the trouble."

So, is it just the Business Writer trying to put a biased spin on this story, or is there more to it than that?

Re:No Generators? by sleigher · 2007-07-24 11:44 · Score: 1

You are absolutely right. The co-lo we use is just down the street from there, and the last few times they "tested" their generators we had outages. Five times in the last 1.5 years. Those were the "unsuccessful" tests, anyway. Needless to say, we are moving to another co-lo.

--
All points of time and space are connected.

Caused by a transformer explosion by rev063 · 2007-07-24 11:45 · Score: 1

According to sfgate.com: "The source of the power failure appears to be an explosion in a transformer vault under a manhole in a plaza at 560 Mission St. in San Francisco... Witnesses said they heard an explosion at about 1:50 p.m., then saw flames coming from the manhole."

Re:Caused by a transformer explosion by wolenczak · 2007-07-24 12:19 · Score: 1

Poor man.

Back in the day by arkanjil · 2007-07-24 11:50 · Score: 1

I biked past the place twice a day for years. They rehabbed and prepped the building as a datacenter just in time for the dot-com crash. It was left cold for a few years, but then there was a spate of articles in the local press about the cheap hosting deals being offered and the incredible redundancy built into the place in case of disaster. They've promised a lot over the years, and whatever the cause may be, it really looks like they failed to deliver.

Re:Back in the day by Ohreally_factor · 2007-07-24 22:40 · Score: 1

Unfortunately, part of that redundancy was to have the colo in the same building! Oy vey!

--
It's not offtopic, dumbass. It's orthogonal.

Not that uncommon by Phil+Wherry · 2007-07-24 11:50 · Score: 2, Interesting

I really feel for all the folks who have to deal with this outage; it's no fun at all!

A client of mine had a number of servers in a Sterling, Virginia data center managed by Verio/NTT. It's a good data center and seems to be well-run.

Last September, the data center experienced two complete power failures in the span of three days. To their immense credit, data center management was straight with customers about what had happened. For those who might be interested, their statements about the problem appear here.

My point? Make sure you know how to bring your systems back up from a completely cold start, and that you find a way to test this periodically. While we work to ensure that this sort of situation occurs rarely, the fact remains that these sorts of failures DO occur, and they're not as uncommon as the sales and marketing folks would like you to believe.

Phil
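One way to make the cold start Phil mentions repeatable is to write down what depends on what and derive the power-on order from that list instead of from memory. Here is a minimal sketch using Python's standard `graphlib` module; the service names and dependencies are hypothetical placeholders for whatever your own racks contain.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each service lists what must already be running before it starts.
DEPENDS_ON = {
    "core-switches": [],
    "san": ["core-switches"],
    "dns": ["core-switches"],
    "database": ["san", "dns"],
    "app-servers": ["database", "dns"],
    "load-balancers": ["app-servers"],
}


def cold_start_order(deps: dict) -> list:
    """Return a power-on order in which every service comes after its dependencies.

    Raises graphlib.CycleError if the map contains a circular dependency,
    which is exactly the kind of surprise worth finding before a real outage.
    """
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    for step, service in enumerate(cold_start_order(DEPENDS_ON), start=1):
        print(f"{step}. {service}")
```

Keeping the map in version control also gives the periodic cold-start test something concrete to check against.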

Re:Not that uncommon by WuphonsReach · 2007-07-25 02:40 · Score: 1

We've been using NTT/Verio for almost 7 years now ourselves. They do a pretty decent job of informing us of the "why" when things fail. They're also good about telling you ahead of time when they're going to be doing work.

We've had them (since we lease the servers) swap out parts on servers that have failed before we even get a chance to call them.

(Needless to say, we have no qualms about renewing our contract again this year.)

--
Wolde you bothe eate your cake, and have your cake?

Re:No Generators? by apoc.famine · 2007-07-24 11:53 · Score: 5, Funny

I tried to mod the article "-1 Not Redundant" but it wasn't an option. And I didn't have mod points. At least my inability to function only warrants a comment, rather than a slashdot article.

--
Velociraptor = Distiraptor / Timeraptor

Insane level of backup... by SmoothTom · 2007-07-24 12:03 · Score: 5, Interesting

...until the commercial power fails and doesn't come back for days.

The only places I've actually seen the insane levels of backup that some would like is in some telco central offices. The one I was associated with the longest had eight-hour-plus battery backup and 8 days of fuel for the diesels. Some of our really remote microwave sites had 24 hour battery and 30 day diesel.

Of course one of those sites failed high up in a mountain range in a mid-winter storm (Tieton, 1978) when the commercial power failed, and the starter battery for the diesel froze. When one of the techs finally got there (after burying his Sno-Cat and walking the last couple miles), he had to chip ice off the steel door to get inside, where he was able to get the diesel started with a little "rewire" of one of the backup battery sets. Oh, his two-way radio also failed during his hike, since it was outside his snowsuit, and the lack of communication caused the company to start two more Sno-Cats and a helicopter in that direction.

The site was out for nearly six hours, IIRC.

Even the BEST designs are subject to failure. :o(

--
Tomas

Re:Insane level of backup... by dgatwood · 2007-07-24 13:19 · Score: 1

One word: heaters. :-)

--
Check out my sci-fi/humor trilogy at PatriotsBooks.
Re:Insane level of backup... by SmoothTom · 2007-07-24 13:41 · Score: 2, Interesting

Yup, heaters. The entire site was set up insulated/heated, with additional heaters on the batteries, including the start battery, but, uh, somehow the start battery heater was found to be switched "off"... :o(

--
Tomas
Re:Insane level of backup... by Technician · 2007-07-24 14:55 · Score: 4, Funny

Of course one of those sites failed high up in a mountain range in a mid-winter storm (Tieton, 1978) when the commercial power failed, and the starter battery for the diesel froze.

On Black Butte in Oregon, a communications site went out in the middle of winter following a power outage. The generator ran a short while and then shut down because it overheated: the air intake, 20 feet in the air, was covered in snow.

--
The truth shall set you free!
Re:Insane level of backup... by descil · 2007-07-24 18:10 · Score: 1, Troll

FalconStor's IPStor software makes this problem go away by backing up the data to a remote datacenter. Lots of people do this; I spent three years wandering around the country installing the software and hacking it into blue-chip companies. It was a blast to make people more secure than they'd ever been. The software offers instant disaster recovery at the remote site, with a backup that is only as old as the network data. It supports Oracle, Microsoft SQL Server and Exchange Server, clustering, Linux, SUSE, Solaris, HP-UX, AIX... www.falconstor.com It also comes in an active-active "failover" mode that provides full service to all the systems two PCs can handle (40 fileservers, databases, etc., or 200 desktop PCs), with the ability for one server to take over if the other one has a hardware problem. So you will always be functional, and so will your backups... Anyway, I don't work for them anymore, but your description of a fully secure datacenter makes me twitch!
Re:Insane level of backup... by RoloDMonkey · 2007-07-24 23:23 · Score: 1

and the lack of communication caused the company to start two more Sno-Cats and a helicopter in that direction.

They probably figured that the winter caretaker had turned into an axe wielding maniac, who had killed your tech and was now chasing his family around a hedge maze or topiary.

--
Long live the Speaker Bracelet
Rolo D. Monkey
Re:Insane level of backup... by fimbulvetr · 2007-07-25 02:08 · Score: 1

Like the Andrea Gail's EPIRB, let this be a lesson for those who think there should be an off-and-completely-disabled button for some important things.

People do stupid things!
Re:Insane level of backup... by fimbulvetr · 2007-07-25 02:15 · Score: 1

That's fine and dandy for a data backup, but he was talking about a telco CO. It's not like it's just a redundant webserver - there are physical connections that exclusively connect to that location and that location only.

P.S. Try to write your posts a little less, well, advertisement-like. Unless you're a shill; in that case, carry on.
Re:Insane level of backup... by HeroreV · 2007-07-25 05:56 · Score: 2, Insightful

If the heater is really that important, it should be reporting back at regular intervals that it's on, and when the signal isn't being received anymore there should be a process so that somebody calls and asks what's up. If somebody wanted to turn it off and couldn't, they'd just unplug it.
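A heartbeat check like the one described above can be sketched in a few lines. This is a minimal illustration, assuming each device periodically calls in with a timestamp; the class name, the device names, and the 300-second timeout are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatMonitor:
    """Hypothetical watchdog: flags devices that have stopped reporting in."""

    timeout: float  # seconds of silence before someone should call and ask what's up
    last_seen: dict = field(default_factory=dict)  # device name -> time of last heartbeat

    def beat(self, device: str, now: float) -> None:
        # A device reports "I'm on" at time `now`.
        self.last_seen[device] = now

    def silent_devices(self, now: float) -> list:
        # Anything whose last report is older than the timeout needs a human to check on it.
        return sorted(d for d, t in self.last_seen.items() if now - t > self.timeout)


monitor = HeartbeatMonitor(timeout=300)
monitor.beat("start-battery-heater", now=0)
monitor.beat("rectifier", now=0)
monitor.beat("rectifier", now=400)  # the rectifier keeps reporting; the heater has gone quiet
print(monitor.silent_devices(now=450))  # ['start-battery-heater']
```

The useful property is that the alarm is driven by the absence of a signal, so a heater that is switched off or unplugged trips the alarm instead of failing silently.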
Re:Insane level of backup... by Nosferatu+Alucard · 2007-07-25 09:13 · Score: 1

Methinks that guy deserved a raise, or at least an award. That's dedication.
Re:Insane level of backup... by SmoothTom · 2007-07-25 12:12 · Score: 1

These days just about everything at one of those remote sites is digitally monitored (at almost zero expense), because even something as simple as a resistance heater wrapped around a battery (picture it wearing a tiny electric blanket) has some sort of intelligence built in.

In the seventies, however, it would have taken a sensor of some sort, an alarm relay and possibly even another alarm channel back to the alarm center 160 miles away.

(These days I have a cheapo digital wristwatch that has more and faster smarts than the entire site alarm system, probably at 1/1000th the cost. Those alarms from the site were displayed at the alarm center as audible tones and indicator lights...brutally analog and very limited in number. A continuous "low tone" and a red light was a major power alarm, IIRC.) :o)

--
Tomas

steam kicking everyone again by brunascle · 2007-07-24 12:04 · Score: 1

Ahhh, that explains why Steam is kicking everyone repeatedly.

How to get reliable hosting.... by p_trekkie · 2007-07-24 12:05 · Score: 1

...don't host your data at the same location as LiveJournal! They say lightning doesn't strike twice, but...

For a minute there, I thought this story was a dupe....

Re:July 24th: RedEnvelope Press Release by 365 Mai by Phroggy · 2007-07-24 12:08 · Score: 1

Somebody forgot to knock on wood. I bet they'll think twice about releasing a press release like that again!

--
$x='S24;r)>63/* h@<5+oZ)32"5cz';$me='phroggy'x$];
$x=~y+ -xz+\0-Tx+;print$_^chop$me for split'',$x;

Certification of Non-Stupidity? by fm6 · 2007-07-24 12:14 · Score: 1

Well, that's not the only scenario — but all the other ones I can think of call for even higher levels of stupidity. Maybe they had enough backup, but somebody forgot to buy diesel. Maybe the widget that's supposed to make backup come on automatically hadn't been properly maintained.

I myself had the misfortune to be working the help desk at a colo provider when some clueless tech working in the battery room disconnected the wrong cable and powered down the whole building. The really unpleasant part was answering the question every caller asked: "DON'T YOU IDIOTS HAVE BACKUP POWER?"

When you buy rack space, you naturally expect to get backup power. All providers claim to have it, but over and over you hear reports of outages where backup didn't kick in. What's needed is some independent authority to certify that the provider not only has adequate backup, but also has all the maintenance and testing procedures in place that guarantee that the bloody thing works.

The irony... by jaclu · 2007-07-24 12:21 · Score: 1

On the frontpage of 365 Main, the top item in "In the news" is:

RedEnvelope Reports Two Years of Continuous Uptime at 365 Main's San Francisco Data Center. Online Retailer Also Cuts Energy Costs by 33 Percent.

Re:GameFAQs by fahrvergnugen · 2007-07-24 12:23 · Score: 2, Funny

It is from his terrible spelling we can tell he is a GameFAQs forum poster.

--
Even Jesus hates listening to Creed.

Re:No Generators? by wolf31o2 · 2007-07-24 12:24 · Score: 2, Informative

Funny enough, there was a press release put out today talking about how the 365 Main facility had given 100% uptime over the past 2 years. Yes, 100% uptime for a facility is very possible. All it needs is to stay online and providing power and cooling.

Re:No Generators? by dwater · 2007-07-24 12:29 · Score: 1

He didn't say he'd think it was OK, only that it was understandable...I think I agree. I would roll heads in either case, but probably be more outraged by the drunk access.

--
Max.

Re:No Generators? by computerman413 · 2007-07-24 12:32 · Score: 4, Funny

88.88% uptime causes less outage than 99.9%? I don't follow your math. Did you do it with an Intel chip, by any chance?
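To put numbers on the comparison, here is a quick sketch that converts an uptime percentage into hours of outage per year, assuming a 365-day (8,760-hour) year: 99.9% works out to under 9 hours down, while 88.88% is about 974 hours, or roughly 40 days.

```python
HOURS_PER_YEAR = 365 * 24  # 8760; leap years ignored for simplicity


def downtime_hours(uptime_percent: float) -> float:
    """Hours of outage per year implied by an uptime percentage."""
    return (1 - uptime_percent / 100) * HOURS_PER_YEAR


for pct in (99.9, 88.88):
    print(f"{pct}% uptime -> {downtime_hours(pct):.2f} hours down per year")
# 99.9% uptime -> 8.76 hours down per year
# 88.88% uptime -> 974.11 hours down per year
```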

Bummer about your job, man. by rmerrill11 · 2007-07-24 12:43 · Score: 1

(This space intentionally blank.)

Valleywag by fnorky · 2007-07-24 12:46 · Score: 1

It would have been nice if someone had linked to a reliable source, like SF Gate instead of a gossip rag's wet dream.

Re:July 24th: RedEnvelope Press Release by 365 Mai by slacktide · 2007-07-24 12:51 · Score: 1

You missed the second part of the headline: "Online Retailer Also Cuts Energy Costs by 33 Percent" And today, they've cut their energy costs by 100%!

hmm.... by poetmatt · 2007-07-24 12:52 · Score: 1

I called Comcast earlier because my friend can access the site but I cannot, and he lives a mile north of me.

I believe there is a lot more going on here than what is mentioned. Comcast said that it was an AT&T link to the backbone that was refusing connections. I don't know a ton about networking, but it seems to be back and functional now. Earlier, when I ran tracert, the IP that was down was 12.116.17.7, if that helps you guys peek at what it is.

Re:hmm.... by Maserati · 2007-07-24 14:32 · Score: 1

I was talking to our IT Director this afternoon, after power came back up. Apparently there was a backbone outage somewhere between Dallas and Los Angeles and a lot of sites in the South West were out this afternoon. Crap, I'll prolly get spammed by the paging system once the queue clears.

--
Veteran, Bermuda Triangle Expeditionary Force, 1992-1951

Re:No Generators? by Doobian+Coedifier · 2007-07-24 13:01 · Score: 2, Informative

Heh, that was July 30, 2006; I remember it well. Seattle City Power was taken out by nearby construction. The UPSes came online, but one of the generators failed to switch on, so the batteries drained in ~15 minutes. The entire DC didn't lose power, but a good portion of it did.

Re:Hm by _KiTA_ · 2007-07-24 13:12 · Score: 1

Looks like their site is up. This is probably FUD to generate blog hits.

Hanlon's Razor leads to an easier explanation -- the Slashdot editors took several hours to promote this from the inbox to the main site.

never mind... by Ungrounded+Lightning · 2007-07-24 13:14 · Score: 1

... Wasn't supported on that version of Firefox. But worked on an older Netscape.

--
Bantam Dominique roosters crow a four-note song. Once you've heard it as "Happy BIRTHday" you can't NOT hear it that way

According to their own press release... by Kadin2048 · 2007-07-24 13:15 · Score: 3, Funny

Well, according to their self-congratulatory press release, issued earlier today, they were allegedly at 100% uptime for the past two years.

The irony of issuing a press release like that, and then to be hit with a power outage and apparent simultaneous failure of all backup systems later that day, is beyond measure.

I don't know about God, but it's enough to make me believe in karma. ;-)

--
"Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."

Re:According to their own press release... by multipartmixed · 2007-07-24 13:44 · Score: 1

How much ya wanna bet they hit the karma cap?

--

Do daemons dream of electric sleep()?
Re:According to their own press release... by EvanED · 2007-07-24 15:16 · Score: 1

Oh look, it's now delivering a 404. What a surprise.

We were always at war with Eastasia.
Re:According to their own press release... by Provocateur · 2007-07-24 16:46 · Score: 1

Well I believe in God, and the only thing that scares me is Keyser Soze. Standing in front of a rack of servers.

--
WARNING: Smartphones have side effects--most of them undocumented.
Re:According to their own press release... by zerkon · 2007-07-25 08:38 · Score: 1

Link now goes to a 404... I think they saw us all laughing at them and took it down

--
The Answer

SAN? Huh? by QuoteMstr · 2007-07-24 13:25 · Score: 1

Forgive my ignorance, but how would using a SAN have helped in this situation? Are you proposing that a single SAN storage net span multiple (remote) physical locations? And with SAN, can't a disk only be used by one computer at a time anyway?

Sure, you could use RAID 1(+0) and put the mirrored halves at different locations, but I can't imagine that being acceptable from either a performance or a reliability point of view.

Wouldn't master-slave database replication be more appropriate for this kind of work?

Re:SAN? Huh? by Pathwalker · 2007-07-24 14:09 · Score: 3, Interesting

Are you proposing that a single SAN storage net span multiple (remote) physical locations?
It's pretty common - at a previous job, all of the disk arrays at three main sites kept themselves in sync using SRDF over a metro area network. The intent was that, even if one site was completely destroyed, the survivors could quickly return to work without losing any data.

HP has a nice overview of building systems which can failover between widely distributed nodes called Designing Disaster Tolerant High Availability Clusters. It's a bit old, and is focused on ServiceGuard, but is still interesting.
Re:SAN? Huh? by kasin · 2007-07-24 14:09 · Score: 1

Forgive my ignorance, but how would using a SAN have helped in this situation? Are you proposing that a single SAN storage net span multiple (remote) physical locations? And with SAN, can't a disk only be used by one computer at a time anyway?
Master-slave SAN replication. Replicate all your data (not just databases), synchronously or asynchronously, to a remote datacentre. If the primary fails, mount the slave SAN and serve from your DR datacentre servers. Voila: a relatively cheap and easy way of ensuring a maximum of 10 minutes of downtime, while retaining up-to-the-second data consistency no matter the source.
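The practical difference between synchronous and asynchronous replication comes down to when a write is acknowledged. The toy model below is a hypothetical illustration, not any vendor's API: in synchronous mode a write reaches both copies before it completes, while in asynchronous mode writes queue up, and anything not yet shipped is lost when the primary site goes dark.

```python
from collections import deque


class ReplicatedVolume:
    """Toy primary -> replica block replication model (hypothetical, not a real SAN API)."""

    def __init__(self, synchronous: bool):
        self.synchronous = synchronous
        self.primary = {}       # block number -> data at the primary site
        self.replica = {}       # block number -> data at the DR site
        self.pending = deque()  # async writes acknowledged but not yet shipped

    def write(self, block: int, data: bytes) -> None:
        self.primary[block] = data
        if self.synchronous:
            # Sync: the write only completes once the remote copy has it too,
            # so every acknowledged write survives a primary failure (at a latency cost).
            self.replica[block] = data
        else:
            # Async: acknowledge immediately and ship later.
            self.pending.append((block, data))

    def ship(self) -> None:
        # Drain queued async writes to the DR site (in practice, continuously or on a timer).
        while self.pending:
            block, data = self.pending.popleft()
            self.replica[block] = data

    def fail_over(self) -> dict:
        # Primary site lost: only what already reached the replica survives.
        return dict(self.replica)


for mode in (True, False):
    vol = ReplicatedVolume(synchronous=mode)
    vol.write(0, b"order #1")
    vol.ship()
    vol.write(1, b"order #2")  # written just before the power fails
    print("sync " if mode else "async", sorted(vol.fail_over()))
# sync keeps blocks [0, 1]; async keeps only [0]
```

The trade-off is the usual one: synchronous replication needs a low-latency link between sites, which is why it tends to be limited to metro distances, while asynchronous replication tolerates distance at the cost of a small window of lost writes.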
Re:SAN? Huh? by Door-opening+Fascist · 2007-07-24 14:12 · Score: 1

Forgive my ignorance, but how would using a SAN have helped in this situation? Are you proposing that a single SAN storage net span multiple (remote) physical locations?
Most modern SANs have some kind of remote replication technology available. It can either be run over Fibre Channel and sent directly to the backup site, or run over an IP network with iSCSI.

And with SAN, can't a disk only be used by one computer at a time anyway?
Nope. Lots of filesystems are designed to be clustered. Sun QFS and IBM's GPFS come to mind, but there are lots of others.

Wouldn't master-slave database replication be more appropriate for this kind of work?
Only if you're using databases. At my site, we've got over 50TB of data in about 80 million disk files, so lots of things that should be easy aren't. :)
Re:SAN? Huh? by afidel · 2007-07-24 14:44 · Score: 1

SANs can and do span multiple physical locations, though if the distances are too great, latency becomes an issue. Most SANs allow multiple hosts to be targeted at a single LUN; this is how the quorum disk in an MS cluster works, and how multiple nodes in a RAC cluster work. But in reality, most SANs won't simply be spanned across sites; they will generally have a near-synchronous replication piece, either built in or as a bolt-on addition, which ensures your data is available at your DR site. I know that with our SAN replicated, I can have physical boot-from-SAN servers and VMware hosts running with little or no interruption of service and no additional work to maintain them.

--
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
Re:SAN? Huh? by Big+Jason · 2007-07-24 15:10 · Score: 1

Since when is 2x the storage and a high-bandwidth link between sites cheap?
Re:SAN? Huh? by MadMorf · 2007-07-24 15:25 · Score: 1

Are you proposing that a single SAN storage net span multiple (remote) physical locations?

NetApp has a product called MetroCluster in which redundant SANs can be up to 100 km (about 62 miles) apart...

--
Goofy, Geeky Gifts and More!
Re:SAN? Huh? by ryanisflyboy · 2007-07-24 17:41 · Score: 1

Certain SAN technologies allow cross-site replication of data. That way if there is a major disaster, or the datacenter screws up and you end up with no power, you have an off-site replicated slave with your data. If done right it can be brought online in minutes (or even faster). There are many different vendor solutions for SAN replication. If you do it right you can use your replicated SAN for backups, and certain intensive read-only business processes. It also makes testing major production code changes easier, just make a new snap, export the volume to your stand-by test gear, and test away. When finished, undo it, and you're all good.

Master-slave database replication could be fine if all you need to replicate is databases. I think it would depend on your particular workflow - and how much data you had - on which one would be the better choice. Licensing could be a big issue as well when comparing the two choices. Data size is a big part of it. When you have hundreds of TB your choices start to get limited.

Here is a vendor link:
http://3par.com/products/remote_copy.php
Re:SAN? Huh? by walt-sjc · 2007-07-24 21:35 · Score: 1

Depends on the cost of being down, doesn't it?

Dark fiber, unless you are crossing the country, IS pretty cheap. In SF, there are about 784255 different fiber vendors, all with their own fiber in the street, which is why they are always digging everything up.
Re:SAN? Huh? by kasin · 2007-07-25 01:19 · Score: 1

Since when is 2x the storage and a high-bandwidth link between sites cheap?
I didn't say cheap, I said relatively cheap. It's not much more than double the cost of your storage. If you don't have two lots of storage, how can you recover from a major failure quickly anyway? Get new equipment in? How long will it take to set up and restore from backup? Days?

Not sure what the upside is. by Kadin2048 · 2007-07-24 13:30 · Score: 1

Yeah I don't really get it. I'm sure that SF Bay is a nice place to work and all, probably a nice view, good selection of late-night delivery food ... but why the heck would you site a datacenter there? I get that it's a big Internet peering point, but still.

It's not like you need to walk down there and eyeball your server every day. Does it give the suits the warm fuzzies to be able to see their DC from their office window or something?

It's not *that* hard to get multiple backhauls from different backbone providers in other parts of the country, ones which aren't close to oceans, tectonic fault lines, and have cheap power. As far back as the mid 90s I remember that there were some fairly serious datacenters in Texas -- I think EDS set up the first really big ones.

Even the big East-Coast peering point (Reston, VA?) seems like it would be a better choice. Still uncomfortably close to an ocean and a major metro area, though.

--
"Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."

Re:Not sure what the upside is. by EvanED · 2007-07-24 14:39 · Score: 1

One of my friends was telling me that there is an enormous colo located in central NYC. This struck me as being pretty strange... after all, NYC real estate is through the roof, it's in the middle of the city, yadda yadda yadda. But turns out that it's at the end of (one of?) the transatlantic fibre cables, and is sitting plop on top of massively huge telecom switching stations that one article said half of US internet traffic flows through. So there are benefits for being in one central place. I don't know what San Francisco has to offer, but there could conceivably be something similar.

Sites affected by killermookie · 2007-07-24 13:31 · Score: 1

Sites affected include Technorati, Netflix (these display nice "We're Dead" pages), Typepad, LiveJournal, Sun.com, and Craigslist (these just time out).

And Ironport!

I get to rebuild some slave databases. Thanks 365! Your generators are top notch.

Sourceforge down? OH NO!!! by Spy+der+Mann · 2007-07-24 13:35 · Score: 1

Oh, wait, SF = San Francisco.

Whew.

Re:No Generators? by Anonymous Coward · 2007-07-24 13:37 · Score: 1, Insightful

Much of Europe uses 220V/50Hz.

Re:No Generators? by Sancho · 2007-07-24 13:43 · Score: 2, Insightful

The drunk thing is way outside the control of the administrators. Testing the failover is something they can do, and if something doesn't work, they can fix it.

I know who is responsible! by AlphaLop · 2007-07-24 13:45 · Score: 1

It was the Terrists (I wish I could type it the way Bush pronounces it) ;)

--
It's only paranoia if you're wrong...

Re:No Generators? by dwater · 2007-07-24 13:46 · Score: 1

> The drunk thing is way outside the control of the administrators.

Eh? You're kidding, right?

--
Max.

Re:No Generators? by Sancho · 2007-07-24 13:54 · Score: 1

Nope. They're not talking some random drunk off of the street, they're talking about a disgruntled employee.

Re:No Generators? by dwater · 2007-07-24 13:58 · Score: 1

Oh, OK. Even so, there are still things that can be done. I'd still be more pissed about that, I think. Like the other guy said, I can understand power failures: they're unacceptable, but still understandable. Allowing a drunk in there, whether an employee or not, I cannot understand. Is there only one person there or something? Security should have just sent the person home.

On the other hand, there's drunk, and there's drunk....

--
Max.

Conspiracy Theory by Jeremy_Bee · 2007-07-24 14:00 · Score: 1

Is there anyone reading this thread that actually knows something about power outages? The information about colos from those that use them and have visited this one is great, but am I alone in thinking the power outage itself was kinda weird (and perhaps even suspicious)?

I can't remember ever being in a power outage where the power went off for a few minutes, came back on, and continued going on and off for three hours. A typical power outage is a component failure that leads to a single outage or to an overload that then leads to an outage. Sometimes successive areas fall like dominoes as the overload travels around the grid, but isn't it unusual for power to go on and off like this?

I for one would appreciate it if anyone with actual knowledge of these things would post. I know this is more of a blue collar specialty than a tech one, but someone must know the answer. Seems to me that if the power was indeed going "on and off" for three hours that this is a very good reason why the colo might have failed. It's simply too unusual an occurrence to plan for.

Re:Conspiracy Theory by Burdell · 2007-07-24 14:07 · Score: 1

Not quite the same, but just last week I was over at my parents' house. We noticed the lights flickering (and the UPSes clicking) every 30-90 seconds or so for about half an hour, so we stuck a voltmeter in the socket. It was at 125V; when the lights flickered, it dropped to 120V for 30 seconds, the lights flickered again, and it went back to 125V. It did this for an hour or two before the power finally went out for an hour. The utility company had some type of failure at the substation that didn't die right away.
Re:Conspiracy Theory by Joe+The+Dragon · 2007-07-24 14:57 · Score: 1

They have autoreclosers that turn the power off and on a few times by themselves.
http://en.wikipedia.org/wiki/Autorecloser

Also, once a storm came through by my house and the power went off, came back on after 1-5 minutes, then went off again after 10-20 minutes for another 1-5 minutes. After doing this a few times, it went back off for 30-60 minutes and then stayed on for the rest of the night.
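The on/off pattern described in this thread fits how a recloser behaves. Below is a simplified, hypothetical model (real devices have configurable delays between attempts and other settings, all omitted here): on a fault the breaker trips, then recloses a limited number of times; if the fault is still present after the last attempt, it locks out and stays off until a crew resets it.

```python
def recloser_events(faulted_attempts: int, max_reclosures: int = 3) -> list:
    """Simplified recloser sequence.

    faulted_attempts: how many reclose attempts still find the fault on the line
    (e.g. a branch that takes a couple of tries to burn clear). Hypothetical model.
    """
    events = ["fault detected: breaker trips, power off"]
    for attempt in range(1, max_reclosures + 1):
        events.append(f"reclose #{attempt}: power on")
        if attempt > faulted_attempts:
            # Transient fault has cleared, so the breaker holds.
            events.append("fault gone: breaker stays closed")
            return events
        events.append("fault still present: breaker trips again, power off")
    # Out of retries: treat it as a permanent fault.
    events.append("lockout: power stays off until a crew resets the breaker")
    return events


for event in recloser_events(faulted_attempts=1):
    print(event)
```

With `faulted_attempts=1`, the lights go off, on, off, and on again before staying on, which is the kind of flicker sequence people notice at home.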
Re:Conspiracy Theory by seebs · 2007-07-24 15:03 · Score: 1

We've had outages like that, give or take; local failure, power doesn't make it, grid adapts, power comes back, something else blows, stuff like that.

--
My blog: http://www.seebs.net/log/ --- My iPhone/iPad app: http://www.seebs.net/seebsfrac/
Re:Conspiracy Theory by Alex+Zepeda · 2007-07-24 18:57 · Score: 1

http://www.sfgate.com/cgi-bin/article.cgi?f=/c/a/2007/07/24/BAG9NR67253.DTL

"The problem began when breakers in the utility's transmission service opened for an unknown reason, Chiu said. Every time workers attempted to close those breakers to restore service, it caused voltage fluctuations -- high and low flows of electricity through the system -- that impacted PG&E's Martin Substation in Daly City, she said."

Basically, PG&E is run by beancounters.

--
The revolution will be mocked

Why 365 main went out: a second-hand speculation by LinuxParanoid · 2007-07-24 14:04 · Score: 1

An interesting possible reason for 365's outage debacle was posted by someone on an O'Reilly Radar blog (emphasis added by me):

ajblardone [07.24.07 06:22 PM] I was there when the power went out. The generators kicked in right away. Some colos were fine others weren't. Mine went black for a while after the outage. 365 main had been working on electrical upgrades all week and this outage might have been bad timing for them... At 4pm 365 main sent out a notice saying the building was 100% operational and still running on the generators until PG&E confirms that utility power is stable.

Re:No Generators? by Not+The+Real+Me · 2007-07-24 14:10 · Score: 3, Insightful

"...I would think these large sites are going to pitch a bitch..."

I would think these large sites would understand the concept of not putting all your eggs (servers) in one basket. There is a reason why smart companies use replication and clustering, and datacenters spread across the country.

Re:Here's the scoop... by Maserati · 2007-07-24 14:12 · Score: 1

The guy who rebooted that VAX is probably under a floor tile in that datacenter.

--
Veteran, Bermuda Triangle Expeditionary Force, 1992-1951

365 Main deletes press release about uptime by Animats · 2007-07-24 14:20 · Score: 2, Informative

The press release "RedEnvelope Reports Two Years of Continuous Uptime at 365 Main's San Francisco Data Center", which was on the 365 Main web site earlier today, has disappeared from there.

But they sent the press release to PR Newswire, and you can still read it there.

Re:No Generators? by MichaelSmith · 2007-07-24 14:33 · Score: 1

Yeah here in Australia it is 240V 50Hz, but often closer to 260 in Western Australia for historical reasons. Most people design for 250 which is what a volt meter will read in most places.

--
http://michaelsmith.id.au

Re:No Generators? by Technician · 2007-07-24 14:47 · Score: 2, Informative

They probably just didn't kick in. Had the same problem at Internap in Seattle a few years ago.

Many datacenters didn't expect the growth they experienced. As a result, many UPS and generator sets are undersized, or the entire load is not on them. In some cases, the critical servers stay up to post the "we are down" page, but the HVAC system and main floor are down. What good is having a datacenter up if the building AC is down? Sometimes you are forced to shut down simply because the support AC is down and not on critical power. You can ride out a 20-minute outage without AC, but after an hour, it's at critical temperatures.

--
The truth shall set you free!

Nobody has mentioned that... by Anonymous Coward · 2007-07-24 14:49 · Score: 2, Funny

365 Main gets to royally fuck up one day every 4 years. Maybe the companies should have hired 366 Main.

Re:No Generators? by Anonymous Coward · 2007-07-24 14:56 · Score: 1

How the fuck is it understandable? If the equipment is not being tested and maintained properly, how is it any more understandable than letting drunks through the main doors?

Properly maintained and tested backup equipment does not fail.

I swear, people are going through some major mental gymnastics here to excuse away sheer incompetence.

Re:No Generators? by darkpixel2k · 2007-07-24 14:57 · Score: 3, Funny

Time to upgrade the cardswipe system to also require a breathalyzer..

*swipe*
*bip* *beep* *beep* *boop* *bleep*
[deep breath]
*whoosh*

Alcohol Level: 0.15
*beeeeeeeep*

Damnit!!

--
There's no place like ::1 (I've completed my transition to IPv6)

Re:No Generators? by dwater · 2007-07-24 15:09 · Score: 1

Not a bad idea, I think...

--
Max.

Re:GameFAQs by totally+bogus+dude · 2007-07-24 15:18 · Score: 2, Funny

I'd prefer to think he was just trying to balance out all the people who have started using "site" when they mean "sight" since this whole intarweb thing came about.

You obviously don't live in SF by Alex+Zepeda · 2007-07-24 15:29 · Score: 1

PG&E is a good example of why "deregulation" does not work for utilities. We got about one thousandth of an inch of rain (barely measurable). This was just enough to knock out the power to a sizable chunk of the East Bay. Why? Because in their quest for profits, PG&E is too cheap to properly wash down their equipment, and dust builds up. A drizzle turns the dust into mud and causes stuff to short out. That's not to say that PG&E is good in the dry weather. Where I live in the Bay Area (decidedly not the sticks), power goes out for 3+ hours at least twice a year.

--
The revolution will be mocked

Bell uptimes by td · 2007-07-24 15:33 · Score: 1

Not so. The phone company's commitment to dial-tone reliability predates the existence of 911 service (which was first mandated in 1967 but not universally deployed until much later) by decades.

--
-Tom Duff

Re:No Generators? by Frosty+Piss · 2007-07-24 15:47 · Score: 1

Essentially, he thinks its OK.

Essentially, that's not what I said.

--
If you want news from today, you have to come back tomorrow.

want redundancy? try usenet by doom · 2007-07-24 16:08 · Score: 1

Whenever one of these web outages happens, I like to point out that this kind of stuff doesn't happen to usenet, because by design it uses a distributed back-end architecture, unlike this super-advanced web technology with central points of failure essentially built-in (unless you do a lot of fancy dancing to cover the problem).

You know, kind of like "P2P" and "BitTorrent" and all that.

But of course, usenet lacks choke-points to insert advertising -- oh wait, I mean it lacks spam resistance, that's it -- so it is of course doomed to obscurity.

Re:No Generators? by Keruo · 2007-07-24 16:35 · Score: 1

Drunk employee?
Sounds like someone has been installing several instances of Windows.
Like anyone would do that sober... Maybe their cable management was really bad and he just staggered into the wrong corridor?

--
There are no atheists when recovering from tape backup.

Re:Here's the scoop... by (negative+video) · 2007-07-24 16:44 · Score: 1

Under the wide and heavy VAX
Dig my grave and let me relax
Long have I lived, and many my hacks
And I lay me down with a will.
These be the words that tell the way:
"Here he lies who piped 64K,
Brought down the machine for nearly a day,
And Rogue playing to an awful standstill.

Re:No Generators? by Johnno74 · 2007-07-24 16:47 · Score: 1

240v/50hz in NZ too.

Heh a couple of years back at home one night the power went out, then came on a few seconds later, but the lights seemed really dim... and the fridge was making a really weird noise, and other appliances were either not working or doing weird stuff.

I poked a multimeter into a wall socket, and we were only getting 90-110v. Surprisingly my computer and the tv were working fine.

Plenty of Irony to Go Around by cloudscout · 2007-07-24 17:05 · Score: 1

The press release that 365Main had on their site this morning about having two years of continuous uptime is now gone (after nearly every news article on this outage pointed it out).

Digging back a few months, I found another gem...

365 Main Recognized by PG&E...for taking proactive steps to reduce power usage.

This is what technical folk refer to as an "understatement".

Re:Plenty of Irony to Go Around by greenlead · 2007-07-24 21:23 · Score: 1

I went to the Sun.com website and what advertisement do they pick to show me? "Power Up and Go!".

Re:Here's the scoop... by Shadowruni · 2007-07-24 17:13 · Score: 1

I'd go so far as to say they went and got his family too. I just forwarded that to my staff. I may or may not have worked at a data center for that green card company. We had a guy get fired who managed to shoulder-surf his way in... all the way to the networking room. He then produced a pair of shears, cut EVERY FIBER AND CAT5 in the room, and then stabbed and broke everything he could. There was no failover... there was no time. As far as the billions of packets were concerned, there was just a bright flash of light and then nothing more.

The global (yes, global) load balancer took over, but we lost an estimated 100,000 credit card transactions in the 10 seconds it took to kick in. They literally had to restrain the manager.

Another one I had was at a telemarketing company. We had just run new wires for some T1s, and my boss, in one of his periods of "neatfreakness", proceeded to cut the wires he THOUGHT were the old ones...

So as the call center manager is bragging to a potential client about our uptime all the TSRs are yelling into their headsets "Hello?". My boss comes out of the network closet with wirecutters in one hand and a bundle of cable in the other right as the manager and COULD HAVE BEEN client were walking up the aisle. Priceless....

--
"Chinese Amazons, power armor, laser swords.... things just meant to be." - Shampoo, A Very Scary Bet

Power back but not Craigslist by ukemike · 2007-07-24 17:23 · Score: 1

Supposedly the power has been back on for some time but craigslist is still down.
http://sfgate.com/cgi-bin/article.cgi?f=/c/a/2007/07/24/BAG9NR67253.DTL

from the article... "Some of the sites, including Craigslist, remained down even after power was restored, as administrators ensured that data in the server hadn't been damaged, among other checks."

It's well after 10pm and craigslist is still only intermittently working. I wonder why they're having such trouble?

--
-- QED

Re:Power back but not Craigslist by NynexNinja · 2007-07-24 19:03 · Score: 2, Interesting

I would say incompetence... Craigslist has been plagued by incompetence since they started, and small problems turn into big problems that make their site completely unusable. Their decision to use ambiguous messages like "This posting has been Published" in their anti-spam fight has made their system unreliable. One only has to look at the help forum for an indication that their admins really do not care about the reliability of the system: questions about the constant downtime and unreliable nature of the postings are answered with vague, condescending responses from staff members. Postings say they are Published, and in fact they never show up on the site. This has been going on for months now with no end in sight. I would say they need a few good systems engineers to fix what's going on; however, you could almost conclude that they enjoy and even relish the moments when their site is completely unreliable or offline for days at a time. It makes one wish for a day when a competent site with competent administration comes along to replace this type of environment.
Re:Power back but not Craigslist by Master+of+Transhuman · 2007-07-25 07:48 · Score: 2, Informative

Absolutely correct.

I posted an ad the day BEFORE the outage and it never showed up on the site, nor in search.

On their status page (before the outage), they acknowledged they had problems and were promising to fix them sometime "before fall". Really competent...not.

If you have problems with your ad being pulled at random by idiots flagging it for lame excuses like all caps headlines (the rules say AVOID all caps, not "we will pull your ad for it"), the only recourse you have is to get sent to the help forum, where 16-year-old assholes throw insults at your ad.

Your competitors can flag your ad all day and there's nothing you can do about it because the Craigslist staff have insulated themselves from responsibility by claiming it's a "community-run" operation.

Pathetically badly run outfit.

--
Richard Steven Hack - This sig is TOO GODDAMN SHORT TO DO ANYTHING USEFUL WITH! MORONS!

Re:Gross malfeasance by suresk · 2007-07-24 17:26 · Score: 2, Insightful

Now, now... LiveJournal is back up.

What really happened: by Just+Some+Guy · 2007-07-24 17:33 · Score: 1

CFO: "Four nines are 80% as good, right?"
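The joke works because each extra nine cuts the allowed downtime tenfold rather than by 20%. A minimal sketch of the arithmetic, assuming a 365-day year (leap days ignored):

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def allowed_downtime_minutes(nines: int) -> float:
    """Yearly downtime budget for `nines` nines of availability (4 -> 99.99%)."""
    unavailability = 10 ** (-nines)  # e.g. 4 nines -> 0.0001
    return MINUTES_PER_YEAR * unavailability


for n in (3, 4, 5):
    print(f"{n} nines: {allowed_downtime_minutes(n):.2f} min/year")
# 3 nines: 525.60 min/year
# 4 nines: 52.56 min/year
# 5 nines: 5.26 min/year
```

So "four nines" permits roughly ten times the downtime of "five nines", and a single 45-minute outage like the one 365 Main reported uses up most of a four-nines budget for the whole year.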

--
Dewey, what part of this looks like authorities should be involved?

Re:No Generators? by gujo-odori · 2007-07-24 17:35 · Score: 2, Insightful

Have you ever been in a data center? Cabinets that are all locked. To get the key, you have to sign it out from security. Ditto for the cages. It wouldn't just require a drunken/disgruntled employee, it would require a conspiracy of them: security staff to hand over the keys and the disgruntled employees to do the misdeeds.

Well, there is one way around that: you walk over to the EPO button and give it a whack. It'll take down the whole floor. Rinse, lather, repeat on other floors. How many do you think you can do before someone stops you?

Anyway, my employer has a lot of stuff in 365 Main. We're not one of the companies mentioned in TFA, but we're certainly one of the ones affected. Within a couple minutes of the outage, we knew we'd lost everything we had there and several of our sysadmins grabbed their gear and headed for the city to go join that line outside of 365. By the time they left the building we had confirmation that it was a power outage.

Power was already back on when they got inside and they immediately brought up anything that wasn't already up and tested it all to make sure it was OK. To say the least, this is inconsistent with (tall) tales of somebody going apeshit on 40 racks.

Re:No Generators? by Nullav · 2007-07-24 17:42 · Score: 2, Funny

Who 'drunk tests' a data center?

--
I just read Slashdot for the articles.

Re:July 24th: RedEnvelope Press Release by 365 Mai by mad_psych0 · 2007-07-24 17:47 · Score: 1

Even better, the page is now 404'd =)

Re:Here's the scoop... by Maserati · 2007-07-24 17:55 · Score: 1

Oh, telecom for telemarketers. I don't even put telecom We must have been Very Bad in a past life to have done that. I was escalating Nortel service issues to VPs in Texas (from the Bay Area) at mine; they eventually reassigned the guy who slept in his truck, never bathed or brushed (teeth or hair) and was usually drunk. Musta been one hell of a union. That doesn't even touch the people I worked for. Ugh.

Oooh, I bet in about 3 more years that manager torches the shoulder surfer's car in the middle of the night. Just for that, the LART is moving from the server room to my office. It's only a 3 wood, but the very first time I met our new CFO I told him he would eventually let me expense a taser. We're getting closer.

--
Veteran, Bermuda Triangle Expeditionary Force, 1992-1951

Internap...again by xrayspx · 2007-07-24 17:58 · Score: 1

How does Internap keep doing this? The major Seattle problem, yeah, but I can recall several outages (of LJ mainly) where they say "our provider lost power due to whatever and their generators didn't work/were overloaded/worked, but then stopped". I've been in their Boston facility, and it was packed to the gills, and there were large generators outside. I'd have to assume they work.

--
I like music

Re:July 24th: RedEnvelope Press Release by 365 Mai by gujo-odori · 2007-07-24 18:01 · Score: 1

Where I used to work, we had a commodity K6-2/300 box running Solaris x86 go for over three years on an unfirewalled global IP, acting as a DNS server for an ISP. In the end, it was brought down by the power supply fan seizing. It was so tight I couldn't even turn it by sticking a screwdriver in the blades.

Clearly, I'm hung like an eHorse :)

You must be new ... there .. by freaker_TuC · 2007-07-24 18:18 · Score: 1

You should already be steamed up for that!

--
--- I am known for the ones who want to find me on the net. Is that a privacy risk or a privilege? One might wonder..

Re:No Generators? by gregleimbeck · 2007-07-24 18:38 · Score: 1

Shouldn't the drunk be taken into account in a failover plan? Granted, it is unlikely, but the whole point of failover is making sure that if one server is not available (i.e. motherboard failed, died in a fire, or pissed on by a drunken soon-to-be ex-employee), you FAIL OVER to the other server or server farm.

--

P.S.,

This is what part of the alphabet would look like if Q and R were eliminated.

Re:No Generators? by dwater · 2007-07-24 18:43 · Score: 1

Sounds like it's time for a new technical term...

--
Max.

Re:No Generators? by Toutatis · 2007-07-24 18:46 · Score: 1

Yes, they had generators, but run on alcohol and someone drunk it.

The word directly from 365 by Meridian+Umbrios · 2007-07-24 20:46 · Score: 4, Informative

Here is the e-mail that 365 is sending out to their customers. The best part is their tagline, "The World's Finest Data Centers".

365 Main Customer,

At 1:49 p.m. on Tuesday, July 24, 365 Main's San Francisco data center was effected by a power surge caused when a PG&E transformer failed in a manhole under 560 Mission St.

An initial investigation has revealed that certain 365 Main back-up generators did not start when the initial power surge hit the building. On-site facility engineers responded and manually started effected generators allowing stable power to be restored at approximately 2:34 p.m. across the entire facility.

As a result of the incident, continuous power was interrupted for up to 45 mins for certain customers. We're certain colo rooms 1, 3 and 4 were directly affected, though other colocation rooms are still being investigated. We are currently working with Hitec, Valley Power Systems, Cupertino Electric and PG&E to further investigate the incident and determine the root cause.

All generators will continue to operate on diesel until the root cause of the event has been identified and corrected. Generators are currently fueled with over 4 days of fuel and additional fuel has already been ordered.

We understand the seriousness of this issue and will provide full details once they come available. We sincerely apologize for the impact this has had on your operations.

Regards,
Vice President, Security
365 Main
"The World's Finest Data Centers"
Just send me a big fat check and all is forgiven.

Re:The word directly from 365 by shish · 2007-07-25 02:52 · Score: 1

"data center was effected" should read "affected". Effect is the thing that happens, affect is what it happens to.

--
I mod down anyone who says "I will be modded down for this", regardless of the rest of their comment
Re:The word directly from 365 by AK+Marc · 2007-07-25 05:39 · Score: 2, Funny

On-site facility engineers responded and manually started effected generators allowing stable power to be restored at approximately 2:34 p.m. across the entire facility.

Wow, on-site engineers took 45 minutes just to be able to turn on generators? The generator for our facility has a master switch and a big green button. I think a monkey could get it running in 20 seconds by slinging poo at it. So, what other problems did they have that they aren't telling us? Someone else mentioned a flywheel system. Did that fail so that the generators wouldn't start without mains power, and it took 45 minutes for the mains to come up to where they could draw from that to start the generator? "Our backup generator works, but only if the city power is working."

--
Learn to love Alaska
Re:The word directly from 365 by MushMouth · 2007-07-25 08:57 · Score: 1

Not only that, according to one of my server's /var/log/messages, they were power cycled at least 4 times in those 50 minutes. You would think they would have killed the breakers for the rooms that were dark until they got their generators running, to reduce the surges.
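One way to do the same check yourself, sketched under an assumption: on many Linux setups each boot writes the kernel banner ("kernel: Linux version ...") into /var/log/messages, so counting those lines approximates the number of power cycles. The exact banner text and log path vary by distribution and syslog daemon, so treat the match string as a hypothetical default.

```python
BOOT_MARKER = "kernel: Linux version"  # assumption: banner logged once per boot


def count_boots(lines) -> int:
    """Count syslog lines that look like a kernel boot banner."""
    return sum(1 for line in lines if BOOT_MARKER in line)


def boots_in_file(path: str = "/var/log/messages") -> int:
    """Count boot banners in a log file, tolerating odd bytes in the log."""
    with open(path, errors="replace") as f:
        return count_boots(f)
```

To count only the outage window, filter the lines by their timestamp prefix before passing them to count_boots.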

Sadly 365 Main has been better than United Layer at 7th and Mission was.

Re:No Generators? by PhireN · 2007-07-24 20:52 · Score: 1

You do know most multimeters are not rated for the amps and peak voltages of mains power? The same thing happened to me, also in New Zealand, late at night about one and a half years ago. The power cut out for a second, the computer died then started back up, but never started booting. The lights were really dim, and my alarm clock started showing weird digits. The washing machine (which was off) managed to get itself halfway into a wash cycle and started beeping. I never checked the TV, but the computer screen was. You wouldn't happen to live near Springston?

Re:No Generators? by FireFury03 · 2007-07-24 20:54 · Score: 1

I poked a multimeter into a wall socket, and we were only getting 90-110v. Surprisingly my computer and the tv were working fine.

TVs and computers use switched-mode power supplies these days, which are quite happy running on a fairly wide range of voltages (although the current draw will of course be much higher at lower voltages). This is the reason why PSUs no longer have 110/220V switches on them.
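The "higher current at lower voltage" point follows from a switched-mode supply drawing roughly constant power. A minimal sketch, assuming a hypothetical 200 W load and ignoring power factor (which makes real supplies without PFC draw even more current):

```python
def input_current_amps(load_watts: float, mains_volts: float,
                       efficiency: float = 1.0) -> float:
    """Input current of a constant-power load: I = P / (V * efficiency)."""
    return load_watts / (mains_volts * efficiency)


for volts in (240, 110, 90):
    print(f"{volts} V -> {input_current_amps(200, volts):.2f} A")
# 240 V -> 0.83 A
# 110 V -> 1.82 A
# 90 V -> 2.22 A
```

Dropping from 240 V to 90 V roughly multiplies the current by 2.7, which the supply tolerates but the wiring and any fuses see.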

--
http://blog.nexusuk.org

Sheepshagger Intel by Dogtanian · 2007-07-24 21:10 · Score: 2, Funny

I don't follow your math. Did you do it with an Intel chip, by any chance? Poor Intel (boo hoo!), they messed up 13 years ago and people are still making jokes about it. Reminds me of the old joke (stolen from here):

A man goes into a pub in a small town and, for whatever reason, gets introduced to the clientele. There's Farmer Jack, Barman Jim, Maurice "Dancer" and Sheepshagger John. After a few pints, the visitor's curiosity gets the better of him and he asks John what's with the nickname.

"See this pub?" asks John, "I built it, but they don't call me Pubbuilder John? I'm the local doctor, I saved Barman Jim's life once when he choked on a peanut, but they don't call me Lifesaver John. Every year, I supply a huge Christmas tree for the village green, but the don't call me Christmas Tree John.

"But you shag one lousy sheep..." (Note; since that Austin Powers film came out, I assume that you Yanks know what "shagging" is now).

--
"Slashdot - News and Chat Sites Deviant". (Click "homepage" link above for details).

Re:Here's the scoop... by Shadowruni · 2007-07-24 22:00 · Score: 1

Taser. Hell, where I am we've got shipyard justice. Plasma cutters do wonders....

--
"Chinese Amazons, power armor, laser swords.... things just meant to be." - Shampoo, A Very Scary Bet

Re:GameFAQs by Dogtanian · 2007-07-24 22:04 · Score: 2

Trust me, it's not that bad. The guy made one spelling mistake in a post that was otherwise correctly spelled, punctuated, capitalised, and generally better-written than a lot of the crap that's out there on the Net.

--
"Slashdot - News and Chat Sites Deviant". (Click "homepage" link above for details).

Re:GameFAQs by MarkoNo5 · 2007-07-24 23:25 · Score: 1

> Main sight is back up as of now, though forums are still down.

I see.

365 took it down, but PRNewswire has it. by celerityfm · 2007-07-24 23:42 · Score: 1

Looks like they couldn't stop the story before it hit the wires... I wonder if they'll issue a retraction? :P

--
...unfortunately no one can be told what The Mat^H^H^HGoatse is...they must experience it for themselves...

Re:365 took it down, but PRNewswire has it. by celerityfm · 2007-07-24 23:46 · Score: 1

And here is the link.

--
...unfortunately no one can be told what The Mat^H^H^HGoatse is...they must experience it for themselves...

Quick refill contracts vs. natural gas & LP co by swb · 2007-07-25 01:26 · Score: 1

In the event of a significant regional disaster, how good are those quick-refill contracts, anyway? I just keep thinking of fuel diversion by emergency services or simply the inability to deliver fuel due to transit woes, employee shortages, etc. In the event of a significant situation, fuel is a top priority for lots of people, many of them with guns and/or legal authority to seize it.

It always struck me as somewhat more resilient to have N+1 generators capable of being run on natural gas and LP. Sure, some regions might lose natural gas delivery (earthquake, etc), but it seems more likely to me that natural gas would keep running in spite of problems that might prevent or badly slow diesel delivery. And being capable of switching to LP means that even if you lose natural gas, you can keep running on on-site fuel.
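The N+1 part of that idea is a simple sizing rule: enough generators to carry the whole load, plus one spare, whichever fuel each unit runs on. A minimal sketch; the load and unit ratings are hypothetical.

```python
import math


def generators_needed(load_kw: float, unit_kw: float, spares: int = 1) -> int:
    """N+spares sizing: units needed to carry the load, plus spare units."""
    if load_kw <= 0 or unit_kw <= 0:
        raise ValueError("load and unit capacity must be positive")
    n = math.ceil(load_kw / unit_kw)  # round up: a partial unit is a whole unit
    return n + spares


print(generators_needed(2500, 1000))  # 3 to carry 2.5 MW, plus 1 spare -> 4
```

With spares=1 the site survives any single generator failing to start, which is exactly the failure mode described in the 365 Main e-mail.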

The downside is that you probably are more limited in LP storage facilities in dense urban areas (diesel seems more fire marshal friendly) and diesel is more fuel efficient, but overall it seems that the odds favor longer term survivability of natural gas + LP vs. diesel.

Nothing like being prepared! by HitekHobo · 2007-07-25 02:56 · Score: 1

Working for a telephone company in Florida, I have a hard time believing anyone running a data center could be so ill-prepared. We have our own issues with DR (there are going to be some issues when a bomb goes off under a switch site), BUT we have had multiple switch sites keep running simultaneously on generators and inverters during and after hurricanes. Our NOC and switch techs go above and beyond to keep power and connectivity up. They may get a bug-out notice prior to a major hurricane, but if so, everything is cut over to generator power with at least 48 hours of fuel, and they're back on site as soon as the roads are drivable. The last time South Florida got smashed, all of my data systems stayed online even though it was close to a week before commercial power was back on.

--
A couple of 30-somethings embark on the ultimate roadtrip

diesel did not start by XHIIHIIHX · 2007-07-25 03:00 · Score: 1

A friend who was at the colo says the diesel did not start and the wheel spun down.
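Why a spun-down flywheel means an outage: a flywheel UPS can only deliver the kinetic energy stored between its rated speed and the minimum speed at which its output is still usable. As a rough textbook sketch (the symbols are introduced here for illustration and are not 365 Main's specifications), with rotor moment of inertia I, rated speed ω_max, minimum usable speed ω_min, and load P:

```latex
E_{\text{usable}} = \tfrac{1}{2} I \left(\omega_{\max}^{2} - \omega_{\min}^{2}\right),
\qquad
t_{\text{ride-through}} \approx \frac{E_{\text{usable}}}{P}
```

Ride-through for this kind of system is generally on the order of seconds rather than minutes, so if the diesel has not started and picked up the load within that window, the racks go dark.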

CFO has trouble with 5th grade by Iberian · 2007-07-25 03:08 · Score: 1

On-site facility engineers responded and manually started effected generators allowing stable power to be restored at approximately 2:34 p.m. across the entire facility.

If a CFO with at least an MBA cannot make proper use of grammar in an apology letter sent to his paying clients, what hope is there? Seriously, that is inexcusable. Next time, maybe try proofreading, or perhaps have a secretary review it...

planning by JTrock · 2007-07-25 03:37 · Score: 1

Didn't 365 ever plan for their own disaster? I'm sure other major companies have enough redundancy in their infrastructure to support their business in case of a power outage. I found this article on disaster planning: http://www.smartbrief.com/news/aaaa/industryBW-detail.jsp?id=B3A11DDD-AD9B-4399-9682-6E54C82E6757 More companies need to prepare for when it's their turn instead of relying on someone else. What about data recovery? What if the drives got damaged? They'd be spending a whole lotta money on data recovery.

Even weirder things happened that day by stacey7165 · 2007-07-25 03:51 · Score: 1

I work for Hyperic, which is a systems management company here at 2nd & Mission in SF. Our website is run out of that colo on 365 Main... and it was up all day yesterday, even despite the manhole cover blasting off, which resulted in the mad power outage... which was witnessed by a Fruit of the Loom commercial and an apparent dead guy who had been on a gurney in the street with a security guard for a couple of hours. (For more on this with pictures, check out Javier's blog from yesterday: http://www.hyperic.com/blog/hyperic/2007/07/24/hyperic-is-where-the-action-is/ )

We know a thing or two about running data centers here, and that data center definitely has serious backup and disaster recovery. They are professionals; otherwise we wouldn't have picked them, and neither would other serious businesses like Yelp and Technorati. I don't know for sure, but the drunken idiot theory makes a whole lot more sense given how many other sites I know are run out of there that were unaffected, ours included.

-Stacey Schneider
Hyperic
http://www.hyperic.com/

Millions were paged, and cried out in despair by wsanders · 2007-07-25 04:01 · Score: 2, Interesting

Waiting in line for checkin at 365 Main:

http://tastic.brillig.org/~jwb/dorks.jpg

--
Give a man a fish and you have fed him for today. Teach a man to fish, and he'll say "WHERE'S MY FISH, YOU IDIOT?"

Netflix outage was not related by wsanders · 2007-07-25 04:15 · Score: 1

Netflix is hosted elsewhere. Their outage was not related to the power problem.

http://cbs5.com/topstories/topstories_story_206063640.html

--
Give a man a fish and you have fed him for today. Teach a man to fish, and he'll say "WHERE'S MY FISH, YOU IDIOT?"

Re:No Generators? by spamking · 2007-07-25 04:16 · Score: 1

Who 'drunk tests' a data center?

Sounds like it's time to update the Hazard Vulnerability Analysis . . .

Re:No Generators? by Sandbags · 2007-07-25 04:20 · Score: 2, Informative

Well, they DO test this regularly, at least generator failover in the event of power loss. Unfortunately, it appears that a significant power SURGE occurred from a transformer back feed. This resulted in the flywheels in their generators spinning down before power could be switched over, and some system that detects power loss probably got fried in the surge and never notified the generator controller of the loss.

1) They're lucky they have a REALLY good ground fault interrupter, as this likely would have cooked every server in every rack otherwise, or at least every surge stopgap between the line feed and the racks, which still could have caused days of downtime to replace.
2) How does one test for a several-megawatt power surge?
3) Only some of their racks went down, so at least some battery or generator power came online, just not all of it, or not the parts that powered certain rooms.

That said, the fact that they're running exclusively on generator until they identify and fix this fault, and that the power company and the generator operators are jumping in, means they're more than willing to blow several thousand dollars in fuel costs to make sure this does not happen again. I would expect they'll bill the generator manufacturer for this failure and all related costs (which that company will likely bill to an insurance provider), and possibly find another generator company or add a few more redundant systems.

The fact that the clients are not insisting on installing UPS systems with at least 30 minute run times IN the racks with their servers means either the clients are cheap, or no one considered that a fuse, breaker, or PDU in a rack could blow and take out half a rack or more if it wasn't on internal UPS power, regardless of whether power was on or not... This is flawed redundancy thinking.
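To size such an in-rack UPS, the usable battery energy has to cover the rack's load for the whole runtime, divided by the inverter's efficiency. A minimal sketch, assuming a hypothetical 1,500 W half-rack load and 90% inverter efficiency:

```python
def battery_wh_needed(load_watts: float, runtime_minutes: float,
                      inverter_efficiency: float = 0.9) -> float:
    """Usable battery energy (Wh) to carry `load_watts` for `runtime_minutes`.

    Simplification: treats battery capacity as linear. Real lead-acid
    batteries deliver less energy at high discharge rates (Peukert effect),
    so a real purchase needs a margin on top of this figure.
    """
    hours = runtime_minutes / 60
    return load_watts * hours / inverter_efficiency


print(round(battery_wh_needed(1500, 30)))  # roughly 833 Wh usable
```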

Business Continuity should be 25% of total IT spending (labor, hardware & software, backup, everything combined). This does not include redundant co-lo for users, only servers. If you want redundancy for everyone, users included, take your IT budget now (without that redundancy) and add 125% to it (it costs MORE than double).
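Taking these rules of thumb literally, the arithmetic is two multiplications: 25% of total IT spend for server-side continuity, and the base budget plus a 125% uplift (2.25x the base) for user-inclusive redundancy. For a hypothetical $1,000,000 budget that works out to $250,000 and $2,250,000 respectively.

```python
def continuity_budget(total_it_spend: float, share: float = 0.25) -> float:
    """Server-side business continuity allocation: `share` of total IT spend."""
    return total_it_spend * share


def budget_with_full_redundancy(base_it_budget: float, uplift: float = 1.25) -> float:
    """Base budget plus a 125% uplift, i.e. 2.25x the base: more than double."""
    return base_it_budget * (1 + uplift)
```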

--
There is no contest in life for which the unprepared have the advantage.

Re:July 24th: RedEnvelope Press Release by 365 Mai by Hemogoblin · 2007-07-25 04:21 · Score: 1

Here's the text from PR Newswire:

RedEnvelope Reports Two Years of Continuous Uptime at 365 Main's San Francisco Data Center

Online Retailer Also Cuts Energy Costs by 33 Percent

SAN FRANCISCO, July 24 /PRNewswire/ -- 365 Main Inc., developer and
operator of the world's finest data centers, has provided online retailer
RedEnvelope with two years of 100-percent uptime at 365 Main's San
Francisco facility. It has also reduced RedEnvelope's overall energy costs
by 33 percent during this period.
Since moving to 365 Main in July 2005, RedEnvelope has more than
doubled its data center footprint and has scaled the number of hits it
processes per second from 800 to 1700. The site, which specializes in
all-occasion gift-giving, has also closed its redundant data center in the
Midwest and moved its excess capacity to 365 Main, tripled its systems
capacity and its storage capacity, and converted its point-to-point
networks to an MPLS backbone.
Saving energy and improving efficiency
RedEnvelope's energy costs have been reduced by a third since moving to
365 Main, a savings the company directly attributes to a unique billing
system in which 365 Main only charges customers for the exact amount of
power that is used. Most data center companies charge a flat rate per month
in a "use it or lose it" structure.
In addition to helping customers reduce energy costs, 365 Main is
taking strides to make its own business more efficient. In May the company
became the first data center developer and operator to commit to full
compliance with the building certification system put forth by the U.S.
Green Building Council (USGBC), a non-profit organization of leaders from
every sector of the building industry. Earlier this year 365 Main also
joined The Green Grid, a global nonprofit consortium of technology
companies and professionals dedicated to advancing energy efficiency in
data centers and business computing ecosystems.
"Two years ago we decided to move our servers to 365 Main because we
believed its San Francisco facility could accommodate our expected growth
better than any other, and for a comparable price," said Dale Emel,
director of technology services at RedEnvelope. "And that's exactly what
has happened. 365 Main has fulfilled its brand promise of 'world's finest
data centers' by delivering the reliability and uptime that attracted us in
the first place."
To ensure uptime for key tenants such as RedEnvelope, 365 Main provides
modern power and cooling infrastructure. The company's San Francisco
facility includes two complete back-up systems for electrical power to
protect against a power loss. In the unlikely event of a cut to a primary
power feed, the state-of-the-art electrical system instantly switches to
live back-up generators, avoiding costly downtime for tenants and keeping
the data center continuously running.
"RedEnvelope is a high-profile, well respected e-commerce brand and one
of the most popular gift-giving sites in the world," said Chris Dolan, CEO
of 365 Main, "It has succeeded in an industry where countless others have
failed, and we are extremely proud to have provided it with a home to grow
its business."
About RedEnvelope Inc.
RedEnvelope Inc. is a retailer dedicated to inspiring people to
celebrate their relationships through giving. RedEnvelope offers an
extensive collection of imaginative gifts through its webstore,

Re:No Generators? by pragma_x · 2007-07-25 04:22 · Score: 1

I like "FALL-over" planning myself. Ya know, for when an admin shows up for work stumbling drunk.

Re:Gross malfeasance by Nick+of+NSTime · 2007-07-25 07:09 · Score: 1

He's making fun of you. Your melodramatic whining is well-suited to the types of posts made on LiveJournal every day.

Re:Gross malfeasance by The+Man · 2007-07-25 07:16 · Score: 1

It's not whining, it's a demand for accountability. If I were as bad at my job as these people are at theirs, I would be fired, and rightly so. Those of you who like to bend over and take it are welcome to keep doing so. I won't.

Re:No Generators? by Master+of+Transhuman · 2007-07-25 07:43 · Score: 1

"I swear, people are going through some major mental gymnastics here to excuse away sheer incompetence."

They have sympathy because they know they're incompetent, too. About 95% of the population would agree.

That said, the cause of the outage could be another case of "nothing works and nobody cares." In other words, they did test everything, but it still failed when push came to shove.

I'm trying to install an eSATA for a client. We've gotten three eSATA controllers, two sets of port multipliers, and different cables and it still doesn't work right. The last set was actually tested by the company before sending it to us and it still doesn't work right, although it came a lot closer than the first two. Now the company admits the port multipliers are a new design by their supplier and they can't be certain it isn't a quality control problem with their supplier.

Nothing works in IT. I just have to engrave that on my hand like the punishment in the latest Harry Potter movie.

--
Richard Steven Hack - This sig is TOO GODDAMN SHORT TO DO ANYTHING USEFUL WITH! MORONS!

Re:No Generators? by MushMouth · 2007-07-25 08:36 · Score: 1

UPSes themselves may be cheap, but they eat rack space. So when you sign a contract with a provider, you take into account all of these costs and perceived benefits.

earthquakes by modemuser · 2007-07-25 10:35 · Score: 1

Am I the only one to think of the possible consequences of a major earthquake in the bay area?

Hosting a site at only one location could, in the worst case, lead to the site being lost if it was not backed up at another physical location.

Same story, different month. by punkrockgeekboy · 2007-07-25 12:59 · Score: 1

There are no surprises here. San Francisco's Mission St. Substation feeds half a dozen significant datacenters (365main, Level3, Coloserve, 400 Mission, and 650 Townsend) and has suffered 3 serious outages in the past 7 years. California itself had 2 straight summers of rolling blackouts, which only subsided thanks to the dot-com crash. California is running out of duct tape.

365main usually runs a good operation and is one of the best datacenters in California. However, it's also the most expensive datacenter in California, and should have a better track record than its lower-cost competitors like 200 Paul and Coloserv. In May 2007 we moved our infrastructure out of 365, off of California's cancerous power grid, and onto a more reliable, greener, and cheaper grid. Yeah, we moved to Seattle. This was the best decision we ever made. Most of our experience with 365 was extremely positive; however, pricing and power density problems forced us to move.

I can't list all of the good things 365main did, but here's a list of 365's power problems as we experienced them:

In April 2005, 365main had an outage that affected all customers for 50 minutes due to a failed EPO valve. 365 handled that outage spectacularly, calling all of their customers within 15 minutes of the outage.

In February 2006, 365main experienced a partial outage for 3 seconds that only affected some customers, but caused problems in their telco spine, affecting connectivity.

In October 2006, 365main had a backup generator fail. Supposedly no customers were directly affected, but customers were not allowed to enter the building between 3:29 PM and 4:40 PM.

Re:No Generators? by Guspaz · 2007-07-25 14:50 · Score: 1

You don't need to be a large site to spread yourself out. Even if you're just big enough to be able to afford $100-200 USD per month in hosting costs, you can do at least reasonably effective redundancy...

Round-robin DNS between the two servers means that in the event of an outage only 50% of requests fail, and you can change the DNS records (with a low TTL, I'd hope) and be switched over entirely to the surviving server within minutes. And that's just with two cheap budget dedicated servers in commodity datacenters...
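A toy simulation of the round-robin-plus-low-TTL idea, for anyone who wants to see the numbers move. Everything here is an assumption for illustration: two hypothetical servers "A" and "B", server A dying at t=0, the operator pulling A from the zone at t=60 s, a 300-second TTL, and clients that cache one answer and never retry the other address (some real clients do fall back to another address, which this model ignores). It shows about half of requests failing until cached answers expire, then none.

```python
from itertools import cycle


def simulate(n_clients=100, ttl=300, detect_at=60, horizon=600):
    """Fraction of client requests that fail each second in a toy
    two-server round-robin DNS setup (hypothetical names and timings)."""
    rotation = cycle(["A", "B"])   # authoritative server rotates its answers
    down = {"A"}                   # server A dies at t=0 and stays down
    cache = [None] * n_clients     # per-client (address, expiry) entry
    failure_rates = []

    for t in range(horizon):
        if t == detect_at:
            # Operator notices the outage and removes A from the zone.
            rotation = cycle(["B"])
        failed = 0
        for i in range(n_clients):
            entry = cache[i]
            if entry is None or t >= entry[1]:
                # Cache empty or TTL expired: ask DNS again.
                entry = (next(rotation), t + ttl)
                cache[i] = entry
            if entry[0] in down:
                failed += 1        # this client keeps hitting the dead box
        failure_rates.append(failed / n_clients)
    return failure_rates


rates = simulate()
print(rates[0], rates[299], rates[300])  # 0.5 0.5 0.0
```

The switchover time is roughly the detection delay plus whatever TTL clients cached before the change, which is why a low TTL matters.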

I mean, throw one box up at ThePlanet in Dallas, another one up at iWeb in Montreal (Hah! Multi-country redundancy) and you've got yourself a pretty darned good chance of surviving ANY disaster. Quebec (Montreal) and Texas (Dallas) both have their own grid interconnections too, so one of those giant power outages like the one that took out the eastern US and Canada a few years back (except Quebec) wouldn't even affect both locations.

But I know very little about this sort of thing. So maybe my idea of zero-budget failover with DNS is stupid; somebody fill me in.

Re:No Generators? by Johnno74 · 2007-07-25 16:11 · Score: 1

Yeah I figured that. I was surprised about the TV tho...

Re:No Generators? by Johnno74 · 2007-07-25 16:15 · Score: 1

Nope this was in Wellington.

My multimeter, while it's a cheap one, can measure up to 500 V AC. And a voltmeter has basically infinite resistance, so as close to zero current as they can manage flows through it. This is why you wire a voltmeter in parallel in a circuit. An ammeter has as little resistance as possible and is wired in series. If you plug a multimeter into a wall socket while it is switched to amps, you are going to need a new ammeter. It's basically the same as pushing a bent-up paperclip into the phase/earth sockets.
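Ohm's law (I = V / R) makes the difference concrete. This sketch assumes New Zealand's nominal 230 V mains, a 10 MΩ input resistance on the volts range (a common figure for digital multimeters, but check your own meter's spec) and a hypothetical 0.01 Ω shunt on the amps range. The amps-range figure is only a prospective current: in practice wiring resistance, the meter's fuse, or the breaker limits it long before it is reached.

```python
MAINS_VOLTS = 230.0         # nominal NZ mains voltage
VOLTMETER_OHMS = 10e6       # assumed DMM input resistance on the volts range
AMMETER_SHUNT_OHMS = 0.01   # hypothetical shunt resistance on the amps range


def current_through(volts: float, ohms: float) -> float:
    """Ohm's law: current in amps through a resistance placed across a voltage."""
    return volts / ohms


volt_mode = current_through(MAINS_VOLTS, VOLTMETER_OHMS)
amp_mode = current_through(MAINS_VOLTS, AMMETER_SHUNT_OHMS)
print(f"voltmeter across mains: {volt_mode * 1e6:.0f} uA")
print(f"ammeter across mains:  {amp_mode:.0f} A (prospective)")
```

Tens of microamps versus tens of kiloamps: the same meter, the same socket, a seven-orders-of-magnitude difference set entirely by the range switch.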

Re:No Generators? by Sandbags · 2007-07-27 06:37 · Score: 1

UPSes are typically 2U or 4U units and can power up to half a rack. Data centers almost always charge less for U's associated with devices that don't add to power or heating costs (KVM, terminal display, tape jukeboxes, etc.). Rack space itself is cheap if the costs of cooling and power can be eliminated. A company we partner with (we back up their racks rather than use their space) charges a base price for the first U per server, plus a lower price per additional U (for the same server). Buying a 1/4 rack, half rack, or full rack is always at a discount compared to U-by-U purchase. Any reasonable configuration is going to be at least 3 individual servers anyway, so the cost of an extra 2U for the UPS, divided across a few servers, is negligible compared with the cost of a power loss. Some data centers even include this additional UPS space (and the UPS itself) as part of the regular fees.
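To put rough numbers on the pricing argument, here is a sketch with made-up prices: $100/month for a server's first U, $40/month for each additional U, and the UPS's 2U billed at the additional-U rate and shared across 3 servers. Only the shape of the calculation (first-U price plus cheaper extra U's) comes from the post; the prices and the billing rule for the UPS are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RackPricing:
    first_u: float   # hypothetical monthly price for a server's first U
    extra_u: float   # hypothetical monthly price for each further U

    def server_cost(self, units: int) -> float:
        """Monthly cost of one server occupying `units` rack units."""
        return self.first_u + self.extra_u * (units - 1)


def ups_share_per_server(pricing: RackPricing, ups_units: int, servers: int) -> float:
    """Spread the UPS's rack space (billed at the extra-U rate) across servers."""
    return pricing.extra_u * ups_units / servers


pricing = RackPricing(first_u=100.0, extra_u=40.0)
per_server = pricing.server_cost(2)   # a 2U server
ups_share = ups_share_per_server(pricing, ups_units=2, servers=3)
print(f"server: ${per_server:.2f}/month, UPS share: ${ups_share:.2f}/month")
```

Under these assumptions each server picks up about $27/month for UPS space, which is the figure to weigh against what an unprotected outage would cost.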

--
There is no contest in life for which the unprepared have the advantage.

Update from 365 Main - multiple Hitec failures by Animats · 2007-07-29 06:10 · Score: 1

365 Main has placed a statement about their Hitec UPS failures on their web site. Highlights:

Generator 1 detected a problem in its start sequence and shut itself down within 8-10 seconds.
After initial failure, Generator 1 attempted to pass its 732 kW load to Back-up 1, which also detected a problem in its start sequence.
After Generator 1 and Back-up 1 failed to carry the 732 kW, the load was transferred to Back-up 2 which correctly accepted the load as designed.
Generator 3 started up and ran for 30 seconds before it too detected a problem in the start sequence and passed an additional 780 kW to Back-up 2 as designed.
Generator 4 started up and ran for 2 seconds before detecting a problem in the start sequence, passing its 900 kW load on to Back-up 2. This 900 kW brought the total load on Back-up 2 to over 2.4 MW, ultimately overloading the 2.1 MW Back-up 2 unit, causing it to fail. Generator 4 was manually started and brought back into operations at 2:22 p.m. Generator 4 was switched to utility operations at 7:05 a.m. on 7/25 to address an exhaust leak...
Generators 2, 5, 6, 7 and 8 all operated as designed and carried their respective loads appropriately.
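The load figures in the statement add up. This sketch replays the three transfers onto Back-up 2 using only the kW numbers quoted above; it is a simplification that treats the loads as purely additive and ignores timing (Generator 1's load lands on Back-up 2 because Back-up 1 failed to take it).

```python
BACKUP2_CAPACITY_KW = 2100  # 2.1 MW rating from 365 Main's statement

# Loads handed to Back-up 2, in order, per the statement.
transfers = [("Generator 1", 732), ("Generator 3", 780), ("Generator 4", 900)]


def first_overload(transfers, capacity_kw):
    """Return (source, running_total_kw) for the first transfer that pushes
    the spare past its capacity, or None if it never overloads."""
    total = 0
    for source, load_kw in transfers:
        total += load_kw
        print(f"{source}: +{load_kw} kW -> Back-up 2 carrying {total} kW")
        if total > capacity_kw:
            return source, total
    return None


print(first_overload(transfers, BACKUP2_CAPACITY_KW))
```

The running total goes 732, 1512, then 2412 kW, so Generator 4's hand-off is the one that exceeds the 2100 kW rating, matching the "over 2.4 MW" in the statement.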

So they apparently had startup failures in four out of ten Hitec units. Their basic architecture is that they have eight main UPS systems (each is a motor/generator/flywheel/Diesel combo), each driving a separate section of the colo, and two spares, Backup 1 and Backup 2, which can be switched to drive any section. No big battery banks; it's all flywheels and Diesels. At least eight systems must be running to keep the full data center up.
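A quick headcount of the architecture as described here: eight section units plus two switchable spares, with at least eight needing to run. The "eight must run" rule is this post's summary rather than a published 365 Main spec, and the model ignores that Back-up 2 briefly carried several sections before overloading (which would make it a fifth failed unit).

```python
# Unit names follow 365 Main's statement; the rule below is the post's summary.
MAIN_UNITS = [f"Generator {n}" for n in range(1, 9)]  # one per colo section
SPARES = ["Back-up 1", "Back-up 2"]                  # switchable to any section
REQUIRED_RUNNING = 8

# Units that reported startup problems, per the statement.
startup_failures = {"Generator 1", "Back-up 1", "Generator 3", "Generator 4"}

all_units = MAIN_UNITS + SPARES
running = [u for u in all_units if u not in startup_failures]
tolerable = len(all_units) - REQUIRED_RUNNING  # N+2: two spares' worth

print(f"{len(running)} of {len(all_units)} units running; "
      f"the design tolerates {tolerable} failures")
```

An N+2 design absorbs two simultaneous failures; four startup failures leave six units, two short of what the full facility needs.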

365 Main has had Hitec experts flown in from Holland, where the UPS was made. Today, Hitec top management arrived: "A longstanding member of the Hitec Board of Directors is arriving later tonight and will be onsite tomorrow (Sunday) to participate in all investigation activities."
