HP Server Killer Firmware Update On the Loose

Posted by timothy on Friday April 25, 2014 @12:43AM from the just-this-one-little-problem dept.

OffTheLip (636691) writes "According to a Customer Advisory released by HP and reported on by the Channel Register website, a recently released firmware update for the ubiquitous HP Proliant server line could disable the network capability of affected systems. Broadcom NICs in G2-G7 servers are identified as potentially vulnerable. The release date for the firmware was April 18 so expect the number of systems affected to go up. HP has not released the number of systems vulnerable to the update."

Hot firmware by DigiShaman · 2014-04-25 00:59 · Score: 4, Interesting

And this is why I wait at least 2 months before installing firmware updates (unless it's a major security issue). It's not uncommon for a firmware update to be pulled shortly after being published. The 2-month delay is generally more than enough time to ensure the update is solid.
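The waiting rule described above can be sketched in a few lines of Python. This is only an illustration: the function name, the 60-day default, and the `security_fix` flag are assumptions made for the example, not part of any HP tooling.

```python
from datetime import date, timedelta

# Hypothetical soak period: roughly the two months mentioned above.
DEFAULT_SOAK = timedelta(days=60)

def ready_to_install(released: date, today: date,
                     security_fix: bool = False,
                     soak: timedelta = DEFAULT_SOAK) -> bool:
    """Return True if a firmware release has aged enough to install.

    A release flagged as a major security fix skips the wait.
    """
    if security_fix:
        return True
    return today - released >= soak

# The firmware in this story was released on April 18, 2014.
print(ready_to_install(date(2014, 4, 18), date(2014, 4, 25)))  # False: one week old
print(ready_to_install(date(2014, 4, 18), date(2014, 6, 17)))  # True: 60 days old
```

Passing `today` in explicitly, instead of reading the clock inside the function, keeps the rule easy to check against known dates.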

--
Life is not for the lazy.
1. Re:Hot firmware by SJHillman · 2014-04-25 01:57 · Score: 4, Funny
  
  The company budget committee helps us avoid issues like this. The majority of our gear is old enough that there hasn't been a firmware update in four or five years. And no plans to replace it any time soon.
2. Re:Hot firmware by Lumpy · 2014-04-25 03:48 · Score: 2
  
  I wait until I HAVE TO install a firmware update. Unless there is a major problem that will cause the server to explode, you don't update the firmware.
  
  --
  Do not look at laser with remaining good eye.
3. Re:Hot firmware by SJHillman · 2014-04-25 06:50 · Score: 3, Funny
  
  They approved two rolls of duct tape for this fiscal year, so we feel well prepared.
If it ain't broke... by Rich0 · 2014-04-25 01:01 · Score: 4, Insightful

...don't flash it.
Do admins routinely flash firmware updates in the absence of some identified need? I could see flashing an update if I was suffering from a known problem, or if the vendor identified a security flaw in a previous release. I could see flashing it if necessary to install new hardware.
I just don't see why a server admin would flash a firmware update as if it were Patch Tuesday. In the absence of a security vulnerability or production issue there is no reason to treat a firmware change as an expedited change and not perform full testing before deploying it. That isn't to say that doing some testing of security patches/etc isn't wise - but I can see why it would get rushed.
1. Re:If it ain't broke... by Charliemopps · 2014-04-25 01:09 · Score: 2
  
  And when your entire site goes down on a Monday morning because one of your vendors applied an update to some connecting hardware? And their response when asked for the reason for the outage is "Your hardware was 3yrs out of date. Your Sys Admin said it wasn't broke so he didn't fix it." What's your boss going to say after he gets done telling you how many years of your salary the outage cost the company?
  I delay updates, but I get that shit approved by executive officers first. I always make sure I have a very good reason to delay it as well.
2. Re:If it ain't broke... by bravecanadian · 2014-04-25 01:12 · Score: 2
  
  > And when your entire site goes down on a Monday morning because one of your vendors applied an update to some connecting hardware? And their response when asked for the reason for the outage is "Your hardware was 3yrs out of date. Your Sys Admin said it wasn't broke so he didn't fix it." What's your boss going to say after he gets done telling you how many years of your salary the outage cost the company?
  > I delay updates, but I get that shit approved by executive officers first. I always make sure I have a very good reason to delay it as well.
  Ah yes, Sys Admin, the damned if you do and damned if you don't profession.
3. Re:If it ain't broke... by MrNemesis · 2014-04-25 01:22 · Score: 5, Informative
  
  TBH, I suspect this is just getting publicity since it's the first super-dodgy HP firmware patch since they adopted their "no updates for YOU!" mentality - the explanation for which from HP was that they'd sunk a lot of money into their patching process and people shouldn't get to use it for free, I guess. This won't be the last time this happens either.
  As a sysadmin that's dealt with dozens of these "killer firmwares", there's often an identified need. We make extensive use of the HP SPPs at work and they come with a list of fixes and known issues as long as your arm; it's part of my job to go through the advisories to see if we're at risk and, if we are, to analyse the risk of updating/not updating. Many of them aren't security vulns or emergency fixes and are often extremely obscure, but once in a while you'll encounter something like a NIC locking up on receiving a certain type of packet or the BIOS repeatedly saying a DIMM has failed when it hasn't, or if you mix hard drives with firmware X and firmware Y on RAID controller Z running firmware... er.. A it might drop the whole array... lots of little issues that can severely impact running systems if left unchecked. And then when you upgrade one component you'll frequently have to upgrade others to stay within the compatibility support matrix, until eventually you just run the damned SPP to make sure everything in that server is at a "known good compatible" level.
  Sure, we don't just flash as if it were Patch Tuesday and no-one ever should - we wait for at least 2 months of testing on non-production boxes before we patch any prod kit with firmware unless it's an emergency fix - but lots of people use the HP SPP to automatically download the latest updates; we've had enough problems with them that we'd never do this (and in any case 97% of our servers have no net access). But the whole point of the SPP is meant to be that HP should have already done most of the regression testing for you.
  That said, we've had nothing but trouble with Broadcom NICs for ages and I'm sure there are many admins here who have fond memories of the G6 blades, Broadcom NICs, ESX and Virtual Connect from a few years back. Think HP switched much of their kit to Emulex after that debacle. Also, the latest web-based HP SPP (as opposed to the last one where you just ran a binary) is a complete train wreck on Windows for ad-hoc updates, largely due to the interface being handed over to people who seemed to want to make it a User eXperience rather than a tool.
  
  --
  Moderation Total: -1 Troll, +3 Goat
4. Re:If it ain't broke... by SJHillman · 2014-04-25 01:59 · Score: 2
  
  This is why I believe in half-assing everything. If it's only half done, you can only be half damned either way. Right?
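MrNemesis's description above of going through advisories to see whether installed firmware is at risk can be sketched roughly as follows. The `Advisory` class, the component names, and the version strings are all made up for illustration; real HP advisories and SPP metadata are structured differently.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Advisory:
    """A vendor advisory naming one component and the firmware versions it affects."""
    component: str
    affected_versions: frozenset
    summary: str

def advisories_for(installed: dict, advisories: list) -> list:
    """Return the advisories that match a server's installed firmware versions."""
    return [a for a in advisories
            if installed.get(a.component) in a.affected_versions]

# Made-up inventory and advisories, for illustration only.
server = {"nic": "2.1", "raid": "5.4", "bios": "P67"}
known = [
    Advisory("nic", frozenset({"2.0", "2.1"}), "NIC may lock up on certain packets"),
    Advisory("raid", frozenset({"5.0"}), "Array may drop with mixed drive firmware"),
]
for a in advisories_for(server, known):
    print(a.component, "-", a.summary)
```

A match only says the server is exposed; deciding whether updating is riskier than not updating is still the manual judgment the comment describes.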
Isn’t this one limited anyway? by Movi · 2014-04-25 01:09 · Score: 3, Funny

Aren’t they also the ones who limit their firmware updates only to customers who have support contracts? I guess you get what you pay for..
Well at least it only affects paying customers ;) by Hohlraum · 2014-04-25 01:14 · Score: 3, Funny

http://slashdot.org/story/14/02/05/0258244/hp-to-charge-for-service-packs-and-firmware-for-out-of-warranty-customers
i got hit by this by alen · 2014-04-25 01:21 · Score: 3, Interesting

Didn't brick my server, but it screwed up the device list in Windows and caused a cluster not to see the one node where I upgraded the drivers/firmware. It put a null device driver into the Device Manager; I had to delete it and then all was OK. Just $250 to MS to figure this out, since I didn't think it was an HP issue.
On the server the network worked and all, but the NICs weren't "seen" by Windows, so the clustering was screwed up.
1. Re:i got hit by this by TechyImmigrant · 2014-04-25 06:05 · Score: 2
  
  You're running Windows on a server?
  The world is a stranger place than one can imagine.
  
  --
  I should use this sig to advertise my book ISBN-13 : 978-1501515132.
Oh, HP... by Greyfox · 2014-04-25 01:35 · Score: 2, Funny

It's probably a mercy killing. Some of those poor servers were probably forced to run HP/UX. I'd want to die, if I had to run HP/UX...

--
I'm trying to teach myself to set people on fire with my mind... Is it hot in here?
Re: What? by NatasRevol · 2014-04-25 01:36 · Score: 2

That was available for a week.

--
There are two types of people in the world: Those who crave closure
Re:This would be why.. by oodaloop · 2014-04-25 01:37 · Score: 4, Insightful

> You don't flash firmware unless it is for an important issue. Or at least not until it has been out quite some time so that other people have done your testing for you.
Your advice isn't really a general solution if, in order for it to work for anyone, some people must not follow it.

--
Tic-Tac-Toe, Global Thermonuclear War, and relationships all have the same winning move.
Re:ITIL by NatasRevol · 2014-04-25 01:38 · Score: 3, Insightful

Unless the executives don't give you 'non-critical boxes' for every piece of infrastructure to test updates.
"Why do you need an additional SAN at $100k? We'll deal with that if it happens. It happened? It's all your fault!"

--
There are two types of people in the world: Those who crave closure
Ah, this will help network throughput...... maybe. by data+plumber · 2014-04-25 02:11 · Score: 2

As a network engineer, I can see being involved in arguments between the server platform support teams (read: off-shore) and the network engineering teams (read: on-shore). It'll be like this:
"We need network support on a call."
"Hello. What's wrong?"
"The entire network is down for everyone!!!!! You need to fix this!!!!! The support we get from you is horrible!!!! AAHHHHhhhhhhh!!!!!!"
"OK. What changed? What was being done at the time the entire network disappeared for everyone?"
"We (15 people on the call - it apparently takes that many) were doing nothing (to do nothing)."
"OK, well, I'm on the cores and I can see a lot of traffic, other servers, the outside world etc. You need to define the "everything being down" part."
"Well, we were in the middle of doing a firmware update on server xxx01 and...."
"OK, so, you lied to me about doing nothing. What did you update?"
"The NIC card to improve performance for..."
"And now you're wondering why the network is down..."
It'll go this way for some time until the next couple of layers of management get involved....... lots of yelling, me sending pictures of the network working. I should write a script for this call. I know it'll be coming.
Re:ITIL by gstoddart · 2014-04-25 02:19 · Score: 2

You know, if that's how your company is being run, you should already be looking for another job.
Where I work, we've got proper test equipment, a CAB to review the proposed changes, and an expectation that you will test before deploying. When we schedule outages, we have to have a backout plan, and we're expected to have applied the updates in either the lab or a test environment.
The admins aren't considered sacrificial lambs, but they are expected to apply due diligence, test, and identify any risks. But once you've done that and made sure people know what you're doing and why, what the results of your testing are, and what you've done to mitigate any risks ... a bunch of senior people in IT have signed off on it and people have had a chance to voice their concerns. The people overseeing this tend to be department heads with a lot of industry experience, so they understand there is always risk, but they also understand what you need to do to minimize it.
If your company expects you to do your job without being able to do these things, your company is sailing straight towards a major disaster with or without you.
If your company is treating it as "stop talking and do it" combined with "but if you do it wrong you're SOL" ... your company is being managed by people who don't understand what is involved in your job, and will always have unrealistic expectations.
Companies which don't plan for these things, don't build a proper process around it, and don't fund being able to ensure things get tested are just being penny wise and pound foolish.
And, from a certain perspective ... I would never even consider applying a patch to a production environment which had only just been released by the vendor. At least a month, maybe as much as two. If someone wanted to put a firmware update on my production systems which was only just released from the vendor, the answer would be a firm "no bloody way". And my manager, and his manager, and all of the other people at that level would be saying the same thing and backing me on that position.
You have to have a company culture which owns the process, takes responsibility for it, and actually takes the time to understand the impact of it and plan for it.
Now, if a system admin does any of these things without going through all of the process, and things go wrong ... then you likely will be neck deep in crap pretty quickly. But if you have followed the process, and something goes wrong, the process shifts to remediating what went wrong, and understanding what can be done better next time. It has to be a continuous process, and it has to actually have some institutional memory, and companies have to take the process seriously.
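The process described above (lab testing, a backout plan, CAB review, and a minimum age for vendor releases) amounts to a checklist, which might be sketched like this. The `ChangeRequest` fields and the 30-day default are hypothetical names and values chosen for the example, loosely following the "at least a month" figure in the comment.

```python
from dataclasses import dataclass

@dataclass
class ChangeRequest:
    """Hypothetical record of a proposed firmware change."""
    tested_in_lab: bool
    has_backout_plan: bool
    cab_approved: bool
    days_since_vendor_release: int

def blocking_issues(cr: ChangeRequest, min_age_days: int = 30) -> list:
    """List the reasons a change should not go to production yet."""
    issues = []
    if not cr.tested_in_lab:
        issues.append("not applied in lab or test environment")
    if not cr.has_backout_plan:
        issues.append("no backout plan")
    if not cr.cab_approved:
        issues.append("not reviewed by the CAB")
    if cr.days_since_vendor_release < min_age_days:
        issues.append("release is younger than the minimum age")
    return issues

# A week-old release that skipped testing and review.
fresh = ChangeRequest(tested_in_lab=False, has_backout_plan=True,
                      cab_approved=False, days_since_vendor_release=7)
for issue in blocking_issues(fresh):
    print("BLOCKED:", issue)
```

Returning every failed check, not just the first, gives the reviewers the full picture in one pass.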

--
Lost at C:>. Found at C.
Testing updates? by Hamsterdan · 2014-04-25 02:45 · Score: 2

Don't manufacturers test their updates? It's not like they couldn't keep some of the stuff they sell for said testing...

--
I've got better things to do tonight than die.
Re:ITIL by gstoddart · 2014-04-25 02:46 · Score: 2

> Must be nice to work for a fortune 500 company to have the resources available...
You don't need to be a Fortune 500 company to apply this level of rigor. I'm quite sure we're not one.
Yes, you need resources to do it. Yes, you need corporate will to do it. And you also need to have a company whose culture includes actively assessing risk against their needs, as well as understanding how the risks translate into business risk. If the systems affect the actual production of your business, you need to treat it as Very Important.
If you stand to lose millions of dollars per hour in the event of an outage, the cost of screwing up gets pretty high. Which means the expense is absorbed. If you have much less exposure due to an outage, your tolerance to risk is going to be much higher.
My wife does outsourced/leveraged IT ... and some of her clients, if some environments are down, basically have to halt all production, shut down equipment, and go through an expensive restart process.
Even at the SMB scale, you need to understand your risks, and have management be partly responsible for the decision making process, as well as having people who can provide the information needed to make decisions. These shops may not have the resources to test and deploy everything to a lab, which means, if anything, they should be staying away from applying a brand new patch as soon as it's released.
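The trade-off above, between what an outage costs and what it costs to test properly, can be made concrete with a rough expected-loss calculation. All numbers here are illustrative assumptions (the 2% failure chance, the 4-hour outage, the hourly costs); only the $100k test-hardware figure echoes an amount quoted elsewhere in this thread.

```python
def expected_outage_cost(probability: float, hours_down: float,
                         cost_per_hour: float) -> float:
    """Expected loss from one risky change: chance of failure times its cost."""
    return probability * hours_down * cost_per_hour

# Illustrative numbers only: a 2% chance of a 4-hour outage.
test_env_cost = 100_000
big_shop = expected_outage_cost(0.02, 4, 2_000_000)   # loses millions per hour
small_shop = expected_outage_cost(0.02, 4, 5_000)     # much lower exposure

print(big_shop > test_env_cost)    # True: the test environment pays for itself
print(small_shop > test_env_cost)  # False: waiting out new patches is cheaper
```

Under these assumptions the large shop can justify a lab from a single avoided incident, while the small shop is better served by simply not being first to install.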

--
Lost at C:>. Found at C.
Re:Yes, Hewlett Packard. A Genuine Legend. by kevmatic · 2014-04-25 04:07 · Score: 2

I imagine they're doing fine, working for the company you're talking about, Agilent.
Make no mistake: The only thing HP has to do with the company that was founded in the 40s is the name. The company that Bill Hewlett and Dave Packard founded still exists, making great stuff. It's just called Agilent. And they still support scopes and multimeters that say HP on them.
Proliant Nightmare by emil · 2014-04-25 05:00 · Score: 2

I deployed a DL380p Gen8 last year, and it gave me heart failure.
Under Red Hat, I needed to change the IP address, so I modified the file /etc/sysconfig/network-scripts/ifcfg-eth0 and then did a "service network restart".
Alas, the box did not come up on the new IP. Got to the console which was blank and unresponsive. Power cycled, and the RAID array was GONE (and let's just say this was EXTREMELY inconvenient timing).
Support was able to walk us through some BIOS disk recovery that (thankfully) worked. But I'll never change the IP address on a Proliant without a full reboot.
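For readers unfamiliar with the edit step described above, here is a rough Python sketch of changing the `IPADDR` line in an ifcfg-style `KEY=value` file while keeping a backup copy. The function name and the `.bak` suffix are assumptions for the example, and the demo runs against a temporary file, not the real `/etc/sysconfig/network-scripts/ifcfg-eth0`. It does not restart the network, and nothing here explains or prevents the failure the poster hit.

```python
import shutil
import tempfile
from pathlib import Path

def set_ifcfg_ip(path: Path, new_ip: str) -> None:
    """Rewrite the IPADDR line of a KEY=value config file, keeping a .bak copy."""
    backup = path.with_name(path.name + ".bak")
    shutil.copy2(path, backup)  # keep the old file so the change can be backed out
    lines = path.read_text().splitlines()
    found = False
    for i, line in enumerate(lines):
        if line.strip().startswith("IPADDR="):
            lines[i] = f"IPADDR={new_ip}"
            found = True
    if not found:
        lines.append(f"IPADDR={new_ip}")  # no existing entry: add one
    path.write_text("\n".join(lines) + "\n")

# Demo on a throwaway file with documentation-range addresses.
with tempfile.TemporaryDirectory() as d:
    cfg = Path(d) / "ifcfg-eth0"
    cfg.write_text("DEVICE=eth0\nBOOTPROTO=static\nIPADDR=192.0.2.10\n")
    set_ifcfg_ip(cfg, "192.0.2.20")
    print(cfg.read_text())
```

Keeping the previous file next to the new one is the smallest possible backout plan for a change like this.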
Re:Does not appear to affect the G7 by UnknownSoldier · 2014-04-25 05:09 · Score: 2

> I guess a sixer counts as a old-timer now
Nah, just middle-age.