Slashdot Mirror


e1000e Bug Squashed — Linux Kernel Patch Released

ruphus13 writes "As mentioned earlier, there was a bug in the pre-release versions of the Linux kernel (up to 2.6.27-rc7) that was corrupting, and rendering useless, the EEPROM/NVM of some Intel network adapters driven by e1000e. Thankfully, a patch is now out that prevents writing to the EEPROM once the driver is loaded, and this follows a patch released by Intel earlier in the week. From the article: 'The Intel team is currently working on narrowing down the details of how and why these chipsets were affected. They also plan on releasing patches shortly to restore the EEPROM on any adapters that have been affected, via saved images using ethtool -e or from identical systems.' This is good news as we move towards a production release!"
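
The "saved images" idea from the article can be sketched as a byte-for-byte comparison of two raw EEPROM dumps, for example ones taken with `ethtool -e <dev> raw on > image.bin`. This is only an illustration, not Intel's restore tool: the function name `diff_images` and the file names in the usage note are hypothetical.

```python
def diff_images(saved: bytes, current: bytes):
    """Compare two raw EEPROM images.

    Returns a list of (offset, saved_byte, current_byte) tuples, one per
    differing offset. None marks a byte missing from the shorter image.
    """
    diffs = []
    for off in range(max(len(saved), len(current))):
        a = saved[off] if off < len(saved) else None
        b = current[off] if off < len(current) else None
        if a != b:
            diffs.append((off, a, b))
    return diffs
```

A caller could read both files with `open("saved.bin", "rb").read()` and `open("current.bin", "rb").read()` (hypothetical names) and treat any non-empty result as a sign the NVM has changed since the backup.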

5 of 111 comments

  1. Hardware or Software Problem? by Anonymous Coward · · Score: 5, Interesting

    Linus isn't very happy with Intel here:
    http://lkml.org/lkml/2008/9/29/368

    On Mon, 29 Sep 2008, Arjan van de Ven wrote:
    >
    > we have a patch to save/restore now, in final testing stages
    > (obviously we want to be really careful with this)

    Btw, the _real_ bug is clearly in the hardware design that allows you to
    brick those things without apparently even having a lock bit.

    I'm hoping Intel doesn't treat this as just a software bug. Some hw
    designer should be thinking hard about which orifice they put their head
    up in.

    It used to be that you could fry some monitors by feeding them
    out-of-range signals. The _monitors_ got fixed.

                    Linus

  2. Root cause still unknown? by AcidPenguin9873 · · Score: 4, Interesting

    Yes, they released a patch so that the NVM can't be overwritten after the e1000e driver is loaded. But from what I can tell, they still don't know what is/was responsible for the overwriting.

    FWIW, I'm almost positive that modern CPUs have debug traps for this exact sort of thing...you can trap arbitrary I/O writes via SMM or something...obviously I'm not in the debug loop, but I don't see why this has been so hard to figure out...

    1. Re:Root cause still unknown? by Almahtar · · Score: 2, Interesting

      Which makes me hope all attempts to write to the EEPROM are being logged in the new driver, with stacktraces.

      Otherwise what's the point of testing them? Sure they won't brick your card, but you can't get very useful feedback.
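
A toy model of that idea, assuming nothing about the real e1000e code: once the "driver" locks the NVM, every write attempt is refused and recorded together with the stack that issued it. The class `GuardedNvm` and its methods are hypothetical names for illustration.

```python
import logging
import traceback

log = logging.getLogger("nvm_guard")


class GuardedNvm:
    """In-memory stand-in for a NIC's NVM with a software write lock."""

    def __init__(self, words):
        self._words = list(words)
        self._locked = False
        self.rejected = []  # (offset, value, stack text) per blocked write

    def lock(self):
        """Refuse all writes from now on."""
        self._locked = True

    def read(self, offset):
        return self._words[offset]

    def write(self, offset, value):
        """Write a 16-bit word; return False (and log a stack) if locked."""
        if self._locked:
            stack = "".join(traceback.format_stack())
            self.rejected.append((offset, value, stack))
            log.warning("blocked NVM write at word 0x%02x\n%s", offset, stack)
            return False
        self._words[offset] = value & 0xFFFF
        return True
```

The recorded stack is what makes a blocked write useful to a tester: it shows which code path tried to touch the NVM instead of silently dropping the attempt.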

    2. Re:Root cause still unknown? by SuperQ · · Score: 4, Interesting

      So the thing is, there is more than just a simple "eeprom write interface" on these chips.

      Most of the time the EEPROM attached to the NIC is a cheap, small serial EEPROM part, usually just a few kb, maybe 32 or 64kb. It contains mostly things like a bit of bootstrapping, a few "permanent" settings like the MAC address, and the PXE ROM.

      And that's where the problems come in. This serial interface is usually an afterthought, and if there is noise on that bus, bits can flip. Or if something bad happens in the NIC code, you could accidentally write when you meant to read.

      Usually this is recoverable, but I haven't looked into this specific corruption situation. I've had to deal with this kind of thing before. It's not fun.

      Flashing NIC eeproms isn't something a normal end-user does all the time. 99% of the time it's written at the factory, stuffed on the board, and forgotten about.
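
One way flipped bits get noticed is a checksum over the NVM contents. The sketch below assumes the convention I believe Intel's e1000-family Linux drivers use, where the first 0x40 16-bit words must sum to 0xBABA; treat those constants and both function names as assumptions, not as something stated in this thread.

```python
NVM_SUM = 0xBABA       # assumed target sum (e1000-family convention)
CHECKSUM_WORD = 0x3F   # assumed index of the word holding the checksum


def checksum_ok(words):
    """True if words 0x00..0x3F sum to NVM_SUM modulo 2**16."""
    if len(words) <= CHECKSUM_WORD:
        raise ValueError("image too short to hold a checksum word")
    return sum(words[:CHECKSUM_WORD + 1]) & 0xFFFF == NVM_SUM


def fix_checksum(words):
    """Return a copy of words with the checksum word recomputed."""
    if len(words) <= CHECKSUM_WORD:
        raise ValueError("image too short to hold a checksum word")
    fixed = list(words)
    partial = sum(fixed[:CHECKSUM_WORD]) & 0xFFFF
    fixed[CHECKSUM_WORD] = (NVM_SUM - partial) & 0xFFFF
    return fixed
```

A single flipped bit in any covered word changes the sum, so `checksum_ok` fails. That detects corruption but cannot repair it; repair still needs a saved image.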

  3. Re:News? by Spy der Mann · · Score: 5, Interesting

    I know this is News For Nerds and all that, but isn't this a tad specific?

    That's what sections are for. See the little Tux Icon over there? We all care about Linux. Besides, it's a VERY IMPORTANT BUG. A showstopper, so to speak. And keep in mind that a lot of people in here are kernel freaks. They want to test-drive the latest versions of the kernel. And one of the reasons why people keep coming here (and not to digg) is precisely for this kind of news.

    Thanks, ruphus13.