Slashdot Mirror


e1000e Bug Squashed — Linux Kernel Patch Released

ruphus13 writes "As mentioned earlier, there was a kernel bug in the alpha/beta version of the Linux kernel (up to 2.6.27 rc7), which was corrupting (and rendering useless) the EEPROM/NVM of adapters. Thankfully, a patch is now out that prevents writing to the EEPROM once the driver is loaded, and this follows a patch released by Intel earlier in the week. From the article: 'The Intel team is currently working on narrowing down the details of how and why these chipsets were affected. They also plan on releasing patches shortly to restore the EEPROM on any adapters that have been affected, via saved images using ethtool -e or from identical systems.' This is good news as we move towards a production release!"

111 comments

  1. News? by quarrel · · Score: 3, Insightful

    I know this is News For Nerds and all that, but isn't this a tad specific?

    An alpha/beta of the most recent linux kernel patch had a bug fixed, and it hits the front page?

    Don't get me wrong, I'm glad they found it, but this is kinda the point of debug cycles.. If we start reporting every bug squashed in all the major open source projects out there this is going to go downhill fast.. (of course, it's possible some may think that the idle. is only a step above..)

    --Q

    1. Re:News? by WK2 · · Score: 1

      (of course, it's possible some may think that the idle. is only a step above..)

      Or a step below...

      --
      Write your own Choose Your Own Adventure. http://www.freegameengines.org/gamebook-engine/
    2. Re:News? by Atriqus · · Score: 5, Insightful

      It's newsworthy because it was a bug that actually bricked hardware.

      --
      Hey, look! It's Bono's brother.
    3. Re:News? by SL+Baur · · Score: 5, Informative

      An alpha/beta of the most recent linux kernel patch had a bug fixed, and it hits the front page?

      They have not fixed the bug that caused the e1000e ethernet cards to get bricked. This is at least a two part bug. The EEPROM should not have been writable and Something Is Happening to cause bad writes to happen. What that "Something" is, no one knows yet, though it appears they are getting close.

      Linus is an absolute, total anal retentive with regards to fixing bugs by understanding and fixing the root cause[1], not just papering over it. This papers over it for the moment, because the bug hasn't been isolated yet, but it allows more people to participate because the side effects were really nasty - this was a true bricking of the ethernet card.

      This stage isn't newsworthy for Slashdot.[2] It must be a slow news day.

      [1] This is a Good Thing.

      [2] Nor will the real bug fix when it comes. A bug is found, a bug is fixed. Life, goes on.

    4. Re:News? by Spy+der+Mann · · Score: 5, Interesting

      I know this is News For Nerds and all that, but isn't this a tad specific?

      That's what sections are for. See the little Tux Icon over there? We all care about Linux. Besides, it's a VERY IMPORTANT BUG. A showstopper, so to speak. And keep in mind that a lot of people in here are kernel freaks. They want to test-drive the latest versions of the kernel. And one of the reasons why people keep coming here (and not to digg) is precisely for this kind of news.

      Thanks, ruphus13.

    5. Re:News? by PReDiToR · · Score: 0

      What, really "bricked" or just needing a reflash?

      I mean, who in their right mind would call a PC without an operating system bricked? Is a system bricked just because you have to put a floppy in to install an MBR and command environment (a la the 3 DOS install disks from yesteryear)?
      Compare that to running an operating system like DOS on an old Athlon that didn't have a big enough heatsink/fan, no ACPI or 'hlt' commands built in, and the processor overheating to the point of literally burning itself up.

      Sorry for being pedantic, but you never know when someone reading this post might want to intelligently ask for help installing Linux on their router or upgrade their WindowsMobile PDA.

      --

      Do not meddle in the affairs of geeks for they are subtle and quick to anger
    6. Re:News? by sumdumass · · Score: 5, Insightful

      Try Erasing the BIOS on the main board and you will be more accurate in your comparison.

      This bug actually flashed the firmware for the network controller and hosed access to it in some unexplained sort of way. That is something noteworthy because of the rarity of it. If it was simply hosing something that was readily diagnosable and more common like a boot sector or something, then it would be different. It isn't often that software is associated with hardware damage, either purposefully or accidentally.

      BTW, I know there are recovery methods for a hosed BIOS. That isn't the point. Simply installing an operating system shouldn't hose it nor should it hose hardware either. Imagine all the people who just thought their card was broken or something and went for a refund under warranty or the bad name Intel or Linux received for the "faulty shipment of devices" or the ability to break a device. This is something that would work in windows, load Linux in a dual boot mode, it would stop working in both windows and Linux without any errors or indication that the car was even capable of being seen by the mainboard.

    7. Re:News? by BrokenHalo · · Score: 1

      Even for alpha, that's stupid. Something I've come to expect from Linux and its "I've got to be the neatest" mentality.

      Even for Anonymous Coward, that was a stupid thing to say. A bug existing in alpha or beta versions does not constitute shoddy software overall. That is, after all, what alpha and beta releases are for. I don't need to catalogue the bugs in Windows that are never even acknowledged, let alone fixed, but production releases of Linux are generally as solid as anyone could wish, and bug reports are open for everyone to see, and do get acted upon.

    8. Re:News? by Anonymous Coward · · Score: 0

      Another thing you should expect is people telling you to shut the fuck up.

    9. Re:News? by ruphus13 · · Score: 1

      Thanks Spy - I, for one, was looking forward to testing this out, and, luckily hadn't gotten down to getting the latest bits when I read about the bug. Now I can proceed to find the next ones!

    10. Re:News? by Whiteox · · Score: 0, Troll

      It's been a slow news week. Maybe the economic crunch is having an effect on geek news...

      --
      Don't be apathetic. Procrastinate!
    11. Re:News? by Anonymous Coward · · Score: 0

      The "pre-release" code is released so people can experiment with it, patch it, and hopefully fix it. If you get burned because you thought that an alpha release was the bleeding edge and gave you a l33t system, then you get exactly what you deserve.

    12. Re:News? by Anonymous Coward · · Score: 0

      You're obviously in the wrong forum. Doesn't microsoft give you trolls your own?

    13. Re:News? by Hal_Porter · · Score: 1

      They pay us to post here.

      --
      echo -e 'global _start\n _start:\n mov eax, 2\n int 80h\n jmp _start' > a.asm; nasm a.asm -f elf; ld a.o -o a;
    14. Re:News? by Anonymous Coward · · Score: 0, Flamebait

      Well - let's not forget how this bug got into the kernel in the first place. Was this a driver developed by the kernel team?

      NO! - This was a driver given by Intel and written with Intel's own specifications. To be precise - it is not the kernel developers or Linux developers who are at fault, but Intel that messed things up BIG here. If you can't trust the specifications or software written by the manufacturer himself then you are in trouble.

      Thinking about it.. We all know how close Intel and Microsoft are. Now - If you can hurt Linux by releasing specifications and software that bricks hardware would that not be a nice coincidence? I mean - linking Linux with hardware going defective would not hurt Microsoft hmmm?

      Oh well.....

    15. Re:News? by SanityInAnarchy · · Score: 1

      I mean, who in their right mind would call a PC without an operating system bricked?

      Entirely too many people. Of course, "right mind" is subjective...

      --
      Don't thank God, thank a doctor!
    16. Re:News? by Ornedan · · Score: 1

      The only versions of Windows you ever get to see are the final releases. However, I'd bet they occasionally break some hardware on that multi-thousand machine internal testing farm of theirs.

      And what part of "alpha release" implies "not testing" to you?

    17. Re:News? by Mhtsos · · Score: 1

      What I found newsworthy is that I can expect the latest windows worm / trojan / virus to brick a whole bunch of network cards (at work, don't throw stones) as it's now more clearly documented that it can be done. I think it was mentioned in a previous article that the real bug is that bricking through software is possible at all.

    18. Re:News? by RiotingPacifist · · Score: 1

      No, that is what is to be expected from an alpha; anything else means you're just taking unnecessary risks. Alpha means the code has been developed and tested internally, NOT with your programs, NOT with your hardware. Now if you run Linus' or Morton's machine then you will probably not come across this kind of bug, but anybody else is running essentially untested code. While BSD claim they review their code, the fact that this bug wasn't caused by somebody commenting out #do not break drivers foo means that a code review by anybody that didn't design/work on the chips was probably going to miss it.
      If it had made it to beta then I'd be worried not enough testing was being done, but as it was caught in alpha software that is still about a year before end users will touch it (probably 2-3 if you're running a server?) and has yet to be audited by distros, I don't really expect this thing to upset anybody but the idiot that installs Linux and then goes "hmm I'm a windows uber leet pro, i can run the alpha software no probs"

      kernel somebodies svn -> morton branch -> alpha (stopped here) -> beta -> rc -> release
      distro proposed -> unstable -> release
      it was stopped at stage 3 of 9, but you have to remember that each stage is tested by a lot more people than the previous.

      --
      IranAir Flight 655 never forget!
    19. Re:News? by Anonymous Coward · · Score: 1, Funny

      Life, goes on

      James T. Kirk, is that you?

    20. Re:News? by corporatefucker · · Score: 1

      that the car was even capable of being seen by the mainboard.

      Now where did you start with the car analogy?

    21. Re:News? by sjames · · Score: 2

      It was even more fun. Once the card was hosed, not only would it not work, but it required a bit of hacking to get it recognized enough to attempt a re-flash (assuming you had an image of the correct contents to flash in).

      The exact cause was mysterious as well since it didn't happen to everyone, nor was it predictable if or when it would happen.

    22. Re:News? by sjames · · Score: 1

      What, really "bricked" or just needing a reflash?

      Bricked but theoretically recoverable with some further work.

      The cards should be fixable by reflashing, but when you can't enumerate the card on the bus, that's a bit of a challenge.

    23. Re:News? by Brian+Gordon · · Score: 1

      card

    24. Re:News? by LarsG · · Score: 1

      If Windows worm writers want to do hardware bricking evil, there isn't exactly a lack of potential targets already out there. It is not impossible to write a program to trash firmware on many video cards, HDs, DVD drives and the like. But you do tend to have to try to be evil in these cases, not just get an address wrong.

      The difference in this particular situation is that the e1000e fw got trashed by accident as opposed to by a program specifically written to do so.

      Most Windows malware these days is not written to do damage to hardware. It is written to send spam, sniff out passwords, do DDoS and the like, so it is in the interest of the malware writer to keep the actions of the software below the user's radar instead of breaking their computers.

      --
      If J.K.R wrote Windows: Puteulanus fenestra mortalis!
    25. Re:News? by Anonymous Coward · · Score: 0

      Even for alpha, that's stupid. Something I've come to expect from Linux and its "I've got to be the neatest" mentality.

      Even for Anonymous Coward, that was a stupid thing to say.

      No it wasn't. I'm not the same AC but I agree with the OP. Open source stuff in general has a high degree of shoddiness and unprofessional programming but this one takes the cake. It's a little like discovering that turning on the windshield wipers can seriously hose the engine of your car, disabling it and requiring complicated repairs. If something like this had happened in a field other than open source software a lot of you would be ridiculing those responsible and calling them worse than morons.

      We use a major enterprise Linux distribution in our product and we're appalled at how many things the vendor breaks even with SP updates. We have to forbid our customers from doing any updates on their own because if they do, their systems will cease to function.

    26. Re:News? by LarsG · · Score: 1

      What, really "bricked" or just needing a reflash?

      Kinda depends on the definition. Some of the technorati won't consider something bricked until it is literally physically broken and beyond repair. Some include situations that require hardware intervention (e.g., desoldering and swapping a SMT-mounted ROM), or specialized tools with limited availability (e.g., special reflash equipment only available to the manufacturer of the device) to fix it.

      From what I can gather from a quick skim of lkml, it is a bit uncertain as to how bricked these cards are - that is, if the fw is broken to the point where it can be reflashed over the pci bus or not. And the fw is corrupted in different ways, so some might be recoverable and some might not without special equipment.

      Until Intel comes out with a working reflash tool, I suppose the cards can be considered as being in a brickedness superposition since it is not yet clear if they are fixable or not.

      On the other hand, members of the iCrowd seem to think that "bricked" = "loss of some functionality".

      --
      If J.K.R wrote Windows: Puteulanus fenestra mortalis!
    27. Re:News? by sdsucks · · Score: 1

      Quite a few people run dev kernels -- especially those running new hardware that requires the absolute latest support. This is a little more serious than most bugs.

  2. Good News Everyone! by kcbanner · · Score: 5, Funny

    Hwwaa? Oh yes...the kernel doesn't corrupt your EEPROM anymore!

    --
    Obligatory blog plug: http://www.caseybanner.ca/
  3. Hardware of Software Problem? by Anonymous Coward · · Score: 5, Interesting

    Linus isn't very happy with Intel here:
    http://lkml.org/lkml/2008/9/29/368

    On Mon, 29 Sep 2008, Arjan van de Ven wrote:
    >
    > we have a patch to save/restore now, in final testing stages
    > (obviously we want to be really careful with this)

    Btw, the _real_ bug is clearly in the hardware design that allows you to
    brick those things without apparently even having a lock bit.

    I'm hoping Intel doesn't treat this as just a software bug. Some hw
    designer should be thinking hard about which orifice they put their head
    up in.

    It used to be that you could fry some monitors by feeding them
    out-of-range signals. The _monitors_ got fixed.

                    Linus

    1. Re:Hardware of Software Problem? by techno-vampire · · Score: 5, Insightful

      He's got good reason. It should be impossible for the system to write to the EEPROM without special measures being taken, possibly a jumper that has to be removed to allow it. And, if possible, the card won't work right (in some way that doesn't prevent boot) until the jumper's put back to normal. That way, if you really have to re-flash it, you can, but it's not going to happen by accident.

      I remember having a motherboard with a jumper that had to be specially set to update the BIOS. The smart way was to power down, open the case and pull the jumper so that you could flash the EEPROM. Then, of course, once that was done, reverse the procedure for safety. I always regarded anybody who left the jumper off for the rare convenience as fools who deserved anything that might happen.

      --
      Good, inexpensive web hosting
    2. Re:Hardware of Software Problem? by Anonymous Coward · · Score: 0

      It used to be that you could fry some monitors by feeding them
      out-of-range signals. The _monitors_ got fixed.

                                      Linus

      I remember that. Wasn't it like the IBM PC Jr. or something, where you could ask the display to refresh at 0Hz and that would cause it to fry? Can't remember if actual CRT implosion was urban legend or not, but monitor death was real.

    3. Re:Hardware of Software Problem? by Anonymous Coward · · Score: 0

      Well, I accidentally fried my Compaq "Portable"'s video card circa 1983 or so. It had a built-in 9" green screen VGA graphics card and monitor. I was experimenting with the DOS debug command and writing assembler. I had to take it to the long-defunct Computerland store for a new video card.

      I'm sure that other hardware suffered the same problems.

    4. Re:Hardware of Software Problem? by Anonymous Coward · · Score: 0

      Really? Have you actually ever done a BIOS upgrade on an anywhere near modern PC? They don't need jumpers being moved. It's not normally a problem at all.

      Besides, when you actually WANT to do a firmware update, it's yet another thing. What was a five minute task (possibly not even a reboot) now becomes a scheduled shut down, open case, flip jumper, boot, try and install update, realise that your network card isn't working now so you have to fish update using a USB memory key or something, install update, shut down, restore jumper (which you haven't lost, I hope) and boot up again.

    5. Re:Hardware of Software Problem? by techno-vampire · · Score: 1

      Well, I haven't needed to do a BIOS upgrade in this millennium, I think, and I only had one motherboard that needed a jumper change. As far as your comedy of errors goes, anybody who didn't plan ahead and make sure the update was already on the hard disk before starting deserves all the problems you described. And, of course, flashing the EEPROM on a NIC should be a rare event. Nice strawman, though.

      --
      Good, inexpensive web hosting
    6. Re:Hardware of Software Problem? by SanityInAnarchy · · Score: 1

      possibly not even a reboot

      Unlikely. I would imagine that this flavor involves writing the new firmware to some dedicated chunk of memory, where it will be pulled either by the OS or the BIOS itself on next reboot.

      --
      Don't thank God, thank a doctor!
    7. Re:Hardware of Software Problem? by SanityInAnarchy · · Score: 1

      Well, I haven't needed to do a BIOS upgrade in this millennium, I think

      Good for you...

      I know my keyboard has had its firmware upgraded at least once. I haven't had to do a BIOS update for awhile...

      I do remember a series of incremental improvements to the whole process:

      The very first time I flashed a BIOS, it was relatively easy -- just run the BIOS update program (in Windows), which formats a floppy for me, which I then boot off of. After booting the floppy, I still have to dump the BIOS, then load the new one -- from a DOS commandline.

      I streamlined the process a bit when CDs got cheap -- I burned a FreeDOS CD, and left the actual updates on the hard drive, so I could keep using the same CD.

      And some BIOSes supported flashing themselves, from inside the BIOS. Unfortunately, these tools were mostly limited to floppies -- I had to go pick up a floppy drive and plug it into the computer.

      More recently -- it turned out I didn't actually need an update, but it was such an easy process that I didn't mind. Pretty much just paste a couple of commands into a terminal on Linux (versus printing them, writing, memorizing, or copying from another computer), then on my next reboot, new BIOS.

      Easiest, of course, was on OS X -- firmware updates of any kind are included with Apple's "Software Update", meaning you might not even notice.

      flashing the EEPROM on a NIC should be a rare event.

      So should any kind of BIOS update.

      But then, so should any software bug. You could say that firmware should be held to a higher standard, and I'd agree, but if and when there's a firmware bug, I'd much rather have it patched quickly and conveniently than have to schedule a few hours (or an afternoon), and dig up an old floppy drive and/or a copy of FreeDOS.

      --
      Don't thank God, thank a doctor!
    8. Re:Hardware of Software Problem? by mczak · · Score: 2, Informative

      Jumpers are not really used a lot these days. They cost extra, and are clumsy to handle (need to open case). You are right that it would be really good if some precautions were taken so no accidental writes happen (for instance, requiring some special command sequence that is hard to trigger accidentally), but often those EEPROM chips just have a simple serial interface, and reading and writing work almost exactly the same. A couple of years ago you could easily overwrite the EEPROM of Hauppauge TV cards (though there wasn't much information in there, just the exact model IIRC, which was needed to set things up fully correctly), a bug very similar to this.

    9. Re:Hardware of Software Problem? by Baki · · Score: 1

      At least for consumer hardware we have come to expect that it cannot be damaged by buggy software, but in general it is not true that hardware should always protect itself against bad software. Just consider much of embedded software, e.g. the flight software for aeroplanes. Wrong software will result in "hardware damage", the same for most robots etc.

      I am quite sure that even a microprocessor driven washing machine nowadays could damage itself if the (embedded) software were buggy.

    10. Re:Hardware of Software Problem? by pslam · · Score: 1

      The strange thing is that I've written drivers for many EEPROMs, and they all have a few hoops you have to jump through to enable writing. It's not something you can just accidentally do.

      Usually it's something like 'Read address 0xaaaa then 0xdddd then write some magic byte then the address then write 128 bytes'.

      Perhaps Intel thought they didn't need all that magic?
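
The "hoops" described above can be sketched as a small state machine. This is only a toy model in Python, not how the e1000e NVM or any particular EEPROM part behaves: the class name, the unlock addresses, the magic byte 0x5A, and the 128-byte page limit are all assumptions made up for illustration, loosely following the sequence in the comment.

```python
class GuardedEeprom:
    """Toy EEPROM model: a page write is accepted only after an unlock sequence."""

    # Hypothetical unlock sequence: read 0xAAAA, read 0xDDDD, write a magic byte.
    UNLOCK_READS = (0xAAAA, 0xDDDD)
    MAGIC = 0x5A
    PAGE = 128

    def __init__(self, size=0x10000):
        self.cells = bytearray(size)
        self.progress = 0    # number of unlock reads seen so far, in order
        self.armed = False   # True only right after a complete unlock sequence

    def read(self, addr):
        # Advance through the unlock reads; any other read resets the sequence.
        if (self.progress < len(self.UNLOCK_READS)
                and addr == self.UNLOCK_READS[self.progress]):
            self.progress += 1
        else:
            self.progress = 0
        self.armed = False
        return self.cells[addr]

    def write_magic(self, value):
        # Arms the device only if both unlock reads came first and the byte matches.
        self.armed = (self.progress == len(self.UNLOCK_READS)
                      and value == self.MAGIC)
        self.progress = 0

    def write_page(self, addr, data):
        # Reject stray, oversized, or out-of-range writes; arming is one-shot.
        ok = (self.armed and len(data) <= self.PAGE
              and 0 <= addr and addr + len(data) <= len(self.cells))
        self.armed = False
        if ok:
            self.cells[addr:addr + len(data)] = data
        return ok
```

With a guard like this, a driver that merely gets an address wrong has its write rejected; only code that deliberately walks the whole sequence changes the contents.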

    11. Re:Hardware of Software Problem? by Xugumad · · Score: 2

      Given the cost of EEPROM space, I think the better answer is to double the size. One half is readable, one writable, at any point in time. To update, you write, turn off, flip the jumper across to the other side (or, heck, just use a physical switch) and you're done. Bricking isn't absolutely impossible (you could write a damaged image to one half which wipes the other when it boots), but essentially infeasible.
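
The two-bank idea above might look roughly like this. It is a Python toy under stated assumptions, not a description of any real NIC: the class and method names are hypothetical, and the jumper is modelled as a flag that software cannot touch except through the explicit `flip_jumper` call standing in for a physical action.

```python
class DualBankEeprom:
    """Toy model: the device boots from the active bank; software can write only the other."""

    def __init__(self, image):
        # Both banks start out holding the same factory image.
        self.banks = [bytes(image), bytes(image)]
        self.active = 0  # which bank the (hypothetical) jumper selects for reading

    def read(self):
        # The device only ever reads from the active bank.
        return self.banks[self.active]

    def write(self, image):
        # Software writes always land in the inactive bank.
        self.banks[1 - self.active] = bytes(image)

    def flip_jumper(self):
        # Stands in for powering down and physically moving the jumper or switch.
        self.active = 1 - self.active
```

A stray write can then only damage the bank that is not in use, and the previous image stays available until the next deliberate flip.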

    12. Re:Hardware of Software Problem? by Anonymous Coward · · Score: 0

      With some really old 5.25 floppy drives you could tell it to seek some weird sectors and get the drive heads to jam. Didn't ruin the hardware but you had to open the drive casing to free them.

    13. Re:Hardware of Software Problem? by Agripa · · Score: 2, Informative

      It is not uncommon to require a set of magic numbers to be written before writing to protected memory. The magic numbers and/or access pattern is designed so that no simple or likely hardware failure will allow unprotected access. Small discrete or integrated EEPROMs often have this functionality built in.

    14. Re:Hardware of Software Problem? by mpe · · Score: 1

      And, of course, flashing the EEPROM on a NIC should be a rare event. Nice strawman, though.

      Doing any kind of firmware upgrade should be a rare event. At minimum it should involve first shutting down the driver accessing that piece of hardware. If the peripheral is designed sensibly an "upgrade firmware" command would require some kind of "handshake" and only be accepted as the first command after a reset.

    15. Re:Hardware of Software Problem? by mpe · · Score: 1

      At least for consumer hardware we have come to expect that it cannot be damaged by buggy software, but in general it is not true that hardware should always protect itself against bad software. Just consider much of embedded software, e.g. the flight software for aeroplanes.

      Hence you'd never upgrade the firmware on all the redundant computers on an airliner at the same time. Typically there is a minimum time (both by the calendar and in flying hours) between such upgrades.

  4. Great! by silent_artichoke · · Score: 3, Funny

    I'm gonna download it now! Oh, wait... crap.

  5. e1000 been broken a while by AaronW · · Score: 3, Insightful

    About a year ago we built up some new machines to run Linux and found that multiple e1000 cards would cause the Ethernet connectivity to drop and become useless. We ended up replacing them with much cheaper Realtek cards and all the problems disappeared. I haven't trusted Intel since. It's as if there were some buggy interrupt interaction with the on-board Intel Ethernet in the 915 chipset.

    --
    This post is encrypted twice with ROT-13. Documenting or attempting to crack this encryption is illegal.
    1. Re:e1000 been broken a while by Anonymous Coward · · Score: 1

      I've never had a problem with their cards. They're about the only NICs that I've never needed to mess with to get Linux to see them. NICs built into the motherboard NB/SB are the biggest problem usually. The PCI-X cards work in PCI slots and in the tests I've done they're usually able to push 30-40% more data through the network than other NICs.

    2. Re:e1000 been broken a while by sumdumass · · Score: 4, Informative

      3com used to be that way too. I'm not exactly sure what it was but the 3c905's rocked and would run data quite a bit faster than any other card at the time. I know they had full-blown data processors on the cards but I assume the others would too. I used to go to computer shows just to pick them up for $10-$20 used because they had the same effect on data performance as you would see with rendering going from an S3 Trident video adapter to a GeForce video card. I became seriously convinced at a LAN party with an AMD Athlon 800 system running Windows 98SE with 256 MB of memory, when we had to pull a 100 meg file from a file server to get the updates in sync for a game. I started pulling the file last because I was helping others find it, I was on the tail end of the 3rd tier of uplinked switches, and I had the file installed while others were still transferring it. The funny part is that people with their brand new Windows XP 1.4 and 1.8 gig plus systems were still slower and the only thing I can attribute it to is the NIC.

      Intel caught up with 3com in this aspect and despite my older fascinations with 3com, I'm actually an Intel fan in this one respect now.

    3. Re:e1000 been broken a while by kesuki · · Score: 1

      processors and sub systems have gotten a lot faster since then.

      i know, cheap ethernet interfaces are slower than the fastest cards out there, but your experience, from many years back when an 800 MHz cpu was fast, is a bit dated. a 100 MB file shouldn't take long to download from a file server even with a cheap nic unless there is a performance issue with the file server in question. 100 megabytes shouldn't take more than a few seconds to transfer across a lan.

      in theory a 100 mbit lan should take 8 seconds to transfer a 100 megabyte file. in theory a 1000 mbit lan should only take .8 seconds. obviously file IO limitations do apply. if your hard drive can only do 80 mbit, and only on the start of the drive it's going to take longer.
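
The arithmetic above is easy to check. The snippet below is a back-of-the-envelope sketch only: the function name and parameters are made up for illustration, it uses decimal megabytes (1 megabyte = 8 megabits), and it ignores protocol overhead entirely, treating the slower of link and disk as the bottleneck.

```python
def transfer_seconds(size_megabytes, link_mbit, disk_mbit=None):
    """Idealized transfer time: file size divided by the slowest stage's rate."""
    megabits = size_megabytes * 8  # 1 megabyte = 8 megabits
    # If a disk rate is given, the slower of disk and link limits throughput.
    rate = link_mbit if disk_mbit is None else min(link_mbit, disk_mbit)
    return megabits / rate


print(transfer_seconds(100, 100))        # 100 megabytes over a 100 mbit lan
print(transfer_seconds(100, 1000))       # same file over a 1000 mbit lan
print(transfer_seconds(100, 1000, 80))   # gigabit lan, but an 80 mbit disk
```

The first two calls give the 8 second and .8 second figures quoted above; the third shows the 80 mbit disk stretching the transfer to 10 seconds regardless of how fast the link is.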

    4. Re:e1000 been broken a while by LordNimon · · Score: 1

      It's funny you say that. A few years ago, I asked on a mailing list for the most Linux-friendly gigabit ethernet card, and almost everyone said e1000. I've been happy with mine ever since. My distro was a bit too old for the card, but I was able to download the drivers from intel.com and install them without any problems.

      --
      And the men who hold high places must be the ones who start
      To mold a new reality... closer to the heart
    5. Re:e1000 been broken a while by Fweeky · · Score: 1

      Quite a few problems like that seem to be MSI-X related, did you try disabling them?

    6. Re:e1000 been broken a while by sumdumass · · Score: 1

      Of course the 800 MHz system was when I first noticed that there was a difference back in 2000/2001 and things have come along faster now.

      But to reach the maximum speeds, you have to make sure you have newer equipment capable of hitting the faster speeds and that the lines are in good, near perfect order to realize the maximum speeds. You also have TCP overhead that inflates the transmission size of the 100 meg file and other factors to consider like multiple users accessing the same interfaces, the amount of time required for disk access to not only break the file down for transmission but to replace it on the other computer and all.

      I guess what I'm saying is that there is a difference in network cards out there and the Intel pro adapters have what makes the difference. Don't concentrate too much on when I discovered this, a lot has come along in the last 7-8 years.

  6. Root cause still unknown? by AcidPenguin9873 · · Score: 4, Interesting

    Yes, they released a patch so that the NVM can't be overwritten after the e1000e driver is loaded. But from what I can tell, they still don't know what is/was responsible for the overwriting.

    FWIW, I'm almost positive that modern CPUs have debug traps for this exact sort of thing...you can trap arbitrary I/O writes via SMM or something...obviously I'm not in the debug loop, but I don't see why this has been so hard to figure out...

    1. Re:Root cause still unknown? by moteyalpha · · Score: 1

      It makes me wonder if they have the tools available to do their job. When I did this type of work we had analyzers and ICE machines which makes it easy if you know how to use them. Are the kernel designers getting enough support to buy the needed hardware? Sometimes these things go beyond the software and can happen because of a physical condition that is untrappable in SMM, like a DMA over the top of refresh cycle fault.

    2. Re:Root cause still unknown? by SL+Baur · · Score: 1

      obviously I'm not in the debug loop, but I don't see why this has been so hard to figure out...

      Because it bricked the card. No way to have it fixed other than to get a replacement as there was no way to reload the firmware.

      People were scared to test.

    3. Re:Root cause still unknown? by jhol13 · · Score: 1

      I think the more interesting question is "how can someone overwrite".

      With that I mean "aren't there any tests around", not that Linux should (magically) become a microkernel (not that I would mind).

    4. Re:Root cause still unknown? by Almahtar · · Score: 2, Interesting

      Which makes me hope all attempts to write to the EEPROM are being logged in the new driver, with stacktraces.

      Otherwise what's the point of testing them? Sure they won't brick your card, but you can't get very useful feedback.
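
Logging write attempts with a stack trace, as hoped for above, could be sketched like this. It is purely illustrative Python, not the actual patch: the real driver is kernel C (where a facility such as `dump_stack()` would play this role), and the `LoggedNvm` class, its fields, and `buggy_caller` are hypothetical names invented for the example.

```python
import traceback


class LoggedNvm:
    """Toy read-only NVM: refuses writes but records who attempted them."""

    def __init__(self, image):
        self._image = bytes(image)
        self.write_log = []  # one entry per refused write attempt

    def read(self, offset, length):
        return self._image[offset:offset + length]

    def write(self, offset, data):
        # Refuse the write, but keep the call stack so the culprit can be found.
        self.write_log.append({
            "offset": offset,
            "length": len(data),
            "stack": "".join(traceback.format_stack()),
        })
        return False


def buggy_caller(nvm):
    # Stands in for whatever code path is issuing the unexpected write.
    return nvm.write(0x10, b"\x00\x00")
```

After a test run, each entry in `write_log` names the functions on the call path, which is the kind of feedback testers could send back without risking their cards.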

    5. Re:Root cause still unknown? by vally_manea · · Score: 1

      Actually I think the guys working on this are Intel engineers so probably they have everything they need.

    6. Re:Root cause still unknown? by SuperQ · · Score: 4, Interesting

      So the thing is, there is more than just a simple "eeprom write interface" on these chips.

      Most of the time the eeprom attached to the nic is a cheap small serial eeprom part, usually just a few kb.. maybe 32 or 64kb. It contains mostly things like a bit of boot strapping, a few "permanent" settings like the MAC address, and the PXE rom.

      And that's where the problems come in. This serial interface is usually an afterthought, and if there is noise on that bus, bits can flip. Or if something bad happens in the NIC code, you could accidentally write when you meant to read.

      Usually this is recoverable, but I haven't looked into this specific corruption situation. I've had to deal with this kind of thing before. It's not fun.

      Flashing NIC eeproms isn't something a normal end-user does all the time. 99% of the time it's written at the factory, stuffed on the board, and forgotten about.
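
Corruption of the kind described here can usually be detected, if not prevented, with a checksum. As far as I know, Intel's e1000-family drivers expect the 16-bit words 0x00 through 0x3F of the NVM to sum to 0xBABA modulo 2^16; treat that constant and range as an assumption to verify against the driver source. The sketch below is a Python simulation on a made-up in-memory image, and the helper names are hypothetical.

```python
NVM_SUM = 0xBABA          # assumed target value for the word sum
CHECKSUM_WORD = 0x3F      # assumed index of the checksum word


def compute_checksum_word(words):
    """Return the value word 0x3F must hold so words 0x00..0x3F sum to NVM_SUM."""
    partial = sum(words[:CHECKSUM_WORD]) & 0xFFFF
    return (NVM_SUM - partial) & 0xFFFF


def checksum_ok(words):
    """True if the first 64 words sum to NVM_SUM modulo 2**16."""
    return (sum(words[:CHECKSUM_WORD + 1]) & 0xFFFF) == NVM_SUM


# Build a made-up 64-word image: a fake MAC address in words 0..2, rest erased.
image = [0x1100, 0x3322, 0x5544] + [0xFFFF] * 61
image[CHECKSUM_WORD] = compute_checksum_word(image)
print(checksum_ok(image))        # True

# Flip a single bit, as noise on the serial bus might.
corrupted = list(image)
corrupted[1] ^= 0x0004
print(checksum_ok(corrupted))    # False
```

A checksum like this only says that something changed, not what changed, which is why a saved copy of the original image matters for recovery.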

    7. Re:Root cause still unknown? by Ornedan · · Score: 1

      From what I've read, the bug causing the overwrite is somewhere other than the network card's driver. Something is overwriting random memory, and it happens to hit the memory region mapped for writing the card's firmware.

    8. Re:Root cause still unknown? by Anpheus · · Score: 1

      The problem is that rather than do it the easy way with that alphabet soup of acronyms listed up there, they broke out their handy electron microscope to examine it.*

      * Yes, I'm jealous.

    9. Re:Root cause still unknown? by jimicus · · Score: 1

      I think the more interesting question is "how can someone overwrite".

      Very easy, if the card is designed to have field-updateable firmware. You just need to send it the right (or in this case wrong) command.

      Ideally the manufacturer would make it so that you have to go through all sorts of hoops before you've done anything permanent, but this isn't the first time something like this has happened.

    10. Re:Root cause still unknown? by jhol13 · · Score: 1

      You missed my next sentence.

      What I am complaining about is the lack of proper testing in Linux. If there were proper tests for the module which does the overwriting, the problem would never have occurred at all.

    11. Re:Root cause still unknown? by jimicus · · Score: 1

      What I am complaining about is the lack of proper testing in Linux. If there were proper tests for the module which does the overwriting, the problem would never have occurred at all.

      Are you trolling or do you honestly not understand the implications of it being an alpha release?

      In other words: "This release is for testing purposes; by all means report a bug if it breaks, but don't be too surprised if the breakage is catastrophic. If you use this on something important, you are nuts and should seek help". In traditional, closed-source development, alpha releases are produced too, and they may or may not break things. Now, for software living entirely in userland you probably won't cause hardware problems, but at the kernel layer this is entirely possible, simply because so many things are designed to have field-upgradeable firmware (which is usually what gets damaged).

      Working at a company that does embedded software development, I can tell you now that these things do happen from time to time. If they didn't, there would be no such thing as JTAG programmers.

      The only difference here is that because the Linux kernel's development process is open to the world, these things are known by the whole world.

    12. Re:Root cause still unknown? by jhol13 · · Score: 1

      The problem is that the whole of Linux is an alpha release, all the time.

      This is proved by the fact that no proper testing is really done. As I said, this would have been found before the alpha "release".

      You know extremely well that your embedded systems would be completely useless to the users had they no proper tests.

      Yes, I know that there is a huge number of devices sold every day which have not been tested as well as they should be, or which the company has decided to sell despite the known bugs.

      Linux does the latter on the premise "it is free", "don't complain, fix it yourself" or the one you gave.

    13. Re:Root cause still unknown? by Anonymous Coward · · Score: 0

      Excuse me? Linux is not always in alpha. There is a stable release right now that you should use if you don't want to take risks. (Most serious users stay on the stable final release so as to not break stuff.) This bug surfaced in the latest alpha as it was released for testing. FOR TESTING! It was people who wanted to test the latest alpha that got their NICs bricked. They don't blame Linux, though, as they had ample warning beforehand that playing around with alphas might break stuff. Now take your trolling someplace else.

  7. So in a nutshell... by GlobalColding · · Score: 2, Funny

    From RTFA, the cause of the problem has not been identified yet; however, the patch prevents the problem from presenting itself going forward by blocking stray writes/erases of the non-volatile memory. Since the problem was caught at the alpha/beta stage, the stable releases were unaffected. BTW, my boss tried to RTFA over my shoulder and shot cheese out of his ears (he is the non-techie type). It's threads like these that absolutely cement /.'s place as the world's dominant UBER NERD site.

    1. Re:So in a nutshell... by BPPG · · Score: 1

      It's threads like these that absolutely cement /.'s place as the world's dominant UBER NERD site.

      ummm... good?

      --
      What's the value of information that you don't know?
    2. Re:So in a nutshell... by jimicus · · Score: 1

      My boss tried to RTFA over my shoulder and shot cheese out of his ears

      Can he do that on demand?

  8. Re:Get on with it? Vista hides behind Mohave Proje by porl · · Score: 1

    huh?

  9. Re:Jump for joy! by poopdeville · · Score: 0, Offtopic

    Candlejack, is that

    --
    After all, I am strangely colored.
  10. More recently than that by Nimey · · Score: 1

    Supposedly with pre-multisync monitors (say, your average early-'90s monitor, like my old Tandy VGM-340) if you weren't careful about what X modelines you used you could fry your monitor.

    --
    Hail Eris, full of mischief...

    E pluribus sanguinem
    1. Re:More recently than that by Anonymous Coward · · Score: 0

      Yes, Linus mentions this in the quote above. He also points out that the final fix was in the monitor itself, not the kernel.

    2. Re:More recently than that by Chris+Burkhardt · · Score: 1

      Thanks for the recap there.

      --
      "And there be unix which have made themselves unix for the kingdom of heaven's sake." - Matt. 19:12
    3. Re:More recently than that by iampiti · · Score: 1

      I don't know which year his monitor was from, but around 2000 a friend of mine fried his CRT trying to install Linux. So Linus is right.

    4. Re:More recently than that by retchdog · · Score: 1

      I remember that. Doing it by hand, or with the open-source tool (I think it was called Xconfigurator), was scary and full of warnings like "This may damage your hardware". So scary that there was a commercial, non-free program which did nothing but configure X "more safely". I remember one friend of mine was excited when it came out, because there was finally warez for linux. :P

      I just held my breath and used xconfigurator.

      --
      "They were pure niggers." – Noam Chomsky
  11. Re:Lol! Open sores. by BPPG · · Score: 1

    Just buy a copy of Windows and get on with it.

    You may have missed the part where it said that this is a development release. Also, installing a development release of the next Windows might brick your system AND get you sued. ;-)

    --
    What's the value of information that you don't know?
  12. Re:Solid state drives by BPPG · · Score: 1

    You have nothing to worry about, this article is referring to a development release of Linux, you won't see it in a normal distro...

    **braces himself for the imminent whoosh

    --
    What's the value of information that you don't know?
  13. Re:Get on with it? Vista hides behind Mohave Proje by setagllib · · Score: 1

    Argh. Markov Chain text garbage got modded Insightful.

    --
    Sam ty sig.
  14. Re:Get on with it? Vista hides behind Mohave Proje by Hal_Porter · · Score: 1

    Does that mean the spambot passed the Turing test or the moderator failed it?

    --
    echo -e 'global _start\n _start:\n mov eax, 2\n int 80h\n jmp _start' > a.asm; nasm a.asm -f elf; ld a.o -o a;
  15. So, we put the workaround in _hardware_? by SanityInAnarchy · · Score: 5, Insightful

    Linus has a very good analogy here -- in fact, I love the fact that on the rare occasions I have to set modelines myself, I can pretty much put whatever I want, knowing that if it doesn't work, I can just ctrl+alt+backspace and try again.

    But the conclusion does bother me: We're basically saying that all software is buggy, or that we're incapable of preventing this kind of thing from happening (in software). This is true of most modern OS designs -- monolithic kernels do make it possible for pretty much any driver to accidentally ruin any other driver's day.

    The proposed workaround, then, is to prevent that memory from being written -- and to prevent this in hardware, for no other reason than to avoid having to write it into every kernel that might potentially allow buggy code to run in Ring 0.

    I don't like either solution. Hardware shouldn't be brickable from software, or at least, not so easily. But software shouldn't need hardware to coddle it, either -- why is the SSD in this laptop emulating a hard disk?

    --
    Don't thank God, thank a doctor!
    1. Re:So, we put the workaround in _hardware_? by PRMan · · Score: 4, Insightful

      Yes, because as long as the hardware can be bricked by software, it remains an exploit that can be used by malicious software writers.

      Speaking of the fried monitors, back in the day a college I worked at got a virus that fried 2 monitors before I got smart and put a Hercules monochrome card in it and cleaned it up.

      So, yes, while it can (and should) be worked around in Linux, it should also be fixed in hardware, if possible.

      --
      Peter predicted that you would "deliberately forget" creation 2000 years ago...
    2. Re:So, we put the workaround in _hardware_? by klapaucjusz · · Score: 1

      But the conclusion does bother me: We're basically saying that all software is buggy,

      No. What we're saying is that we build layered systems, and that every layer is expected to protect its integrity from the higher layers.

      The hardware protects itself from software (no brain-damaged hardware interfaces), the kernel protects itself from userspace (privileged vs. unprivileged mode), system userspace protects itself from user userspace (root vs. non-root), userspace protects itself from interpreted network code (sandboxing).

    3. Re:So, we put the workaround in _hardware_? by SanityInAnarchy · · Score: 1

      as long as the hardware can be bricked by software, it remains an exploit that can be used by malicious software writers.

      Except, where do you draw the line?

      Software control of fans means a virus could spin them all down, and run some complex calculations (PI) to spin the CPU up.

      Software control of hard drives means you can spin them up and down all day, and wear them out an order of magnitude faster.

      Software control of a printer means you can print page after page of black ink, using up an ink cartridge.

      Software control of a Roomba means you can deliberately crash it into walls, or possibly down the stairs.

      I have enough coddling in my software. ("Are you sure you want to run this program from the Internet? It might be a virus!") While it might be safer, I really don't want to be in a situation where my hardware is telling me I can't do something, because I might screw it up.

      --
      Don't thank God, thank a doctor!
  16. Re:Solid state drives by SanityInAnarchy · · Score: 1

    If there's a whoosh, I don't get it either, other than that it has to be...

    I don't think Intel makes solid state drives. Nor does Intel make the EEE PC. Nor does any EEE PC ship with an experimental kernel. Nor does an ethernet card have anything to do with a hard drive.

    Some quick Googling shows that the 901 may have gigabit, maybe not -- and if it did, and if they were this particular Intel card, you might be affected. Which would still have nothing to do with the SSD.

    But after checking the manuals I could find, it doesn't look like it supports gigabit at all.

    --
    Don't thank God, thank a doctor!
  17. Saw something like that once by Gazzonyx · · Score: 1

    I had the same thing pop up on a supermicro (ICH-7, IIRC... dual Xeon 5xxx's) at work. Recompiling the modules and reinstalling them seemed to fix the problem. Like most hardware problems, it seems to be just the wrong combination of drivers, hardware, software and luck.

    I think a yum update is what triggered it, but I'm not sure; it just popped up out of nowhere and acted in such a way that I couldn't ever corner the thing. Recompiling the modules was one of those things that I did while I was thinking about the problem and trying to isolate stupid variables. I really didn't expect it to fix the problem.

    I also remember that one of the network cables was found to be flaky some time later - it could all be coincidence.

    At any rate, I've found Realtek chips to be... less than desirable, yet durable enough to take a good beating. Their Linux support isn't bad, either. You could do worse, in regards to bang for your buck, than a Realtek based card, IMHO.

    --

    If I mod you up, it doesn't necessarily mean I agree with what you've said, sorry.

  18. From what I can tell, it's a cocktail deal by Gazzonyx · · Score: 1

    From what I can tell, the bug is only being seen on bleeding edge combinations of software in bleeding edge distros. They're thinking it's a combination of the driver and a new release of X (one allows for the conditions, the other glitches after that), but there's very little 'tried-and-true' stuff in a bleeding edge distro.

    --

    If I mod you up, it doesn't necessarily mean I agree with what you've said, sorry.

  19. I had a similar problem with Broadcom WiFi card by postmortem · · Score: 0, Troll

    ..after using it in Linux, it is not recognizable by any PC.

  20. ATA is an abstraction by tepples · · Score: 1, Informative

    why is the SSD in this laptop emulating a hard disk?

    It's not. ATA's wire protocol uses a hardware abstraction over block storage devices, as does USB Mass Storage Class. The hard disk is emulating an ideal block device, and the SSD is also emulating an ideal block device.

    1. Re:ATA is an abstraction by mpe · · Score: 1

      ATA's wire protocol uses a hardware abstraction over block storage devices, as does USB Mass Storage Class. The hard disk is emulating an ideal block device, and the SSD is also emulating an ideal block device.

      This has been the case for a long time. Even with parallel IDE the drive geometry reported by the controller was typically a complete fiction. Another common feature is the ability for the drive controller to transparently remap failed blocks. Which means that by the time the host actually starts seeing failures the disk is likely to be in a very bad state.
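
The abstraction described in this subthread can be sketched as a simple interface: the host addresses numbered blocks and never sees how they are stored. The Python below is a toy model, not ATA; the class names, the 512-byte block size, and the spare-block pool are all assumptions made for illustration. It also shows how transparent remapping hides failures from the host until the spares run out.

```python
from abc import ABC, abstractmethod

BLOCK_SIZE = 512  # assumed block size for this toy model


class BlockDevice(ABC):
    """What the host sees: numbered fixed-size blocks, nothing else."""

    @abstractmethod
    def read_block(self, lba): ...

    @abstractmethod
    def write_block(self, lba, data): ...


class RemappingDisk(BlockDevice):
    """Toy drive that silently redirects failed blocks to spares."""

    def __init__(self, num_blocks, num_spares):
        self.storage = {}   # physical block number -> bytes
        self.remap = {}     # logical block number -> spare physical block
        self.spares = list(range(num_blocks, num_blocks + num_spares))
        self.bad = set()    # physical blocks that have failed

    def _physical(self, lba):
        return self.remap.get(lba, lba)

    def mark_failed(self, lba):
        """Simulate a media failure; remap to a spare if one is left."""
        self.bad.add(self._physical(lba))
        if self.spares:
            self.remap[lba] = self.spares.pop(0)

    def write_block(self, lba, data):
        phys = self._physical(lba)
        if phys in self.bad:
            raise IOError(f"block {lba} failed and no spares are left")
        self.storage[phys] = data.ljust(BLOCK_SIZE, b"\0")

    def read_block(self, lba):
        phys = self._physical(lba)
        if phys in self.bad:
            raise IOError(f"block {lba} failed and no spares are left")
        return self.storage.get(phys, bytes(BLOCK_SIZE))


disk = RemappingDisk(num_blocks=8, num_spares=1)
disk.mark_failed(3)                 # first failure: absorbed by the spare
disk.write_block(3, b"still fine")
print(disk.read_block(3)[:10])      # host notices nothing

disk.mark_failed(5)                 # second failure: no spares left
try:
    disk.read_block(5)
    host_saw_error = False
except IOError as exc:
    host_saw_error = True
    print("host finally sees:", exc)
```

By the time the host gets its first error, the drive has already used up its entire reserve, which matches the warning above about the disk being in a very bad state.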

  21. Yes things are faster by Anonymous Coward · · Score: 0

    I used to be pleased years ago when rsync over ssh could saturate a 100 Mb/s LAN (e.g. around 12 MB/s of actual disk throughput).

    Yesterday I rsynced about 25 GB of files (Linux distro tree) onto my new laptop over gigabit LAN and sustained 35 MB/s the entire way. This is application speed, which means NICs and CPUs kept up with the I/O and encryption loads.

    This performance is about 75% of my ideal disk speed on my laptop, measured with just one large sequential access dumped to /dev/null.
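
The figures in this comment can be checked with a little arithmetic. The sketch below assumes decimal units (1 MB = 10^6 bytes, 1 GB = 10^9 bytes) and ignores protocol overhead, so the results are rough.

```python
def mbit_to_mbyte(mbit_per_s):
    """Convert a link rate in Mb/s to MB/s (8 bits per byte)."""
    return mbit_per_s / 8


fast_ethernet_peak = mbit_to_mbyte(100)     # 12.5 MB/s, so ~12 MB/s is near saturation
gigabit_peak = mbit_to_mbyte(1000)          # 125 MB/s

observed = 35.0                             # MB/s sustained, from the comment
link_utilisation = observed / gigabit_peak  # fraction of the gigabit link in use
transfer_seconds = 25_000 / observed        # 25 GB = 25,000 MB
implied_disk_speed = observed / 0.75        # if 35 MB/s is 75% of the disk's best

print(f"100 Mb/s peak: {fast_ethernet_peak} MB/s")
print(f"gigabit link used: {link_utilisation:.0%}")
print(f"25 GB takes about {transfer_seconds / 60:.0f} minutes")
print(f"implied sequential disk speed: {implied_disk_speed:.1f} MB/s")
```

On these numbers the gigabit link is only about a quarter used, so the network is probably no longer the bottleneck; the disk, or possibly the ssh encryption, is the more likely ceiling.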

  22. Re:Get on with it? Vista hides behind Mohave Proje by Ant+P. · · Score: 1

    Any idiot can receive mod points, but it takes a genius to figure out how to navigate this terrible new UI.

  23. Coddling in all computers with SDTV output by tepples · · Score: 1

    While it might be safer, I really don't want to be in a situation where my hardware is telling me I can't do something, because I might screw it up.

    Then why does almost every computer with a decent SDTV output have a lockout chip designed to prevent homemade programs from running? Examples include DVRs, video game consoles, and the like.

    1. Re:Coddling in all computers with SDTV output by SanityInAnarchy · · Score: 1

      I don't know. My Powerbook didn't -- it had svideo out, and an adapter cable to turn it into RCA.

      Now I've got a Dell notebook, which has an HDMI port. Again, nothing to prevent me from showing whatever I want on it.

      But think about the context here -- I don't think hardware restricting me is a good thing. I don't think DVRs or game consoles should prevent people from hacking on them.

      --
      Don't thank God, thank a doctor!
    2. Re:Coddling in all computers with SDTV output by tepples · · Score: 1

      Now I've got a Dell notebook, which has an HDMI port.

      Which isn't very helpful if you don't have $600 to spend on an HDTV.

      I don't think hardware restricting me is a good thing. I don't think DVRs or game consoles should prevent people from hacking on them.

      You don't think they should, yet all do. Why does this continue to be the case? Are the console makers afraid of the indie game dev scene?

    3. Re:Coddling in all computers with SDTV output by SanityInAnarchy · · Score: 1

      Which isn't very helpful if you don't have $600 to spend on an HDTV.

      I got a Dell monitor -- 24 inch, 1080p -- for $300 or so. It has all kinds of ports, including HDMI.

      It also has DVI, but I prefer the HDMI, mostly because there's no thumbscrews. Picture quality would be exactly the same, though.

      You don't think they should, yet all do. Why does this continue to be the case?

      I don't really know, but I sincerely doubt that they all come to me for advice.

      Are the console makers afraid of the indie game dev scene?

      That and piracy.

      See, right now, the platform is so absurdly restricted that the barrier to piracy is prohibitively high. On Windows, you just need to know how to use BitTorrent and Daemon Tools -- on a console, you often also need to know how to use a soldering iron.

      And no, it's not really about the indie dev scene -- more about the ability to circumvent the licensing. Consider that most consoles, especially early on, are sold at a loss. They then try to make their money back on licenses -- basically, if you want a game on the 360, you'll be paying Microsoft to do so.

      In fact, Microsoft, in particular, has made Xbox Live Arcade accessible enough to indie developers.

      --
      Don't thank God, thank a doctor!
  24. delicious by blessbon · · Score: 1

    Where do I get the fix...and how do I install it? I'm using kernel 2.6.26 on openSUSE 11.