Slashdot Mirror


RH7 Crashes In Three Weeks (But Fixed)

Herz writes: "I got this email today from Red Hat. RH7 will crash out of the box in 3 weeks! The new Update Agent provided with Red Hat Linux 7.0 contains a daemon, rhnsd, which periodically polls Red Hat Network for updates. This daemon leaks file descriptors. On a default installation, all available file descriptors will be used by rhnsd in approximately three weeks, making the system unusable." The Red Hat folks have also provided a fix, though -- updated packages for those who want to use their update network, and the two-line method of disabling per machine for those who don't. After all, everyone wants uptime > 3 weeks, eh? And you don't need to wait for a "service pack," either.

25 of 301 comments

  1. Kind of like... by Ron+Harwood · · Score: 5

    ...the win95 "43 day" bug... where it would crash exactly after 43 days...

    They never introduced a fix... the sheer idea of running win95 for 43 days was silly, even to MS.

    1. Re:Kind of like... by v4mpyr · · Score: 3

      That's on a Unix machine running VMWare, right? ;-)

      That was supposed to be funny, laugh dammit.


    2. Re:Kind of like... by andyh1978 · · Score: 4
      I never heard of that bug. Are you sure that wasn't FUD?
      No, it was a real bug.
      And it was 49.7 days (the time it takes for a millisecond timer to overflow a 32-bit unsigned integer).
      It was fixed in one of the service packs.

      See this MS KB entry for details.
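The 49.7-day figure quoted above can be checked with a little arithmetic. The following is only a sketch of that calculation; the function names are made up for illustration and have nothing to do with how Windows implements its timer.

```python
# Worked arithmetic for the 49.7-day figure: a 32-bit unsigned counter of
# milliseconds wraps back to zero after 2**32 ticks.

MS_PER_DAY = 1000 * 60 * 60 * 24  # milliseconds in one day


def days_until_wrap(bits=32):
    """Days before an unsigned millisecond counter of `bits` bits overflows."""
    return 2 ** bits / MS_PER_DAY


def tick(counter, bits=32):
    """Advance the counter by one millisecond, wrapping like a fixed-width integer."""
    return (counter + 1) & (2 ** bits - 1)


if __name__ == "__main__":
    print(f"wraps after about {days_until_wrap():.1f} days")
    print("value after the last tick:", tick(2 ** 32 - 1))
```

Any code that compares timestamps without allowing for the wrap sees time jump backwards at that point.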
  2. Linux is targeting Windows by Troed · · Score: 4

    ... soon it's unstable enough to take over the desktop market!

  3. /. ate my comment! by banky · · Score: 5

    I think something's nutty, my comment disappeared.

    Anyway, my whole "-1, Flamebait" comment was:

    Are you installing RH7 on production machines the day it comes out? Are you INSANE? Look, it's a bug. They have a fix. So patch the TEST MACHINES you're running RH7 on, so you can work out the bugs, migration path, and errata, and get on with your life! You ARE running this on test machines, right? You are planning a migration to RH7, not just popping the CD into your mission-critical servers, right? You are following good sysadmin practices, right?

    Just because they rushed the release doesn't mean you have to take it. Take your time and be smart.

    --
    ZOMG I WOULD LOVE TO KNOW ABOUT YOUR FEELINGS ON MACINTOSH VERSUS WINDOWS, VI VERSUS EMACS, AND HOW YOU'RE NOT A DORK
  4. Re:Politics by Zoltar · · Score: 5

    Actually you bring up a valid question, with regard to Slashdot anyway. If Win2K had this bug it would certainly have been on Slashdot, and met with much approval. Many MS-friendly posters will go on about how Slashdot is biased and unfair towards MS; well, posting this story pretty much lets RH have the MS treatment. Seems fair enough to me.

    Now with regard to the bug, I think the obvious fix is to simply kill -9 rhnsd. There ya go, bug fixed. Yes it's a serious bug, but it's hardly a service that any production server needs, so it's a non-issue in my mind. If you are running a serious server you are probably not going to let the software update itself. You are going to get it up, apply any security patches that come out, and lock it in a closet somewhere. The "idea" that you must be running the most current version of software is a marketing ploy (which MS does very well) and is hogwash. If you have software that meets your needs and is stable and secure you certainly don't want to screw it up by randomly updating it.

    I think it was poor of RH not to actually test this properly, but I also understand that this is partly just the nature of the beast. They feel that they must move forward at a fast pace and this is the result.

  5. Their "quick fix" also has a bug :) by Telcontar · · Score: 3

    It says
    /sbin/service rhnsd stop
    /sinb/chkconfig --level 345 rhnsd off
    .
    But of course it should be
    /sbin/service rhnsd stop
    /sbin/chkconfig --level 345 rhnsd off
    .
    This doesn't exactly help improve the impression of their .0 releases...

  6. Re:Politics by Malor · · Score: 5

    No, this is important to know.

    Redhat dominates the Linux market. This affects a LOT of /. readers. (obviously not all /. readers use linux, and not all linuxers use redhat, but the population is still going to be quite large.)

    As well, I think politically it's probably a good idea to be public about this kind of bug. Linux has a rep of being extremely reliable. I, for one, would like to keep it that way, and bugs that affect reliability thus NEED TO BE very embarrassing events. Trying to suppress this kind of news may make Linux APPEAR more reliable but actually BE less reliable -- a lose-lose situation for sure.

    After all, if Sendmail suddenly started crashing every two weeks, the community would be justifiably furious about it. I don't think it's unreasonable to hold Redhat to a similar standard. They have an enormous advantage over Microsoft by packaging all the Open Source stuff instead of writing it themselves. Seems to me that expecting really good QA on their internally-written software is quite reasonable.

    You can bet that if Microsoft had released Win2K with a bug that took it down after two weeks it would have made national news. And Slashdot. :-)

  7. Re:Serious teething pains by Menthos · · Score: 5
    If you've been following all the Red Hat stories lately and read most comments, you'd notice that most of the people complaining about RH 7 are the people that don't actually run it.
    Most of the posters stating that they do actually use RH 7 seem quite happy about it, noticing that it is even more stable than RH 5.0 or 6.0 ever were. Most of the bad press on /. was indeed very bad journalism, even FUD in the case of the "2500 bugs" story, which wasn't even close to the truth (the real figure of unsolved bugs, feature requests and other issues in RH7 was 150, yes one hundred and fifty, not 2500). The idiot poster who submitted that story counted not the outstanding bugs in RH7 as he was claiming but all entries in Bugzilla for all previous RH releases, including feature requests, resolved bugs, duplicates, non-reproducible errors, bug reports missing critical information and otherwise closed "bugs"...

    So, chances are that you should trust /. a little less and learn from your own experience by trying it... In my experience, it is better than all previous RH releases; the way it should be.

    --

    GNU/Linux. The Freshmaker.

  8. Childish by drfrank · · Score: 4

    Okay, we all hate Microsoft, but come on. Cheap digs like "you don't have to wait for a service pack" will just turn people off. (Remember the first Gore vs. Bush debate?)

    You can't do that standing on such shaky ground. One could argue that it _is_ a service pack, or point out that MS does usually release patches to serious problems within a week as well as rolling them up into a service pack.

  9. Re:Why is that? by BigBlockMopar · · Score: 4

    They never introduced a fix... the sheer idea of running win95 for 43 days was silly, even to MS.

    Why was that? I personally like to leave my computer on it's better for the electrical connections within the machine and parts due to thermal expansion/contraction.

    Better for the "electrical connections within the machine"... Uhhh, okay.

    Actually, it's just an expansion-contraction issue within the ICs, in particular. And the hard disk drive, landing the heads every time you shut down (but this is the same as if you leave the power management on). Cheap power supplies can sometimes make issues with voltage spikes as they turn on; if you buy a good one, the voltages all come up to their regulated levels and then the Power_Good line is pulled high and the motherboard is reset.

    So, if you have a good quality system, you probably won't have any problems with the wear of turning your machine on and off in reasonable usage until after the machine is obsolete.

    Compare this to the higher power bills, risks of fans dying and overheating that conservatively overclocked processor, as well as more potential uptime for a thunderstorm to kill it, and I feel it's probably wise to shut off the computer when you're not using it. Of course, that's discretion. Do you turn off the computer when you leave the office for lunch? Nah. For the weekend? For sure. Overnight? I do.

    I do speak with some authority here; while I'm not an electrical engineer, I have several years of experience design engineering critical radar systems for Litton. I also used to write electronics design and construction columns for Popular Electronics magazine.

    As for Windows 9x/ME, it's only under controlled laboratory conditions that you can make a Windows box run long enough to see that bug. I've managed to see the 49.7 day bug once; and with the M$ fix, I've seen a record uptime of 103 days with Windows 95B OSR2. Windows 3.1/DOS, I've managed to keep running for months at a time.

    --
    Fire and Meat. Yummy.
  10. ouch, this has to hurt.... by somen · · Score: 3

    Although I'm not an advocate of any certain distro, I must say that I applaud RH for the effort they have put into open source software. However, this bug shows one problem with open source: quality control.

    In an ideal situation, every programmer will look at the source code, and contribute to the effort of the open project. Most people (like myself) are free-riders, who have no ability to program. So as idealistically sound as open source may seem, there are certain issues to worry about.

    In RH's case, at least they pay their workers, which means that they are more willing to do the dirty work of bug-fixing others' code (in theory). Cases like this, though, cast further doubt on the "Linux for business" credibility, since more non-techies seem to equate Linux with Red Hat. It seems to be understood by almost everyone that any RH x.0 distro is pretty much in an experimental state and must not be used on production servers. This, however, makes the operating system appear "buggy" and "not production-quality" to the uninformed, hence I wish they would take more pride in their distribution instead of "hey, we had that packaged into ours first!" I honestly hope that RH's similarity to MS in its tactics is only on the surface. Unlike MS (whose operating system is proprietary), RH simply has its own distribution of an open-source OS. If you choose not to use their distro, you have plenty of other choices: e.g. Debian, Mandrake, Slackware, etc.

  11. Re:Biased pinhead... by Alan+Shutko · · Score: 3

    The Win95 49.7 day bug was funny because the bug had been there a long time, and nobody had found it... implying that nobody had been able to keep a Win95 box up for 49.7 days.

    RHL 7 has been out for two weeks. It's not even in _stores_ around here yet, but the bug has been found. It's been fixed.

    That's why it's not a big deal.

  12. Main page comings and goings by pq · · Score: 3
    Please, people, if you pull a story off the main page and then restore it, add an Update: line so that I don't get this feeling that I'm slowly losing my mind. I didn't dream it all, did I? This was on the main page, pulled off around comment #30, and restored around #50... what's going on? Please?

    --
    "I will take the Ring," he said, "though I do not know the way."
  13. Disappearing article? by Fervent · · Score: 4
    Did anyone notice that this article disappeared for about an hour today? Were there some complaints/questions about its authenticity?

    Wait, a revolutionary moment!!! Slashdot confirms an article before posting it!!!

    --

    - I don't care if they globalize against free speech. All my best free thoughts are done in my head.

    1. Re:Disappearing article? by 11223 · · Score: 5

      Naah, they just ran out of file descriptors and had to remove the story from the front page list :-P

  14. A little perspective by the_quark · · Score: 5
    As funny as this is, because of exactly what the problem is, it's not going to be an issue:

    The leak is in The Update Manager. If you're not running the update manager, you don't have a problem and the system won't go down. If you ARE running the Update Manager - well, it'll just automatically get the update from RedHat, won't it? Assuming that part works, anyway...

  15. Completely unusable in 3 weeks? by Surak · · Score: 3

    Sounds like Red Hat is getting ready to take over the desktop market. It now has the same functionality as Windows Me! :-)

  16. Have you ever run out of file descriptors? by bkosse · · Score: 3

    It's not a pretty sight. It's not too far off from running out of memory. And, the 4096 number is a system wide number:

    file-nr and file-max
    The kernel allocates file handles dynamically, but as yet doesn't free them again.

    The value in file-max denotes the maximum number of file handles that the Linux kernel will allocate. When you get a lot of error messages about running out of file handles, you might want to raise this limit. The default value is 4096. To change it, just write the new number into the file:

    Now, it's not that when that number runs out, that process dies, but the *NEXT* process to request a file dies. This happens on officially penguin-peed kernels as well. You need to set resource limits to keep an individual process from getting too trigger-happy with files.
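As a rough illustration of the per-process resource limits mentioned above, here is a sketch for a Unix-like system. It lowers the soft RLIMIT_NOFILE limit for the current process, leaks descriptors on purpose until the kernel refuses, then cleans up. The function name and the limit of 256 are arbitrary choices for the example, and this is not how rhnsd itself behaves.

```python
import errno
import os
import resource


def leak_until_exhausted(limit=256):
    """Deliberately leak descriptors under a lowered per-process soft limit.

    Returns (number_of_descriptors_opened, errno_of_the_failing_open).
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # Never try to raise the soft limit here, only lower it.
    new_soft = limit if soft == resource.RLIM_INFINITY else min(limit, soft)
    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))

    leaked = []
    failure = None
    try:
        # One more attempt than the limit allows, so the loop always ends.
        for _ in range(new_soft + 1):
            leaked.append(os.open(os.devnull, os.O_RDONLY))
    except OSError as exc:
        failure = exc.errno  # EMFILE: this process has too many open files
    finally:
        for fd in leaked:
            os.close(fd)
        # Put the original soft limit back.
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))
    return len(leaked), failure


if __name__ == "__main__":
    opened, failure = leak_until_exhausted()
    print("opened", opened, "descriptors before failing with",
          errno.errorcode.get(failure))
```

With a per-process cap like this, the leaking process hits its own ceiling (EMFILE) instead of draining the system-wide table that file-max governs.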

    And by the way, take stock 2.2 and make a program which either A) fork bombs or B) chews memory. Watch the system go down in flames. In the case of (B) you (once? Is it fixed?) had the chance of watching the kernel give init the boot, which is very ugly.

    --
    Ben Kosse
    Remember Ed Curry!
  17. one word: cron by The+Pim · · Score: 5

    That's why you use cron instead of writing a long-running daemon.
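A minimal sketch of the one-shot style this comment argues for, assuming a hypothetical check_once() job and a made-up status file (neither comes from Red Hat's tooling). Since such a job's process exits after every run, the kernel reclaims any descriptor it forgot to close, so a leak cannot pile up across polls the way it can in a long-running daemon.

```python
import os
import tempfile


def check_once(status_path):
    """Hypothetical one-shot poll: read a status file and return its text.

    The `with` block closes the descriptor even if reading fails.
    """
    with open(status_path, "r", encoding="utf-8") as handle:
        return handle.read().strip()


def lowest_free_fd():
    """Open and immediately close /dev/null to learn the lowest unused
    descriptor number; POSIX hands out the lowest free one."""
    fd = os.open(os.devnull, os.O_RDONLY)
    os.close(fd)
    return fd


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile("w", suffix=".status", delete=False) as tmp:
        tmp.write("no updates\n")
    before = lowest_free_fd()
    for _ in range(100):
        check_once(tmp.name)
    after = lowest_free_fd()
    os.unlink(tmp.name)
    print("descriptor leaked:", before != after)
```

A crontab line such as `0 * * * * /usr/local/bin/check_once.py` (the path is hypothetical) would then run the check hourly as a fresh process each time.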

    --

    The evaluation of an action as 'practical' . . . depends on what it is that one wishes to practice.
  18. You lack slack, Jack! by dattaway · · Score: 3

    tarballs rule! They aren't a package, they are a state of mind.

  19. Re:Proof positive of the benefits of Open Source by Sethb · · Score: 3

    Umm, I don't think that was a bug with Windows NT 4.0 there buddy, I've run my servers and workstations for well over six months without reboots, the glitch was in Win95.
    ---

    --
    When in danger or in doubt, run in circles, scream and shout. --Robert A. Heinlein
  20. Re:Politics by ErfC · · Score: 3
    I don't see this as FUD, or derogatory, and I don't see how politics should be involved. As is pointed out, the fix is easy (either update the package, or turn off the daemon, or both), so we don't have to wait for a service pack or anything.

    And I'm very glad to know about the bug and the fix; it's something of a showstopper, and I didn't know the update manager was active by default, so this is valuable information -- not RedHat bashing.

    -Erf C.

    --

    -Erf C.
    Cthulu always calls collect...

  21. Re:Proof positive of the benefits of Open Source by macpeep · · Score: 4

    The 49.7 day bug was not in NT - it was in Windows 95. We have several NT boxes at work that have not been rebooted for months and months. I still like Linux servers better but for a workstation, I still prefer NT and there sure as hell is no 49.7 day bug in NT.

  22. Got Them Dot Zero Blues by WillSeattle · · Score: 5

    Woke up this morning
    Crawled out of bed
    Couldn't wait to get that Red Hat distro you said

    Told you to worry
    Told you to wait
    But no you want to mirror it from outside the state

    Refrain
    I got the blues
    Got them old dot zero blues
    Cause I done installed that distro
    And it blew up on my shoes

    Wish I had DSL
    Wish I had fat pipes
    But on a 56K modem
    The download's such a fright

    It's all installed now
    Servers up and cool
    But I come back three weeks later
    And look just like a fool

    Refrain

    Got burned by Compaq
    Got burned by Dell
    Got burned by Microsoft
    Now I'm in Red Hat dot zero hell

    Refrain

    Now don't you worry
    This one's ok
    It won't drop under loads now
    Cause if it does we'll make you pay!

    Refrain

    --
    --- Will in Seattle - What are you doing to fight the War?