RH7 Crashes In Three Weeks (But Fixed)
Herz writes: "I got this email today from Red Hat. RH7 will crash out of the box in 3 weeks! The new Update Agent provided with Red Hat Linux 7.0 contains a daemon, rhnsd, which periodically polls Red Hat Network for updates. This daemon leaks file descriptors. On a default installation, all available file descriptors will be used by rhnsd in approximately three weeks, making the system unusable." The Red Hat folks have also provided a fix, though -- updated packages for those who want to use their update network, and the two-line method of disabling per machine for those who don't. After all, everyone wants uptime > 3 weeks, eh? And you don't need to wait for a "service pack," either.
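For anyone unfamiliar with the failure mode, here is a minimal Python sketch of a file descriptor leak in a polling loop. This is not rhnsd's actual code, which the story does not show; `leaky_poll`, `careful_poll`, `open_fd_count`, and `growth` are hypothetical names made up for illustration, and counting descriptors through `/proc/self/fd` assumes a Linux box.

```python
import os

# Assumption: Linux exposes open descriptors under /proc/self/fd;
# /dev/fd is used as a fallback on systems without /proc.
FD_DIR = "/proc/self/fd" if os.path.isdir("/proc/self/fd") else "/dev/fd"


def open_fd_count():
    """Count this process's currently open file descriptors."""
    return len(os.listdir(FD_DIR))


def leaky_poll(path=os.devnull):
    """Hypothetical poll that opens a descriptor and forgets to close it."""
    fd = os.open(path, os.O_RDONLY)
    os.read(fd, 1)  # stand-in for "check the server for updates"
    return fd       # returned only so the demo can clean up afterwards


def careful_poll(path=os.devnull):
    """Same poll, but the descriptor is always released."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.read(fd, 1)
    finally:
        os.close(fd)


def growth(poll, rounds=50):
    """Return how many descriptors `rounds` polls leave open, then clean up."""
    before = open_fd_count()
    leaked = [fd for fd in (poll() for _ in range(rounds)) if fd is not None]
    after = open_fd_count()
    for fd in leaked:
        os.close(fd)
    return after - before


if __name__ == "__main__":
    print("leaky poll leaves open:", growth(leaky_poll))
    print("careful poll leaves open:", growth(careful_poll))
```

The leaky version leaves one more descriptor open per poll, while the careful version leaves none. A daemon that polls forever with the first pattern will eventually hit whatever descriptor limit applies to it.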
...the win95 "43 day" bug... where it would crash exactly after 43 days...
They never introduced a fix... the sheer idea of running win95 for 43 days was silly, even to MS.
BlackNova Traders
... soon it's unstable enough to take over the desktop market!
it's in my head
I think something's nutty, my comment disappeared.
Anyway, my whole "-1, Flamebait" comment was:
Are you installing RH7 on production machines the day it comes out? Are you INSANE? Look, it's a bug. They have a fix. So patch the TEST MACHINES you're running RH7 on, so you can work out the bugs, migration path, and errata, and get on with your life! You ARE running this on test machines, right? You are planning a migration to RH7, not just popping the CD into your mission-critical servers, right? You are following good sysadmin practices, right?
Just because they rushed the release doesn't mean you have to take it. Take your time and be smart.
ZOMG I WOULD LOVE TO KNOW ABOUT YOUR FEELINGS ON MACINTOSH VERSUS WINDOWS, VI VERSUS EMACS, AND HOW YOU'RE NOT A DORK
Actually you bring up a valid question, with regards to Slashdot anyways. If Win2K had this bug it would certainly have been on Slashdot, and met with much approval. Many MS-friendly posters will go on about how Slashdot is biased and unfair towards MS. Well, posting this story pretty much lets RH have the MS treatment. Seems fair enough to me.
Now with regards to the bug, I think the obvious fix is to simply kill -9 rhnsd. There ya go, bug fixed. Yes it's a serious bug, but it's hardly a service that any production server needs so it's a non-issue in my mind. If you are running a serious server you are probably not going to let the software update itself. You are going to get it up, apply any security patches that come out, and lock it in a closet somewhere. The "idea" that you must be running the most current version of software is a marketing ploy (which MS does very well) and is hogwash. If you have software that meets your needs and is stable and secure you certainly don't want to screw it up by randomly updating it.
I think it was poor of RH not to actually test this properly, but I also understand that this is partly just the nature of the beast. They feel that they must move forward at a fast pace and this is the result.
It says
/sbin/service rhnsd stop
/sinb/chkconfig --level 345 rhnsd off
But of course it should be
/sbin/service rhnsd stop
/sbin/chkconfig --level 345 rhnsd off
This doesn't exactly help improving the impression of their .0 releases...
:-)
No, this is important to know.
Redhat dominates the Linux market. This affects a LOT of /. readers. (obviously not all /. readers use linux, and not all linuxers use redhat, but the population is still going to be quite large.)
As well, I think politically it's probably a good idea to be public about this kind of bug. Linux has a rep of being extremely reliable. I, for one, would like to keep it that way, and bugs that affect reliability thus NEED TO BE very embarrassing events. Trying to suppress this kind of news may make Linux APPEAR more reliable but actually BE less reliable -- a lose-lose situation for sure.
After all, if Sendmail suddenly started crashing every two weeks, the community would be justifiably furious about it. I don't think it's unreasonable to hold Redhat to a similar standard. They have an enormous advantage over Microsoft by packaging all the Open Source stuff instead of writing it themselves. Seems to me that expecting really good QA on their internally-written software is quite reasonable.
You can bet that if Microsoft had released Win2K with a bug that took it down after two weeks it would have made national news. And Slashdot.
Most of the posters stating that they do actually use RH 7 seem quite happy about it, noting that it is even more stable than RH 5.0 or 6.0 ever were. Most of the bad press on
So, chances are that you should trust /. a little less and learn from your own experience by trying it... In my experience, it is better than all previous RH releases; the way it should be.
GNU/Linux. The Freshmaker.
Okay, we all hate Microsoft, but come on. Cheap digs like "you don't have to wait for a service pack" will just turn people off. (Remember the first Gore vs. Bush debate?)
You can't do that standing on such shaky ground. One could argue that it _is_ a service pack, or point out that MS does usually release patches to serious problems within a week as well as rolling them up into a service pack.
They never introduced a fix... the sheer idea of running win95 for 43 days was silly, even to MS.
Why was that? I personally like to leave my computer on; it's better for the electrical connections within the machine and parts, due to thermal expansion/contraction.
Better for the "electrical connections within the machine"... Uhhh, okay.
Actually, it's just an expansion-contraction issue within the ICs, in particular. And the hard disk drive, landing the heads every time you shut down (but this is the same as if you leave the power management on). Cheap power supplies can sometimes cause issues with voltage spikes as they turn on; if you buy a good one, the voltages all come up to their regulated levels and then the Power_Good line is pulled high and the motherboard is reset.
So, if you have a good quality system, you probably won't have any problems with the wear of turning your machine on and off in reasonable usage until after the machine is obsolete.
Compare this to the higher power bills, risks of fans dying and overheating that conservatively overclocked processor, as well as more potential uptime for a thunderstorm to kill it, and I feel it's probably wise to shut off the computer when you're not using it. Of course, that's discretion. Do you turn off the computer when you leave the office for lunch? Nah. For the weekend? For sure. Overnight? I do.
I do speak with some authority here; while I'm not an electrical engineer, I have several years of experience design engineering critical radar systems for Litton. I also used to write electronics design and construction columns for Popular Electronics magazine.
As for Windows 9x/ME, it's only under controlled laboratory conditions that you can make a Windows box run long enough to see that bug. I've managed to see the 49.7 day bug once; and with the M$ fix, I've seen a record uptime of 103 days with Windows 95B OSR2. Windows 3.1/DOS, I've managed to keep running for months at a time.
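The 49.7 figure is what you would expect if the uptime counter is an unsigned 32-bit count of milliseconds that wraps around. That cause is commonly given for the Win95 bug but is not stated in this thread, so treat it as an assumption. A quick Python check of the arithmetic (`days_until_wrap` is just an illustrative name):

```python
# Assumption: uptime is tracked as an unsigned 32-bit count of milliseconds.
MS_PER_DAY = 1000 * 60 * 60 * 24


def days_until_wrap(bits=32):
    """Days until an unsigned `bits`-wide millisecond counter overflows."""
    return 2 ** bits / MS_PER_DAY


if __name__ == "__main__":
    print(f"{days_until_wrap():.2f} days")  # prints "49.71 days"
```

That also suggests why the "43 day" and "47 day" numbers quoted elsewhere in the thread are probably misremembered versions of the same bug.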
Fire and Meat. Yummy.
Although I'm not an advocate of any certain distro, I must say that I applaud RH for the effort they have put into open source software. However, this bug shows one problem with open source: quality control.
In an ideal situation, every programmer will look at the source code and contribute to the effort of the open project. Most people (like myself) are free-riders, who have no ability to program. So as idealistically sound as open source may seem, there are certain issues to worry about.
In RH's case, at least they pay their workers, which means that they are more willing to do the dirty work of bug-fixing others' code (in theory). Cases like this, though, cast further doubt on the "Linux for the business" credibility, since more non-techies seem to equate Linux with RedHat. It seems to be understood by almost everyone that any RH x.0 distro is pretty much in an experimental state and must not be used on production servers. This, however, makes the operating system appear "buggy" and "not production-quality" to the uninformed, hence I wish they would take more pride in their distribution instead of "hey, we had that packaged into ours first!" I honestly hope the similarity between RH and MS that people comment on, due to their tactics, is only on the surface. Unlike MS (whose operating system is proprietary), RH simply has their own distribution of an open-sourced OS. If you choose not to use their distro, you have enough other choices: e.g. Debian, Mandrake, Slackware, etc.
The Win95 47 day bug was funny because the bug had been there a long time, and nobody had found it... implying that nobody had been able to keep a Win95 box up for 47 days.
RHL 7 has been out for two weeks. It's not even in _stores_ around here yet, but the bug has been found. It's been fixed.
That's why it's not a big deal.
"I will take the Ring," he said, "though I do not know the way."
Wait, a revolutionary moment!!! Slashdot confirms an article before posting it!!!
- I don't care if they globalize against free speech. All my best free thoughts are done in my head.
The leak is in The Update Manager. If you're not running the update manager, you don't have a problem and the system won't go down. If you ARE running the Update Manager - well, it'll just automatically get the update from RedHat, won't it? Assuming that part works, anyway...
Sounds like Red Hat is getting ready to take over the desktop market. It now has the same functionality as Windows Me! :-)
My journal has hot
It's not a pretty sight. It's not too far off from running out of memory. And, the 4096 number is a system wide number:
Now, it's not that when that number runs out, that process dies, but the *NEXT* process to request a file dies. This happens on officially penguin-peed kernels as well. You need to set resource limits to keep an individual process from getting too trigger-happy with files.
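To make the resource-limit point concrete, here is a rough Python sketch that lowers the per-process descriptor limit (`RLIMIT_NOFILE`) and then opens files until the kernel refuses. It uses the standard `resource` module on a Unix-like system; `open_until_limit` and the limit of 64 are arbitrary choices for the demo, not anything Red Hat ships. Note this is the per-process cap, which fails with `EMFILE` for the greedy process only, as opposed to the system-wide table discussed above.

```python
import errno
import os
import resource


def open_until_limit(soft_limit=64):
    """Lower this process's descriptor limit, then open files until refused.

    Returns (number of files opened, errno of the failure).
    """
    old_soft, old_hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if old_hard != resource.RLIM_INFINITY:
        soft_limit = min(soft_limit, old_hard)  # soft may not exceed hard
    resource.setrlimit(resource.RLIMIT_NOFILE, (soft_limit, old_hard))
    fds = []
    failure = None
    try:
        # The loop bound is a safety net; the open() should fail earlier.
        while len(fds) <= soft_limit:
            fds.append(os.open(os.devnull, os.O_RDONLY))
    except OSError as exc:
        failure = exc.errno
    finally:
        for fd in fds:
            os.close(fd)
        # Put the original limits back so the rest of the program is unaffected.
        resource.setrlimit(resource.RLIMIT_NOFILE, (old_soft, old_hard))
    return len(fds), failure


if __name__ == "__main__":
    opened, err = open_until_limit()
    print("opened", opened, "files before failing with", errno.errorcode.get(err))
```

With a cap like this in place, a leaking daemon runs into its own limit and starts failing by itself instead of taking every other process down with it.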
And by the way, take stock 2.2 and make a program which either A) fork bombs or B) chews memory. Watch the system go down in flames. In the case of (B) you (once? Is it fixed?) had the chance of watching the kernel give init the boot, which is very ugly.
--
Ben Kosse
Remember Ed Curry!
That's why you use cron instead of writing a long-running daemon.
The evaluation of an action as 'practical' . . . depends on what it is that one wishes to practice.
tarballs rule! They aren't a package, they are a state of mind.
Umm, I don't think that was a bug with Windows NT 4.0 there, buddy. I've run my servers and workstations for well over six months without reboots; the glitch was in Win95.
---
When in danger or in doubt, run in circles, scream and shout. --Robert A. Heinlein
And I'm very glad to know about the bug and the fix; it's something of a showstopper, and I didn't know the update manager was active by default, so this is valuable information -- not RedHat bashing.
-Erf C.
Cthulu always calls collect...
The 49.7 day bug was not in NT - it was in Windows 95. We have several NT boxes at work that have not been rebooted for months and months. I still like Linux servers better but for a workstation, I still prefer NT and there sure as hell is no 49.7 day bug in NT.
Woke up this morning
Crawled out of bed
Couldn't wait to get that Red Hat distro you said
Told you to worry
Told you to wait
But no you want to mirror it from outside the state
Refrain
I got the blues
Got them old dot zero blues
Cause I done installed that distro
And it blew up on my shoes
Wish I had DSL
Wish I had fat pipes
But on a 56K modem
The download's such a fright
It's all installed now
Servers up and cool
But I come back three weeks later
And look just like a fool
Refrain
Got burned by Compaq
Got burned by Dell
Got burned by Microsoft
Now I'm in Red Hat dot zero hell
Refrain
Now don't you worry
This one's ok
It won't drop under loads now
Cause if it does we'll make you pay!
Refrain
--- Will in Seattle - What are you doing to fight the War?