RH7 Crashes In Three Weeks (But Fixed)
Herz writes: "I got this email today from Red Hat. RH7 will crash out of the box in 3 weeks! The new Update Agent provided with Red Hat Linux 7.0 contains a daemon, rhnsd, which periodically polls Red Hat Network for updates. This daemon leaks file descriptors. On a default installation, all available file descriptors will be used by rhnsd in approximately three weeks, making the system unusable." The Red Hat folks have also provided a fix, though -- updated packages for those who want to use their update network, and the two-line method of disabling per machine for those who don't. After all, everyone wants uptime > 3 weeks, eh? And you don't need to wait for a "service pack," either.
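For readers unfamiliar with this class of bug, here is a minimal Python sketch of how a periodic polling step can leak file descriptors. It is not rhnsd's code (the advisory quoted above does not show the faulty code); `leaky_poll` and `careful_poll` are hypothetical names, and `os.devnull` merely stands in for whatever the real daemon opens on each poll.

```python
import os


def leaky_poll(path=os.devnull):
    """Hypothetical poll step: opens a descriptor and forgets to close it."""
    fd = os.open(path, os.O_RDONLY)
    # ... a real daemon would contact its server here ...
    return fd  # returned only so this demo can clean up afterwards


def careful_poll(path=os.devnull):
    """Same step, but the descriptor is always released before returning."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return fd  # report which descriptor number was used
    finally:
        os.close(fd)


# Each leaky call pins a new descriptor; each careful call frees its own.
leaked = [leaky_poll() for _ in range(50)]
reused = [careful_poll() for _ in range(50)]

print("distinct descriptors held by the leaky loop:", len(set(leaked)))
print("distinct descriptor numbers seen by the careful loop:", len(set(reused)))

# Clean up the demo's own leak.
for fd in leaked:
    os.close(fd)
```

In a single-threaded run the leaky loop should end up holding one descriptor per call, while the careful loop keeps getting the same number back, because the lowest free descriptor is reused after each close. A daemon that behaves like the first loop will eventually hit whatever limit applies to it.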
Actually, that's not the problem. The problem is that probably none of the beta testers would have bothered to leave this particular service enabled since there wouldn't BE any updates to check for prior to release. Sure it's an oversight, but it's not like it reformats your hard drive or allows doubleclick.net to view your persiankitty.com cookies or opens your box to a root exploit.
I do not have a signature
Like the other poster said, it was real. It was pretty much a 'don't care' bug though -- whoever heard of a 98 box staying up that long anyway?
:-)
I'm lucky to get 48 hours, much less 48 days!
Red Hat 7.0 - $29.95
CD/RW burner - $229.50
10 pack of CDs - $49.95
Look on luser's face when the server drops - Priceless
--- Will in Seattle - What are you doing to fight the War?
... have probably already figured this out. I kept seeing bizarro stuff in my log files from rhnsd. After looking up /etc/init.d/rhnsd I saw that it was not something I needed (I always download for free, so I doubt they are going to be giving me any service).
At least it was putting nice messages into the log file.
For those who need it:
chkconfig --level 345 rhnsd off (turns off the startup)
"Doubt your doubts and believe your beliefs." -- Switchfoot, Ode to Chin
Bleeding edge isn't what Debian stable is about but that is definitely what Debian unstable is for.
You asked for recommendations... There's mine
...the win95 "43 day" bug... where it would crash exactly after 43 days...
They never introduced a fix... the sheer idea of running win95 for 43 days was silly, even to MS.
BlackNova Traders
I'm running X 4.0.1 from Rawhide on my 6.2 box.
You'll have to update a few other packages to get it to install cleanly (initscripts, among others), but it can be done.
BTW, you have to be willing to recompile from SRPMs - precompiled RPMs won't work. But here's how you do it:
Recompile the X RPMs.
Try to install them, find out what needs to be updated.
Get those packages, rebuild them from their SRPMs and install.
After that, the hardest thing is updating your XF86Config file...
retrorocket.o not found, launch anyway?
You are right, people reading slashdot generally like anything non-M$ over Micro$oft products. Has been like that since Slashdot started. However, what you are saying about hotmail switching to Win2K has been covered here. Again, it may be biased (take the title of that article, for example :) but it certainly is covered.
So please check first before making statements about /. next time.
Every expression is true, for a given value of 'true'
They never introduced a fix... the sheer idea of running win95 for 43 days was silly, even to MS.
Why was that? I personally like to leave my computer on; it's better for the electrical connections within the machine and its parts, because it avoids thermal expansion/contraction.
Respond to s
... soon it's unstable enough to take over the desktop market!
it's in my head
Depending on your device drivers and possibly applications. I've had NT workstations (4.0, SP4 or higher) go over 49.7 days several times (the key is to not actually use it :-) and while they continue to run, they start acting totally weird in some ways. Mostly in the GUI, AFAIcouldT, but I didn't wait around for something bigger to show up. All in all it handled it better than the Linux 2.0 workstation across the room I eventually rolled over a couple years ago.
Of course, almost all NT stability depends on your device drivers, and not knowing that is the #1 cause of unstable NT installs done by non-pros.
According to this comment, "the leak is in the rhnsd daemon which is installed and running by default after installation. Even people who never start the update agent will get bitten by this, unless they disabled the daemon after installation."
"It would appear to an outside observer who might read /. for the first time that RH is junk."
Whoops, sorry, outside observers. Rob, please change the headline to read "Another RedHat Feature Discovered".
If I were running an OS that came with an incompatible (and buggy to boot) compiler, a 3 week uptime limit and countless other "issues" I would call it junk. If RedHat is distributing a version of Linux with these problems, then RedHat Linux is junk. Forget what it looks like to "outside observers"--that's just propaganda. Many of us chose Linux because of its reputation for technical excellence--if RedHat can't stand the heat, they need to leave the kitchen.
--
An abstained vote is a vote for Bush and Gore.
Non-meta-modded "Overrated" mods are killing Slashdot
(Hey Ryan! Here's your proof!)
Presumably cron has addressed all the issues involved in running forever. That is why The Pim recommends it. He wasn't implying that cron wasn't a long running-daemon. Solving these issues again is re-inventing the wheel, and, in this case, re-inventing the square wheel.
There is no distinction between 'official' and 'unofficial' ISO images. It's all the same ISO. And the daemon doesn't do anything unless you tell it to (but it is running).
The easiest fix is to just run up2date, and update the 'up2date' package, which owns the daemon.
-- Crutcher --
#include <disclaimer.h>
I think something's nutty, my comment disappeared.
Anyway, my whole "-1, Flamebait" comment was:
Are you installing RH7 on production machines the day it comes out? Are you INSANE? Look, it's a bug. They have a fix. So patch the TEST MACHINES you're running RH7 on, so you can work out the bugs, migration path, and errata, and get on with your life! You ARE running this on test machines, right? You are planning a migration to RH7, not just popping the CD into your mission-critical servers, right? You are following good sysadmin practices, right?
Just because they rushed the release doesn't mean you have to take it. Take your time and be smart.
ZOMG I WOULD LOVE TO KNOW ABOUT YOUR FEELINGS ON MACINTOSH VERSUS WINDOWS, VI VERSUS EMACS, AND HOW YOU'RE NOT A DORK
Why can't you sync the disks? All you need to do is to kill the redhat daemon and you get all of your file descriptors back, then just run like normal. The kernel will clean up after the application when it exits.
--
Mike Mangino
Sr. Software Engineer, SubmitOrder.com
Mike Mangino
mmangino@acm.org
Actually you bring up a valid question, with regards to slashdot anyways. If Win2K had this bug it would certainly have been on slashdot, and met with much approval. Many MS friendly posters will go on about how slashdot is biased and unfair towards MS; well, posting this story pretty much lets RH have the MS treatment. Seems fair enough to me.
Now with regards to the bug, I think the obvious fix is to simply kill -9 rhnsd. There ya go, bug fixed. Yes it's a serious bug, but it's hardly a service that any production server needs so it's a non-issue in my mind. If you are running a serious server you are probably not going to let the software update itself. You are going to get it up, apply any security patches that come out, and lock it in a closet somewhere. The "idea" that you must be running the most current version of software is a marketing ploy (which MS does very well) and is hogwash. If you have software that meets your needs and is stable and secure you certainly don't want to screw it up by randomly updating it.
I think it was poor of RH not to actually test this properly, but I also understand that this is partly just the nature of the beast. They feel that they must move forward at a fast pace and this is the result.
It says
/sbin/service rhnsd stop
/sinb/chkconfig --level 345 rhnsd off .
But of course it should be
/sbin/service rhnsd stop
/sbin/chkconfig --level 345 rhnsd off.
This doesn't exactly help improve the impression of their .0 releases...
No, this is important to know.
Redhat dominates the Linux market. This affects a LOT of /. readers. (obviously not all /. readers use linux, and not all linuxers use redhat, but the population is still going to be quite large.)
:-)
As well, I think politically it's probably a good idea to be public about this kind of bug. Linux has a rep of being extremely reliable. I, for one, would like to keep it that way, and bugs that affect reliability thus NEED TO BE very embarrassing events. Trying to suppress this kind of news may make Linux APPEAR more reliable but actually BE less reliable -- a lose-lose situation for sure.
After all, if Sendmail suddenly started crashing every two weeks, the community would be justifiably furious about it. I don't think it's unreasonable to hold Redhat to a similar standard. They have an enormous advantage over Microsoft by packaging all the Open Source stuff instead of writing it themselves. Seems to me that expecting really good QA on their internally-written software is quite reasonable.
You can bet that if Microsoft had released Win2K with a bug that took it down after two weeks it would have made national news. And Slashdot.
>Instead I get a bunch of CDs that are now
>useless.
By that definition of useless, EVERY data CD is useless. There is no such thing as a bug-free release of any piece of software.
>Oh I guess I could install RH 7.0 and then
>download a million patches.
Oh you poor thing. You have to type 'up2date' at the console.
>Service packs are a great idea because you can
>consolidate all of the fixes into a comprehensive
>unit and thus you can tell people, my software
>will work on Redhat 7.0 service pack 3
I have to agree with you on this one. The concept of a service pack or a patch bundle is useful at times.
However, patches SHOULD be made available as soon as there is one, and should continue to be available individually.
I don't know how many times during my stint as a support person I ran into a service pack or patch bundle that broke other things that were working fine.
Matt
Common politics would dictate waiting for the bug story to cool down before stoking the still-burning embers.
- I don't care if they globalize against free speech. All my best free thoughts are done in my head.
I'm running Slack on my desktop...
<O
( \
XPlay Tetris On Drugs!
Will I retire or break 10K?
Therefore a normal application cannot use up all file descriptors. However, the update agent probably runs with super-user privileges (I don't know for sure: does it also automatically update packages?)
I see this bug as a result of a worrying tendency of open-source software to copy M$oft software in giving too much control to the computer and too little control to the user (outlook viruses, anyone?)
In these matters my motto is : the dumbest of users is still more intelligent than the smartest of computers.
Ciao
----
FB
Slack doesn't seem to have this problem
"I would kill everyone in this room for a drop of sweet beer."
Is it me, or is Red Hat the only distribution that /. ever posts bug reports on?
-- "Perceptions create reality. By changing your perceptions you change your reality."
No. Because Debian unstable/frozen gets tested by such a lot of people that a Debian-Crashes-In-Three-Weeks problem would get fixed way before the actual release.
Not saying Debian is perfect, just that that particular problem would be virtually impossible.
perl -e 'fork||print for split//,"hahahaha"'
Well, I'm a Red Hat user of old, and quite comfortable with the general quality and support provided.
However, I've abstained from buying RH 7, due to the massive problems they seem to have with this release. Far more than I remember in the 5.0 release and 6.0 release.
I'm using Debian at work, and becoming more and more enamoured of its stability and ease of upgrade.
I was under the impression that the RawHide system of pre-release was meant to cure this kind of screwup.. This also dents my faith in that preconception.
The errors in the update agent are unforgivable though. With any release that's as shaky as an x.0 release from RH, they at least need the update agent to be stable.
C'mon RH. Get your act together before you really lose your credibility.
Malk.
Okay, we all hate Microsoft, but come on. Cheap digs like "you don't have to wait for a service pack" will just turn people off. (Remember the first Gore vs. Bush debate?)
You can't do that standing on such shaky ground. One could argue that it _is_ a service pack, or point out that MS does usually release patches to serious problems within a week as well as rolling them up into a service pack.
I don't remember whether it was 43 days or not, but yes, there was a Windows 95 bug that was like this. (It was above 30 days as well.) I ran into it. (Yes, I ran Windows 95 for more than 30 days. No, the average user can't keep their system clean enough to do it for the most part. Yes, I did. Yes, I still think Windows 95 is a world better than 3.1.)
As for a memory leak, it's one of the most common errors you can have. 3 weeks is still a pretty good time frame; the fix was out very quickly; it was made public, the how and why of it. These are things you won't see with closed source companies. Bash RedHat all you want, truth is their internal programs just simply don't get the exposure the rest of Linux per se does, so some bugs slip by.
-- Talonius
My reality check bounced.
---
Actually, it's a reality because on production machines, you don't leave ANYTHING to chance. EVER. PERIOD. END OF STORY. Much like one of the main points of OSS is that you don't trust closed source, when deploying, you DON'T TRUST SOMETHING THAT HASN'T PASSED YOUR OWN TEST ENVIRONMENT.
Vintage computer games and RPG books available. Email me if you're interested.
Forgive me if I'm being snippy, but why is this a major issue? Yes, we've talked about problems with RH 7.0. Yes, we've bitched about the new GCC shipping with it. But what is this, open season on RH? Since they are well known and popular, did they suddenly become evil that we have to slam on them all the time? It would appear to an outside observer who might read /. for the first time that RH is junk. And who knows how many people might have gotten that impression and decided not to switch to linux from NT.
It's just a good idea to flush out the system now and again...
BlackNova Traders
i can barely imagine anything i want my systems doing less than automatically looking for new software and/or installing updates without my fully conscious awareness of same and active involvement. do people actually find value in this type of service?
I do not have a signature
perl -e 'fork||print for split//,"hahahaha"'
I'm not complaining that there are bugs in RH7 - I know it's new, and I'm the first person to tell clients not to put new software on production servers. I made a considered choice to do this on my own server, because the hardware needed upgrading anyway, and the RH 5.2 which has been running flawlessly on it for the past couple of years was missing some stuff that I needed.
My problem is with the nature of this RH issue: it's a bug in a piece of software RH developed internally, and installs by default without any indication or choice. I find that kind of "thinking for the customer" undesirable and unacceptable, and as I said, Microsoft-like. No doubt it's a reflection of Red Hat's post-IPO mass consumer focus; unfortunately that doesn't suit me very well.
As for checking the services, I did the install over the weekend, looked at the long list of services (since I installed a bunch of database and other server stuff) and decided to check it out later. I would have found rhnsd soon enough. Security isn't much of an issue because the box sits behind a firewall at the colo site with only web, imap and ssh ports open; the web server is my own build of Apache 2.0 alpha, for development purposes only.
This is a question posed by a Linux-wannabe who really knows nothing:
Does Linux have a max # of file handles, after which new handles cannot be created?
Let me pose this another way -- Can I crash a Linux box by opening a whole lot of files? Or is this daemon run as root? Then the new question is why is a daemon that has the capability to automatically update critical software, running as root? Surely it could be spoofed to update a system with poor DLLs?
To a Linux newbie, this whole article sounds very scary.
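On the question of whether a maximum exists: a short Python sketch for inspecting the limits on a given box. It assumes a Linux system where `/proc/sys/fs/file-max` is readable; `read_system_wide_limit` is a hypothetical helper name, and the values printed will differ from machine to machine.

```python
import resource


def read_system_wide_limit(path="/proc/sys/fs/file-max"):
    """Return the kernel's system-wide open-file ceiling, or None if unreadable."""
    try:
        with open(path) as handle:
            return int(handle.read().split()[0])
    except (OSError, ValueError, IndexError):
        return None  # not Linux, or /proc is not mounted


# Per-process limits: the soft limit is what the process actually hits;
# an unprivileged process may raise it only as far as the hard limit.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
system_wide = read_system_wide_limit()

print("per-process soft limit:", soft)
print("per-process hard limit:", hard)
print("system-wide limit:", system_wide)
```

If the per-process limit is well below the system-wide one, a single unprivileged program that opens files in a loop gets an error long before the whole box runs dry. A process running as root can raise its own limits, which is why the question of what rhnsd runs as matters.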
This same problem in Debian wouldn't be posted here in 20 years. Unless you think Debian doesn't have any bugs...
That is because apt-get's functionality has been thoroughly tested for quite some time.
It is actually kinda nice to see other distributions catching up.
Of course, auto-update will be pretty broken with the care that goes into packaging RH RPMS. Have you ever tried to upgrade a RH distribution manually ? It is a broken mess of irrelevant and missed dependencies. Debian does this seamlessly.
What RH really needs is a thorough packaging policy, like this and this. Only with a thorough packaging policy can upgrades and auto-upgrades be useful.
Mainly, I hate using rpm --nodeps --force. On my debian system I never need those --nodeps options. Wonder why ???
Although I'm not an advocate of any certain distro, I must say that I applaud RH the effort they have put into open source software. However, this problem shows one problem with open source: Quality control on open source software.
In an ideal situation, every programmer will look at the source code, and contribute to the effort of the open project. Most people (like myself) are free-riders, who have no ability to program. So as idealistically sound open source may seem, there are certain issues to worry about.
In RH's case, at least they pay their workers, which means that they are more willing to do the dirty work of bug fixing others' code (in theory). Although, cases like this cast further doubt on the "Linux for the business" credibility since more non-techies seem to equate Linux with RedHat. It seems to be an understanding by almost everyone, that any RH x.0 distro is pretty much in an experimental state, and must not be used on production servers. This, however, makes the operating system appear "buggy" and "not production-quality" to the uninformed, hence I wish they would take more pride in their distribution instead of "hey, we had that packaged into ours first!" I honestly wish comments on how RH's similarity with MS due to their tactics are only on the surface. Unlike MS (whose operating system is proprietary), RH simply has their own distribution of an open-sourced OS. If you so choose not to use their distro, you have enough other choices: e.g. Debian, Mandrake, Slackware, etc etc.
Mac OS X's kernel (Darwin) is not your typical monolithic BSD kernel. It's a Mach kernel with a layer of BSD-like services around that. Darwin is Nearly-Free Software under the Apple Public Source License.
<O
( \
XPlay Tetris On Drugs!
Will I retire or break 10K?
On XFree86 4.0.1, with a Hauppauge "WinTV Go" card.
Watching the 2nd US Presidential debate start now, in fact.
Email me for a copy of my conf.modules (which may not be helpful if you're using a non bttv card) or XF86Config files.
Just a thought.
I'd rather have someone respond than be modded up.
Don't make excuses. This is exactly the same type of crap that MicroSoft dishes up, and RedHat is guilty of delivering it.
int main() {
printf("Hello, World!\n");
}
You are right, it is possible to write a small program without any bugs and - wait, sorry, I forgot to make it return an exit code. Let me get back to you...
Personally, I frequently use Red Hat & W2K to do my job, and am quite pleased with both. As I've been watching, I've seen you go hog wild over the Windows 47 day bug, but yet when RH has a 3 week one, it must not be a big deal... Hello, THIS IS A SERVER-CLASS OS. IT IS A BIG DEAL.
"I will take the Ring," he said, "though I do not know the way."
Wait, a revolutionary moment!!! Slashdot confirms an article before posting it!!!
- I don't care if they globalize against free speech. All my best free thoughts are done in my head.
The leak is in The Update Manager. If you're not running the update manager, you don't have a problem and the system won't go down. If you ARE running the Update Manager - well, it'll just automatically get the update from RedHat, won't it? Assuming that part works, anyway...
[using rpm --nodeps]
Huh. I never need them on my Red Hat system either. Wonder why???
Tell you what. Install redhat 4.2. Then upgrade one rpm command at a time to redhat 5.0, 5.1, 5.2, 6.0, 6.2, and then 7.0. And see how many times you need to use the --nodeps option.
The incidence is dramatically lower for debian debs. It is not the deb format. Rpm has all the same capabilities. It is the care that goes into packaging, highlighted by the packaging guides. Try to find something more comprehensive at the web site of a linux distribution.
There ought to be limits to freedom. - GWB
Sounds like Red Hat is getting ready to takeover the desktop market. It now has the same functionality as Windows Me! :-)
My journal has hot
It's not a pretty sight. It's not too far off from running out of memory. And, the 4096 number is a system wide number:
Now, it's not that when that number runs out, that process dies, but the *NEXT* process to request a file dies. This happens on officially penguin-peed kernels as well. You need to set resource limits to keep an individual process from getting too trigger-happy with files.
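To illustrate the resource-limit advice, here is a hedged Python sketch that lowers its own soft descriptor limit and then opens files until the kernel refuses. `open_until_limit` and the value 64 are arbitrary choices for the demo, not anything from Red Hat's packages; in practice an admin would more likely set the limit from the shell or a login configuration before starting the daemon.

```python
import errno
import os
import resource


def open_until_limit(new_soft_limit=64):
    """Lower the soft RLIMIT_NOFILE, open files until refused, then restore."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        new_soft_limit = min(new_soft_limit, hard)
    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft_limit, hard))

    opened = []
    error_code = None
    try:
        # One more attempt than the limit allows, so a refusal is guaranteed.
        for _ in range(new_soft_limit + 1):
            opened.append(os.open(os.devnull, os.O_RDONLY))
    except OSError as exc:
        error_code = exc.errno  # EMFILE: this process has too many open files
    finally:
        for fd in opened:
            os.close(fd)
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))
    return len(opened), error_code


count, error_code = open_until_limit()
print("descriptors opened before refusal:", count)
print("refused with EMFILE:", error_code == errno.EMFILE)
```

The point of the design is that the failure lands on the greedy process itself (an `EMFILE` error on its own `open` call) rather than on whichever unrelated process asks for a file next.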
And by the way, take stock 2.2 and make a program which either A) fork bombs or B) chews memory. Watch the system go down in flames. In the case of (B) you (once? Is it fixed?) had the chance of watching the kernel give init the boot, which is very ugly.
--
Ben Kosse
Remember Ed Curry!
Did you even read the MSDN article to which you linked? If this bug was fixed in Windows 95, why would they offer a downloadable patch for Windows 98??
- --------------------------------
Computer Hangs After 49.7 Days
-----------------------------------------------
The information in this article applies to:
Microsoft Windows 95
Microsoft Windows 95 OEM Service Release versions 2, 2.1, 2.5
Microsoft Windows 98
cpeterso
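The 49.7 figure in that article title is what one gets from a 32-bit counter of milliseconds wrapping around, which is the usual explanation given for this bug. A quick arithmetic check in Python, offered as a sanity check on the number rather than a description of the Windows code:

```python
MS_PER_DAY = 24 * 60 * 60 * 1000   # milliseconds in one day
COUNTER_RANGE = 2 ** 32            # distinct values of an unsigned 32-bit counter

wrap_days = COUNTER_RANGE / MS_PER_DAY
print(f"a 32-bit millisecond counter wraps after {wrap_days:.1f} days")
```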
That's why you use cron instead of writing a long-running daemon.
The evaluation of an action as 'practical' . . . depends on what it is that one wishes to practice.
What, exactly is the compiler not compatible with? I give it C++ source code and it compiles it for me.
It generates perfectly ISO compatible code. It's not RedHat's fault the ISO spec is vague and underdefined. Expecting different versions of a C++ compiler (or different C++ compilers for that matter) to emit compatible code is a blatant misfeature.
I disabled the rhnsd about 15 minutes after the install. I suspect a lot of others did as well due to privacy questions, etc... Didn't you guys turn it off as well?
Why anyone would want their system to "auto-update" is beyond me. I think you're just asking for trouble if you do that.
Did M$ buy some stock in RedHat? Seems like all these bugs and errata stem from a basic case of the dumbass, joined together with some deadlines from the marketing droids... geezz!
tarballs rule! They aren't a package, they are a state of mind.
Umm, I don't think that was a bug with Windows NT 4.0 there buddy. I've run my servers and workstations for well over six months without reboots; the glitch was in Win95.
---
When in danger or in doubt, run in circles, scream and shout. --Robert A. Heinlein
It does require registration, though there is an 'anonymous' registration option that sends only your hardware architecture (so that the right rpms get sent) and an email address. It is one of the free levels of service (of which there are several).
-- Crutcher --
#include <disclaimer.h>
Windows NT 4.0 release: 1996
49.7 day bug discovered: 1999
Fix released: never
Well, it was Win 95 and 98, not NT. And it was fixed. click
You know that if this was about win2k instead of redhat there would be 500 posts saying "linux r00lz MS suckz0rs!!". The amount of bias that goes on here is incredible. Somehow taco missed the story about hotmail switching over to win2k. That's a pretty major story, but since it's pro MS it was quietly ignored.
Only the State obtains its revenue by coercion. - Murray Rothbard
The 49.7 day bug was not in NT - it was in Windows 95. We have several NT boxes at work that have not been rebooted for months and months. I still like Linux servers better but for a workstation, I still prefer NT and there sure as hell is no 49.7 day bug in NT.
Vintage computer games and RPG books available. Email me if you're interested.
Woke up this morning
Crawled out of bed
Couldn't wait to get that Red Hat distro you said
Told you to worry
Told you to wait
But no you want to mirror it from outside the state
Refrain
I got the blues
Got them old dot zero blues
Cause I done installed that distro
And it blew up on my shoes
Wish I had DSL
Wish I had fat pipes
But on a 56K modem
The download's such a fright
It's all installed now
Servers up and cool
But I come back three weeks later
And look just like a fool
Refrain
Got burned by Compaq
Got burned by Dell
Got burned by Microsoft
Now I'm in Red Hat dot zero hell
Refrain
Now don't you worry
This one's ok
It won't drop under loads now
Cause if it does we'll make you pay!
Refrain
--- Will in Seattle - What are you doing to fight the War?
Way to twist his words. There's this crazy thing called CONTEXT that we should consult before bashing someone/thing.
Obviously, when he said "We did a lot of QA," he was talking about the snapshot of GCC, and not the OS as a whole.
Sure, they should have caught this bug, or better, it should have been considered at design time, but (and I'm not trying to make excuses for Red Hat here), to catch this bug, they prolly would've had to have had a 7.0 system up and running for 3 weeks straight. Maybe their test cycle is shorter than that. If their test cycle was say.. 6 weeks, then who knows what kind of bugs might pop up at the 6 1/2 week mark? You can only allow so much testing for a product before releasing it, or you'd never release anything.
As I said, yes, they should have caught this, but as we all know, no software works perfectly, and sh*t happens. At least there's a fix for it.
Right, but this application doesn't "go haywire," per se, as in "crash and burn" and scribble all over other peoples' core--it uses up a resource gradually--there is a difference.
This isn't much different from an application that runs away and fills up the disk or allocates all available memory. Should Linux allow an application to deplete a resource without giving the admin a chance to kill the offender first? Probably not, and maybe this is one of those issues that will have to be addressed in the 2.5.0 tree. At any rate, Linux is still far more stable and dependable than that other OS.
slashdot broke my sig
My problem is with the nature of this RH issue: it's a bug in a piece of software RH developed internally, and installs by default without any indication or choice. I find that kind of "thinking for the customer" undesirable and unacceptable, and as I said, Microsoft-like. No doubt it's a reflection of Red Hat's post-IPO mass consumer focus; unfortunately that doesn't suit me very well.
As for RTFM, where exactly is this documented? The paper manual has shrunk significantly since RH 5.2, and I have yet to find the documentation, paper or otherwise, about the fact that this update daemon gets installed by default.
Bottom line: I'm a developer, and I don't need someone else deciding on my behalf to install daemons on my system that I don't care about. That in itself is 50% of my issue with this. The fact that this daemon had a fatal bug is the other 50%. Red Hat screwed this up both ways.