RHN Bind Update Brings Down RHEL Named
alexs writes "Red Hat's bind update through RHN, which patches the DNS hole, contains a fatal error that reverts all name servers to caching-only servers. This means that anyone running their own DNS service promptly lost all of the DNS records for which they were acting as primary or secondary name servers. Expect quite a few services provided by servers running RHEL to, errr, die until their system administrators can restore their named.conf. Instead of installing the new etc/named.conf as etc/named.conf.rpmnew, Red Hat moved the current etc/named.conf to etc/named.conf.rpmsave and replaced etc/named.conf with the default caching-only configuration. The fix is easy enough, but this is a schoolboy error which I am surprised Red Hat made. Unfortunately we were hit and our servers went down overnight while RHN dropped its bomb and I am frankly surprised there has not been more of an uproar about this."
So, you didn't test the update on a non-production server? Just install any old patch and let it take your network down? Who do you work for again? I have to make sure not to do business with that.
What? And isn't it an error of similar proportion to upgrade your primary DNS servers without first testing the new install?
If it was a Microsoft product, we'd all be carrying pitchforks and torches....
Why do we have to fix this ourselves? Can't RHT just issue a new update fixing it?
Also, don't they test their updates?
As long as there are slaughterhouses, there will be battlefields.
I am frankly surprised there has not been more of an uproar about this.
Schroedinger's Brexit: The UK is both in and out of the EU at the same time!
I wonder if this is included in Total Cost of Ownership, i.e. I'm really interested in estimates of how big a loss this mistake caused big companies.
Here's the bug details: https://bugzilla.redhat.com/show_bug.cgi?id=453340
One of the bug comments says: "Latest caching-nameserver renamed my named.conf to named.conf.rpmsave in /var/named/chroot/etc" - so this should mean that you can still restore the lost conf file.
"I am frankly surprised there has not been more of an uproar about this"
That's because the entire Internets are now broken!
I guess the sysadmins could put in an option in a configuration file somewhere on what files to "keep untouched" when doing package upgrades, no? So that the configuration file wouldn't be overwritten. I think I've seen something similar in Debian distros. Anyway when I install a new (custom) kernel in Ubuntu for example, synaptic asks me if I want to overwrite GRUB's menu.lst with the newly generated one, view the differences or keep my old one etc. Surely there's something similar in Red Hat?
Half the whole point of a subscription to RHEL is to ensure that the patches they put out are properly QAed. The other half is support, but I never had a chance to test that part out.
I don't need to worry about that, I run Debian
Also, I don't run my own DNS. But if I were paying someone to make sure my patches weren't idiotic, I'd be pretty pissed, whether the patch was for something I used or not.
This article is absolutely wrong.
The user has misconfigured their DNS and has installed a package called, SURPRISE, caching-nameserver along with the other bind packages.
caching-nameserver IS just that, a caching-nameserver. It SHOULD NEVER BE installed on a DNS server that is used for Primary or Secondary DNS control. The bind packages do not in any way modify named.conf, but if you want a caching nameserver and if you have installed the caching-nameserver package, then you would EXPECT that it would replace the named.conf file.
The real question is, how does crap like this get posted as a feature article on slashdot.
Guess he didn't have a lab (or decided not to use it). I've been guilty of that before... :P
What kind of environment are you in where you don't first test your patches that are going out to live production machines? Regardless of the fact that it is Linux and not Windows, you should always test your patches before you roll them into production.
YOU'RE WINNER !
Another lame blog
This update was released via RHN more than two weeks ago.
A few months prior to the release of RHEL 5.2, they released a kernel update (2.6.18-53.1.6.el5) in which they had added a patch for an issue that could make a system oops when files with names of a certain character were present on NFS shares. However, this patch also contained a bug which broke NFS lookup caching and subsequently crippled NFS performance to the point of NFS being completely unusable when working with multiple smaller files. They released a patch for it, but it would only apply cleanly to their testing kernel (which would later become the kernel shipped with 5.2) and they refused to backport it to their then-stable kernel. Shortly after, the vmsplice flaw was found, forcing people to update and bring this bug upon themselves. For us it wasn't that big a problem since we're using CentOS and don't have anything requiring us to use standard RHEL packages (so we backported the patch and built our own kernel package), but a large number of corporate RHEL users are required to use only standard RHEL system packages because of service contracts with hardware vendors and hence they could do little to remedy this bug. As we were among the first to report this and post about it on mailing lists, we received a lot of communication from corporate RHEL users/sysadmins asking us for help on this, further proving that this was a major issue that should have been addressed right away and not postponed to the next major release.
...check for rpm mouse droppings by running find.
RH may have made a small coding mistake - you made an even bigger one.
http://cafepress.com/spankymm - for the Masturbating Monkey in you!
Did the OP have the package caching-nameserver installed? If so, that package's whole point is to change the bind configuration into doing just caching.
Red Hat makes this mistake a LOT. It makes the update process very unreliable. SuSE isn't as bad but they still have problems if you customize a piece of software's configuration in an unexpected way.
Debian is king here. The incremental patches almost never break a configuration and the major release upgrades tend to work; they often change package names if the new "version" has a major incompatible change in the configuration.
Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.
If you don't check out how neat the RHN satellite server, or the new spacewalk server is, you're really missing out. It is really nice in the enterprise environment.
bind-9.3.4-6.0.2.P1.el5_2
[root@struct etc]# grep zone named.conf | wc -l
49
Freshly updated and restarted the service. Still have all my zones. Sounds like someone didn't do too well on the RHCE?
Yeah, it's a silly mistake.
But you should be testing things like this first, and whenever you upgrade you should really be looking at/for all .rpmsave or equivalent files first to make sure nothing has changed in the meantime. Otherwise, you're just removing your config and replacing it with the default whatever happens. You should also be checking .rpmnew (or equivalent) each time to check that it hasn't changed in terms of syntax, defaults etc. (which, let's be honest, is quite likely for such an important update - especially given that we hardly know what the exact problem is yet). I wouldn't go so far as to suggest intimate analysis of packages while they are still packed unless the systems you are running are quite critical to the operation of a business.
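For anyone who wants to automate the "look for .rpmsave and .rpmnew files" step, here is a minimal sketch in Python. The function name `find_rpm_leftovers` and the default root of `/etc` are my own choices for illustration, not anything rpm or RHN ships; adjust the root if your named.conf lives in a chroot.

```python
import os

def find_rpm_leftovers(root="/etc", suffixes=(".rpmsave", ".rpmnew", ".rpmorig")):
    """Return sorted paths under root whose names end in an rpm leftover suffix."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # str.endswith accepts a tuple, so one call covers every suffix.
            if name.endswith(suffixes):
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)

if __name__ == "__main__":
    for path in find_rpm_leftovers():
        print(path)
```

Run it before and after an update and compare the two lists; anything new is a config file the package manager touched.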
Part human-error on RH's part (it happens). Part incompetence in not testing the updates yourself first. Chances are that if I were affected by this, I would catch it as part of "right, what did that package change?", or notice as part of usual testing later, and then just move the file. I probably wouldn't even bother to send RH a note.
If you have a DNS server, that suggests that there are reliant computers. As courtesy to all those reliant computers you HAVE to test changes and check carefully what they are doing first. If you were "stung" by it, it suggests you hit this problem on ALL your DNS servers and/or that you only have one DNS server anyway. To deploy packages like this on such a setup is just asking for trouble.
Don't forget to check your named.conf on RHEL 5.x (and CentOS 5.x).
Make sure that any lines like
query-source port 53;
query-source-v6 port 53;
are commented out or deleted so that forwarded DNS queries come from random ports.
Restart BIND if necessary.
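If you have a lot of boxes to check, something like the following Python sketch can flag the offending lines. It is only a line-by-line pattern match, not a real named.conf parser: it skips lines that start with `//` or `#`, but it does not understand `/* ... */` block comments or include files. The function name `fixed_source_port_lines` is made up for this example.

```python
import re

# An uncommented query-source / query-source-v6 statement that pins a numeric port.
FIXED_PORT = re.compile(r"^\s*query-source(-v6)?\b[^;#/]*\bport\s+\d+", re.IGNORECASE)

def fixed_source_port_lines(conf_text):
    """Return (line_number, stripped_line) pairs that pin the outgoing query port."""
    hits = []
    for number, line in enumerate(conf_text.splitlines(), start=1):
        if FIXED_PORT.match(line):
            hits.append((number, line.strip()))
    return hits

if __name__ == "__main__":
    import sys
    with open(sys.argv[1]) as f:
        for number, line in fixed_source_port_lines(f.read()):
            print(f"{number}: {line}")
```

Pass it the path to your named.conf; no output means no fixed source port was found by this (admittedly crude) check.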
Have you considered using a configuration management tool such as Bcfg2 or cfengine to make sure your own config files are restored after package updates are made? You can never really trust those package maintainers...
Posted from the wireless couch.
On most (all?) other distros it works perfectly. I had Debian for ages in production (supporting piles of services) with apt-get update/upgrade running regularly. SuSE and Gentoo also do a good job keeping you informed about changes in updates and whether post-update human interaction is needed.
The crucial difference here is the mindset of RH. It hasn't changed one damn iota in a decade. The very same problem that made me throw RH6/7 out of production in the past, the very same stupidity of RH, is still there.
RH is the only distro I have ever tried - and I tried many of them - that would silently, without any warning or prompt, replace your config files with the shipped version. It took them ages to learn that files can be renamed - yet it seems the lesson didn't sink in completely.
This is not a single mistake. This has been happening for more than a decade now: RH during maintenance can and does override your configuration. The RH folks simply have no basic respect for their users...
[/rants]
All hope abandon ye who enter here.
I must say that I am very surprised that this patch acted one way in the poster's test environment and another when it was installed on their production machine... That's very odd.
What, he didn't test it before placing it in production? Never mind, move along - nothing to see here.
If the poster made an error (as suggested by a previous post), or if he installed a patch without testing it, bad on the original poster - but if the patch truly was bad (a possibility), then bad on RHN for letting something bad out of QA and into production. But RHN's possible mistake doesn't absolve the system admin for not testing the patch before using it.
The only way this isn't the original poster's error is if the patch worked different in production than in test, but no one is claiming that AFAIK.
No matter what you pay for support to RHN, you are ultimately responsible for your systems, not RHN or any other vendor...
Ken
I am sure that many people do not realise that going through a NAT device usually means that predictable port numbers will be allocated.
Of course until we get details of the hole and fix we cannot be 100% sure but it is very likely that exposing predictable port numbers (which the fix randomised) reintroduces the hole.
If DNS software vendors had a year's notice, then why didn't the NAT firewall vendors? They could have introduced a patch at the same time.
I wish I had mod points with which to mod you up. This is NOT a bug, and a few RHEL test machines I have here updated just fine, keeping their zone files as expected.
Recent Debian's OpenSSL bug was orders of magnitude worse...
I just pulled up the SRPM and looked, and bind-chroot has:
%ghost %config(noreplace) %prefix/etc/named.conf
%ghost %config(noreplace) %prefix/etc/named.caching-nameserver.conf
%ghost %config(noreplace) %prefix/etc/rndc.key
It should not replace that file with an .rpmsave file
I use netbsd but a current openbsd or freebsd would be fine I am sure.
http://michaelsmith.id.au
...the file was backed up and not deleted.
TANSTAAFL GIGO Acronyms to live by!
The thing you're forgetting is that Microsoft REGULARLY does that, even with irrelevant minor updates. That's why people get so worked up about Microsoft. They're going to let this Red Hat incident slip by, because Red Hat doesn't have a track record of messing it up.
Read radical news here
Because I really have the time or want to work with a hand-installed package system, and build all the packages myself.
The idea is that you TRUST the vendor. If you don't trust them, why the hell would you use their software at all?
For large sets, this will be our guide even unto death, for the LORD will work for each type of data it is applied to...
Because 98% of all users hate having to edit a configuration file just to get the maximum resolution out of their video card.
RH is the only distro I have ever tried - and I tried many of them - that would silently, without any warning or prompt, replace your config files with the shipped version.
First, it doesn't do this without any warning...the output of rpm (which does the actual install) is forwarded to yum, or rhn, or whatever is running the "figure out everything I need and get it" process, and that is displayed to you when you are applying the patch. It clearly states in that output what happened with the file.
Second, for some updates (particularly security updates like this one), it is appropriate to save the old config file and load a default one, especially if that default one helps provide more security. Then, the admin can figure out what parts of the new default should be applied to their config, merge everything together, and restart the service.
These are the kinds of procedures that good admins do when they make changes to the system in any way.
Then suse is probably not for you. You should try something like Slackware (slamd64 is a good amd64 port) if you need that kind of flexibility/control.
Hell, if you tried hard enough, I'm sure you could port YaST to Slackware!
"Unfortunately we were hit and our servers went down overnight while RHN dropped its bomb and I am frankly surprised there has not been more of an uproar about this"
.. :)
You mean you installed a patch that you didn't know was faulty and didn't have a rollback option in place, shame on *you* Mr. Sysadmin
davecb5620@gmail.com
Does the red hat version of apt-get (yum? I've used debian exclusively for a long time, so I forget what command it was) not prompt you when it wants to overwrite a config file? On any debian (or debian derived) machine I've used, apt-get always asks what you want to do if your config file is different than the package's.
If yum(?) blows away configs without prompting, that's pretty bad.
-Bucky
"lot of people work in small shops that can't afford multiple redundant servers .. What are those admins supposed to do?"
Keep two harddrives in the same machine and keep a clone of the DNS server on the second HD, if an upgrade borks, just swap the drives.
If you don't check out how neat the RHN satellite server, or the new spacewalk server is, you're really missing out. It is really nice in the enterprise environment.
Here are the problems with the RHN satellite server:
1) It only runs on RHEL4. RHEL5 is not supported.
2) It costs a lot of money ($13,500 annually)
Reference: http://www.redhat.com/red_hat_network/
As for the Spacewalk server:
1) Requires Oracle Database (9i or 10g)
2) Only supports Fedora and CentOS. Cannot manage RHEL releases.
3) Requires RHEL5 as the base machine for the initial installation/web server platform.
Reference: http://www.redhat.com/spacewalk/faq.html#compare
This is news? Redhat (like every OS vendor I've ever dealt with) have been pushing out updates with broken assumptions for years.
In fact, this isn't even the first time they've done something similar when updating bind: back in 2004 they released RHEL 3 Update 4 and many people had precisely the same experience. Additionally, when applied, Update 4 removed the /etc/rc*.d/S*named and /etc/rc*.d/K*named and then shut named off.
As a quick glance at redhat's bugzilla shows, the first problem (the same one you experienced in this release) wasn't a schoolboy mistake on the packager's part, or a bug. It was the result of a poorly understood choice on the part of the person who originally provisioned the machine.
Rather than installing just the original bind-9.2.4, the people who had their named.conf overwritten had installed bind plus a package called caching-nameserver. It's that package that, when updated, backed up and overwrote their bind config. The "caching-nameserver" package should only be installed if you want to run a caching nameserver, because the caching-nameserver package isn't an application at all - it's simply a named.conf file.
The real bug (back in 2004) wasn't actually in Update 4's bind package. As it turns out, the package it replaced incorrectly contained a `chkconfig --del named` in its uninstall script.
Anyone without proper alerting and a good QA process found that one out the hard way. I had customers who'd gotten so blasé about performing nighttime maintenance without proper reversion testing that they scheduled nightly cronjobs that ran up2date at midnight and rebooted the production machine. Naturally, they woke up in the morning to find they'd just suffered 8 hours of downtime.
Lesson? Don't trust the vendor's QC work, don't install unnecessary packages, and make sure to QC your own work! Ask any experienced Windows admin about unintended consequences from "trusted" vendor patches...
GUILTY.
Seems the person that prepared the patch is a new hire at Red Hat.
Beware Latest 10.3.x security update - it replaces /etc/named.conf:
http://discussions.apple.com/message.jspa?messageID=5876624
~hylas
Crap like this is why I lost faith in Microsoft and quit running Windows years ago. Thankfully my RHEL box isn't affected by this sort of... oh... wait... really? Shit.
How is it that one careless match can start a forest fire, but it takes a whole box to start a campfire?
Check here:
http://lateral.netmanagers.com.ar/weblog/2008/07/16.html#BB701
In at least one very common config, named.conf is a symlink, so copying it doesn't avoid it being overwritten.
The named update script "copies" symlinks by making another symlink, not by copying the underlying file.
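If you take your own backup before an update, the same trap applies: copy the file the symlink points at, not the link. A minimal Python sketch follows; `backup_config` and the `.bak` suffix are names I made up for illustration, and this says nothing about how the actual update script is written.

```python
import os
import shutil

def backup_config(path, backup_suffix=".bak"):
    """Copy the file that path ultimately points at, not the symlink itself."""
    real_path = os.path.realpath(path)      # resolve any chain of symlinks
    backup_path = path + backup_suffix
    shutil.copy2(real_path, backup_path)    # regular file holding the real contents
    return backup_path
```

The result is a plain file next to the original path, so it survives even if the link target is later replaced.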
This sounds like how RPM has behaved for as long as I can remember. It looks at three versions of a config file: #1 the one from the old package, #2 the one currently on disk and #3 the one in the new package. If the config file hasn't been customized (1 and 2 are identical), it moves the old file to .rpmold (if 1 and 3 differ) and puts #3 into place. If the config file has been customized, it checks whether 1 and 3 differ. If they don't, then nothing's changed, the customized config file's still valid and it drops #3 in with the .rpmnew extension. But if 1 and 3 differ, then something in the config file may have changed and the customized config file may no longer be valid. But it's got customizations in it that the admin may need to refer to. So it outputs a warning message about what it's doing, moves the customized config file to .rpmsave and installs #3, and the admin's expected to have seen the warning and to merge their customizations into the new config file. You do watch for warnings and errors during the update, right?
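The three-way comparison described above boils down to a small decision table. Here it is as a Python sketch that follows this comment's description, not rpm's source code; the function name and the returned strings are mine, and real packages can also mark a file %config(noreplace), which this sketch does not model.

```python
def config_action(old_default, on_disk, new_default):
    """Decide what happens to a config file, per the three-way comparison above.

    Arguments are the contents (or checksums) of:
      old_default - the version shipped in the old package (#1)
      on_disk     - the version currently installed (#2)
      new_default - the version shipped in the new package (#3)
    """
    customized = on_disk != old_default
    default_changed = new_default != old_default
    if not customized:
        # The admin never touched it, so the new default simply goes into place.
        return "replace"
    if not default_changed:
        # Customized, shipped default unchanged: the admin's file stays.
        return "keep, new default as .rpmnew"
    # Customized and the default changed: save the admin's file, install the new one.
    return "new default installed, old saved as .rpmsave"
```

The last branch is the one the story is about: a customized named.conf plus a changed shipped default.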
In this case RPM is right, old named.conf files aren't valid. If they're based off RH's old stock config files, they have the source port locked and that disables much of the security fix. So the admins do have to check and modify their customized files before the system's finally ready (or at least RPM has to assume they do, since it can't know exactly what their changes were). That's exacerbated by probably having caching-nameserver installed, but I think a stock BIND install has a similar named.conf until you add your own zones to it.
I'd chalk this one up to admins who a) don't understand an inherent limitation of package-management systems (namely, it doesn't know why you changed something, only that you changed it), b) didn't watch the update process for errors, and c) didn't check the systems for functionality after the update.
...that neither 'bind' nor 'named' should be capitalized. Then again, they're not very technical people.
Advice: on VPS providers
I've been using Thibor HyperWRT for a while now, but it has not been maintained for some time. I use dnsmasq on the WRT54G (an old one, more like today's GL version) and I see that it has been patched in newer versions. It looks like I will have to move to OpenWRT; does anyone know which versions are new enough to have this fix? I took a look but could not find a Change Log and the versions seem older. Or alternatively, is HyperWRT in fact still being maintained somewhere, and do you know the new link?
Thanks in advance.
This is a software design issue, too. A good software package using config files would have the ability to parse and understand separate files for a default configuration and a locally customized configuration. When such software is distributed, whether in source form that you compile and install, or a binary package you simply install, it will install a default config file that never needs to be updated by the admin/user. To customize such software, a local config file will be written and placed in a different location that the software looks in. The local config will override the default config, when the local is present. The installation of the software will never touch the local config file.
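A rough Python sketch of that default-plus-local layout, assuming JSON config files purely to keep the example short (BIND does not work this way, and `load_config` is a hypothetical name). The merge is shallow: a key in the local file replaces the same key from the default file.

```python
import json
import os

def load_config(default_path, local_path):
    """Load the packaged defaults, then overlay keys from the optional local file."""
    with open(default_path) as f:
        config = json.load(f)
    if os.path.exists(local_path):
        with open(local_path) as f:
            config.update(json.load(f))   # local values win over defaults
    return config
```

With this split the package only ever ships and overwrites the default file, and the local file belongs entirely to the admin.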
now we need to go OSS in diesel cars
Do you really trust the updater to not install a backdoor somewhere in your system while you aren't watching?
Um, yes?
If we cannot trust Red Hat not to install a backdoor in our system, we'd hardly be able to use Red Hat, would we?
If corporations are people, aren't stockholders guilty of slavery?
Thanks /., I've known about this for TWO WEEKS.
And no one died.
So there.
NO SIG
What would your response have been if this had been done by Microsoft?
Just asking.
Don't entrust a function like DNS to a single vendor. With some services it is hard, as authors support a limited range of OSes/hardware or charge too high a price for each installation to make redundancy affordable.
But not DNS. Free solutions abound, and the commercial ones are quite cheap too. They are available for all imaginable "server-grade" OS/hardware combinations. If you use more than one server for DNS in your enterprise, and all of them use the same platform, you aren't doing your job.
Mind you, I don't blame the victims here — Red Hat screwed up royally, and that's that. Just advising on how to avoid being hit by such (inevitable) mistakes — from any vendor — in the future.
In Soviet Washington the swamp drains you.
The user has misconfigured their DNS and has installed a package called, SURPRISE, caching-nameserver along with the other bind packages.
If the RPM utility saw the configuration file was modified (from a mis-matched checksum against its database), it should not have touched the file.
What would have happened if the user was running BIND in a caching-only mode, but he modified the file to have a 'listen-on' directive for security purposes? He would have been using the package "in the correct" way, but it still could have borked his configuration.
While he probably should have tested, not having your update system modify your configuration behind your back is a pretty good prerequisite to have in an updating system.
Isn't this one of the advantages of having configuration stored in text files? Being able to see changes in a granular fashion? Imagine if it was a binary blob that was updated. Copying back a file (or restoring it from backup) is fairly easy, but if you had some obscure bag of bits it might be harder.
I agree with the parent.
RH quality has been slipping for some time now. Heck, I don't even know what RH brings to the table as far as distributions are concerned.
Their GUI system-config-* programs are so poorly written that a lot of them don't even work. And those that do can't repaint themselves properly on the screen. Imagine GUI programs that take up to 30 seconds to (re)paint themselves after being covered/exposed. And that's on a beefy dual-core PC!
I don't know so much about their enterprise stuff (thankfully I only have to deal with a single RH server) but Fedora certainly sucks ass. I don't think RH does *any* testing on it - that's for suckers ^H^H^H er...the community to do. Every time there is an update, I say to myself "Okay...what did they break this time?"
nameserver RHN package, caching-nameserver.arch to serve authoritative zones. it's a caching nameserver, it's not supposed to serve authoritative zones! if you are using the regular nameserver package, bind.arch, it breaks nothing. it keeps the old config and copies the new config to .rpmnew.
RHEL - 5.2 - caching-nameserver-9.3.4-6.P1.el5.i386.rpm
RHEL - 5.1 - caching-nameserver-9.3.3-10.el5.i386.rpm
RHEL - 4.6 - caching-nameserver-7.3-3.noarch.rpm
RHEL - 3.9 - caching-nameserver-7.3-3_EL3.noarch.rpm
Trying to become famous by taking photos. Visit my homepage please.
On Debian and its derivatives the upgrade process is suspended when a config file difference is detected--the admin is then given an interactive prompt where they can inspect the difference, accept or reject the new version, drop to a console, or cancel the upgrade entirely. Since these are the kind of procedures that good admins do... why not make it easy and quick for them to do it, when they need to do it? The RH system is a poor implementation from a usability perspective.
The DNS server might be one of ten, fifty, hundreds, maybe more different servers that an admin has to care about. The person deploying a machine might not have ANY clue whatsoever about the exact package configuration on a given machine. It might even be a repurposed machine, or one providing general network service to which DNS is added later on. It might have even BEEN a caching DNS server previously!
The whole idea of a package that merely overwrites another package's primary configuration file is absolutely flawed. The two packages should be mutually exclusive.
I'm sick and tired of the Linux crowd berating other systems such as Solaris and Windows because updates often require reboots, or because they sometimes break things (Solaris/sendmail.cf ALWAYS gets brought up), or because single user mode is recommended. You'd get the impression that Linux uses some magical pixie dust, and never needs to be restarted after any update other than a new kernel, and anything other than the kernel can be hot swapped out with a million users logged in using it. Look at the responses here. Something that is obviously poorly engineered by the vendor gets brushed under the rug and labeled a user error. Every time an update screws up some running processes, it's "oh you should have had everyone log off first". When a configuration file gets overwritten it's "you should have tested it outside of production first". Does anyone realize that when you start taking all of these precautions, Linux starts to be just as much a pain in the ass as any other OS? Linux is easy when admins are lazy, and not doing their job.
I fucking HATE "Linux people" now. People pushing it where it's not ready to go, and without understanding why other systems are different first. You're all just as bad as the same ugly Windows crowd that was always repressing you a few years back. Now Linux is more accepted, but the fanboys just switched sides, that's all. A new generation of young idiots. Given OS X, Windows, and Solaris, Linux is totally fucking useless. Now, I appreciate and support free software, but let's not get the ideological, philosophical, and technical aspects of Linux mixed up here. I do not want open or free software to go away, but I do want the supporters to try to understand what other systems do differently and not assume the Linux way is best.
Let me calm down... please appreciate open source software because of the openness, not some imagined technical superiority. Linux is far, far, far away from that.
These arguments come up all the time. So it is with chroot.
The Linux kernel lost 'securelevel'. ("A hacker can turn it off by mucking around with /dev/mem anyways, or use $kernel_bug_of_the_day to flip the bit")
Python lost 'restricted' mode. (There are some ways to get code out of the restricted jail..)
PHP6 is losing features like safe_mode, open_basedir (Custom extensions may be able to open files despite the open_basedir restriction)
I wouldn't be surprised if chroot itself gets removed eventually, and ext3 'immutable' bit, or gets a fat disclaimer not to use it. It probably only stays because it is used for some build environments.
Why? Because these security measures aren't perfect. They don't guarantee 100% security against a skilled attacker. They don't satisfy everyone.
Apparently for some folks, security measures aren't acceptable unless they're effective in 100% of situations and against 100% of the possible attackers.
Even if the measures had some very practical uses... the very danger that 'people might think this is a security measure', is worth removing useful features that make life harder for crackers.
Remember, this is the company that agreed that rpm has a bug that can cause corruption, and yet closed the case out with a "WILL NOT FIX".
Respect their users? Bwahahahahaha. They want to be the Microsoft of linux, and don't you forget that.
Who is the last major distro to join any standardization efforts? Bah.
Oh my, another person who read Trusting Trust, and writes his own compilers.
Redhat 6/7? How long ago was that? See the following release dates; I think you will find Redhat has improved since 1999/2000.
Professional Redhat releases RHEL2.1 (26 Mar 02), RH3 (22 Oct 03) through RHEL3.U9, RH4 (15 Feb 05) through RHEL4.U6 and now RHEL5 (14 Mar 07) through RHEL5.U2 although the above link does provide more information.
If people want a Linux distribution that has full software support they either go to Redhat or Novell (SuSE). There are other firms that will provide support, but Redhat, followed by Novell, has the lion's share of the market.
There ain't no such thing as proprietary standards only proprietary formats. Standards are by definition open.
Why do you run BIND in a chroot 'jail?' It doesn't do squat for security.
If bind is run as a non-root user, chroot has some value. It's just not the ultimate weapon some think it is.
I would like to point out that this is impossible on gentoo, it doesn't update config files automatically, replacing them or otherwise. It includes a nice interface that you can choose the update, the original, diff them, select bits from both files, etc... But it *never* overwrites anything in /etc/ without asking you first.
djbdns ftw!