Recovering From apt-get Failures?

Posted by Cliff on Sunday March 4, 2001 @01:54AM from the blasted-package-management dept.

Ross Vandegrift asks: "Once upon a time long ago, apt-get totally trashed the company webserver while we were trying to upgrade it. Since then I have been very suspicious of apt-get. Recently, a friend from school talked me into trying it again. I slowly eased back into it and was keeping our machines up to date, and automatically applying security patches. It was cool, I was starting to trust it again. Well, today that changes..." I'm sure everyone who has ever used RPM or apt can understand the frustration one goes through when running into this problem. For those who have, what did you do to get your system functioning again?

"I ran 'apt-get update; apt-get upgrade' last night to upgrade one of our machines. Nothing seemed out of the ordinary; I received no error messages indicating something terrible had happened. But now when I try to use 'su', it returns a libpam error that I've been unable to find any reference to, either on the web or in the Debian mailing list archive.

So this poses the question: what do I do if apt-get fails and screws up the system? I've tried reinstalling/reconfiguring the affected packages to no end. If we used my distro of choice (Slackware), I'd have an intimate understanding of my system and would know right where to look when I get an error. But with apt, most of the packages on the machine are black-boxes; I don't know much about them outside their package name and function. What would cause apt-get on a debian-stable machine to actually cause problems? And what can I do to back out of its changes?"

19 comments

I have had similar problem. by Ryan+Koppenhaver · 2001-03-03 21:39 · Score: 1

Thus, I use RPM exclusively.

You can argue and quibble about functionality and neither installer will win decisively. However, RPM doesn't ruin your system. Ever. I don't understand why so many users tolerate an installer that's the equivalent of playing Russian Roulette.

It's insane, if you ask me.

--

So I sez to him, I ain't givin' you no damn three-fity.
1. Re:I have had similar problem. by ioctl · 2001-03-03 22:43 · Score: 1
  
  Well, usually apt is rock-solid, and provides (quite nice) dependency resolution. Only rarely does it screw a system. The Debian developers do a very fine job keeping everything running smoothly, but they are human, and do make mistakes periodically.
  
  RPM, OTOH, makes you go find all the dependencies (btw, I'm referring to RPM as a package type, not as an upgrade system, as I have little experience with those). It is also very easy for a newbie or lesser guru to screw their system because they installed the wrong package. Another thing is that RPM doesn't inherently guarantee that the package is legitimate, whereas .deb does (in the form of pgp signatures).
  
  So, anyways, before I ramble too much more, RPM only wins for people who could build their GNU/Linux install from scratch to begin with. Apt's .deb wins for everyone else.
2. Re:I have had similar problem. by Soft · 2001-03-03 22:56 · Score: 2
  
  Thus, I use RPM exclusively.
  
  So? Say there is a security advisory, you should upgrade package ftpd. You get the updated RPM from RedHat (or Mandrake or whoever) and try to install it. RPM complains because ftpd depends on new versions of such-and-such libraries. You fetch the updated RPMs. You install all of them.
  Guess what, that's exactly what Apt does, except that it uses .debs instead of RPMs (not counting recent developments). You run the very same risk that a post-install script in one of the packages messes something up, except that you had to get all the dependencies yourself...
Stop playing with unstable. by Anonymous Coward · 2001-03-03 22:12 · Score: 1

You're running unstable. Running unstable is a really stupid thing to do if a problem this simple and tiny presents any kind of barrier to you.

I've yet to hear of anything like this happening to anyone running the stable, i.e. Joe Average release.

The more helpful response would be - subscribe to debian-user, as any problem in unstable is discussed there straight off, or join #debian on irc.debian.org and ask "apt" about the broken package. Chances are it'll have the answer if it isn't in the channel topic.
Backup the packages? by Soft · 2001-03-03 22:45 · Score: 1

I've tried reinstalling/reconfiguring the affected packages to no end.

Have you tried getting older packages and installing them over the newer ones with dpkg? Did you look at /usr/doc/<package>/changelog.* and README*? Did you try extracting the pre- and post-install scripts from the .debs (which are really ar archives containing data.tar.gz and control.tar.gz, the latter containing all the packaging stuff)?
As for myself, I don't usually use Apt directly, but the Apt method in Dselect. People say it's counter-intuitive, which it is, and difficult to use, which I don't think it is - once you get the hang of <space>, <enter>, +, -, _, =, R, Q it gets quite easy. Before upgrading a system I look at which packages are going to change; if they are critical, I try to choose a time when I know I am physically close to the machine and can undo the upgrade one way or the other.

If we used my distro of choice (Slackware), I'd have an intimate understanding of my system and would know right where to look when I get an error. But with apt, most of the packages on the machine are black-boxes; I don't know much about them outside their package name and function.

I disagree. The packaging has a black-box look (and even then you can easily open .debs with standard tools on any UNIX system, unlike RPMs) but what matters is the files they contain. PAM configuration is still in /etc/pam.*. Use dpkg -L to see what files are there.
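
For what it's worth, here is a rough sketch of opening a .deb by hand to read its maintainer scripts. It assumes ar and a tar that understands -z are on the machine. inspect_deb is a name made up for illustration, not a Debian tool, and it only handles the control.tar.gz case described above; newer packages may compress that member differently.

```shell
#!/bin/sh
# Unpack a .deb's packaging metadata into a scratch directory so the
# maintainer scripts (preinst, postinst, ...) can be read.
# inspect_deb is a hypothetical helper name, not part of dpkg.
inspect_deb() {
    deb=$1
    dest=$2
    mkdir -p "$dest/control" || return 1
    # Work on a copy so the original package file is left alone.
    cp "$deb" "$dest/pkg.deb" || return 1
    (
        cd "$dest" || exit 1
        # A .deb is an ar archive; this drops its members into $dest.
        ar x pkg.deb || exit 1
        # control.tar.gz holds the control file and the install scripts.
        tar -xzf control.tar.gz -C control
    )
}
```

Something like `inspect_deb /var/cache/apt/archives/<file>.deb /tmp/peek` would leave the scripts under /tmp/peek/control. If dpkg is working, `dpkg-deb -e <file>.deb <dir>` should get you the same thing with less fuss.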
He doesn't play with unstable. by Soft · 2001-03-03 23:02 · Score: 1

Reread the original article.
Furthermore, everybody messes up sometimes. Debian developers may well have broken something despite all the precautions they take before uploading a new package to Debian-stable. And there is something strange currently going on with FreeBSD 4.2-STABLE according to someone on the mailing-list.
And no, I myself have seldom if ever had problems with Apt on Debian-stable.
1. Re:He doesn't play with unstable. by mopsuestia · 2001-03-04 02:26 · Score: 4
  
  That's what he said, BUT the problem he is describing was introduced within the last couple of days in unstable. There were at least 3 bug reports placed yesterday concerning this bug, (see here, here, and here) and I imagine a fixed package will be available RSN. (Personally, I just reverted to the previous revision of the packages that were sitting in my apt cache.) I have had no problems with PAM in stable.
  As others have said, running unstable is only for those who are ready for breakage and know enough to fix it. Use stable if the above doesn't apply, or testing if you absolutely must have newer packages.
  Also, bear in mind that these are broken packages, so the blame doesn't really fall on the package manager. The packages may have been just as broken if they were distributed as tarballs. It is not the .debs that are broken, per se, it is the files contained in those .debs.
  Finally, no one is stopping anyone from installing from tarballs on a Debian system (or any other Linux distribution). You don't get the benefits of the package manager, but if you don't trust apt/dpkg/rpm anyway, I doubt you will think that is much of a loss.
The problem is between keyboard and chair by Jaldhar · 2001-03-03 23:14 · Score: 3

I'm sorry if some of the following sounds harsh but you are going about things in dangerously foolhardy ways.
Ross Vandegrift asks: "Once upon a time long ago, apt-get totally trashed the company webserver while we were trying to upgrade it. Since then I have been very suspicious of apt-get. Recently, a friend from school talked me into trying it again. I slowly eased back into it and was keeping our machines up to date, and automatically applying security patches. It was cool, I was starting to trust it again. Well, today that changes..."

So let me get this straight: you already had one bad experience with apt-get and yet you are doing the same thing all over again? No analysis or review of what went wrong the first time and what you should be doing differently? Is it any wonder the same problem crops up again?

I'm sure everyone who has ever used RPM or apt can understand the frustration one goes through when running into this problem. For those who have, what did you do to get your system functioning again?

"I ran 'apt-get update; apt-get upgrade' last night to upgrade one of our machines. Nothing seemed out of the ordinary; I received no error messages indicating something terrible had happened. But now when I try to use 'su', it returns a libpam error that I've been unable to find any reference to, either on the web or in the Debian mailing list archive.

I haven't had this problem, but looking at /etc/pam.d/su to see if anything has changed might be a good start. To answer the larger question: on any kind of production machine, do apt-get -s upgrade before actually upgrading. The -s switch simulates the upgrade and shows exactly which packages are going to be installed/removed/whatever. For each affected package look at the changelog to see what's new. Every time a package is uploaded, the changes are sent to the debian-changes (stable) or debian-devel-changes (unstable) mailing lists. Or you can look at the package's page on www.debian.org.

This won't help you if the packager has simply made a mistake (which can happen). To cover that eventuality, keep the last known good version of each package handy somewhere. If you don't do apt-get clean, downloaded packages will remain in /var/cache/apt/archives. Or you can keep a central repository somewhere.
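
To make that concrete, here is a minimal sketch of finding the cached copies of one package. cached_debs is a made-up helper name, and it assumes cached files follow the usual <package>_<version>_<arch>.deb naming; the apt-get and dpkg lines are left as comments because they change the system.

```shell
#!/bin/sh
# List cached .deb files for one package, most recently modified first.
# cached_debs is a hypothetical helper; the default directory is the apt
# cache mentioned above.
cached_debs() {
    pkg=$1
    dir=${2:-/var/cache/apt/archives}
    # Cached files are assumed to be named <package>_<version>_<arch>.deb;
    # the underscore keeps "foo" from matching "foobar".
    ls -t "$dir"/"$pkg"_*.deb 2>/dev/null
}

# Dry run first; -s only simulates and installs nothing:
#   apt-get -s upgrade
# If a real upgrade then breaks something, pick the last known good file
# from the list and install it over the broken version:
#   cached_debs <package>
#   dpkg -i /var/cache/apt/archives/<package>_<older-version>_<arch>.deb
```

Which file counts as "last known good" is still a human decision; the listing only shows what is available to fall back on.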

So this poses the question: what do I do if apt-get fails and screws up the system? I've tried reinstalling/reconfiguring the affected packages to no end. If we used my distro of choice (Slackware), I'd have an intimate understanding of my system and would know right where to look when I get an error. But with apt, most of the packages on the machine are black-boxes;

???? How are .debs any more of a black box than Slackware packages? You can get the source of any one of them with apt-get source. The Debian Policy Manual explains exactly how the system is supposed to function and where things are.

I don't know much about them outside their package name and function.

Then find out! I'll admit Debian is underdocumented but certainly all the information in this message is well-known. Why are you administering Debian systems if you don't understand Debian? apt-get is magic in many ways but it will never be a replacement for human competence and common sense.
RPM and PGP by winterstorm · 2001-03-03 23:56 · Score: 1

Another thing is that RPM doesn't inherently guarantee that the package is legitimate, whereas .deb does (in the form of pgp signatures).

You are misinformed. RPM does support PGP signatures. This is not a new feature of RPM. To verify my claim, please see the man page for "rpm(8)". Perhaps your complaint is that Red Hat's automatic update tool, or some third party automatic update tool, doesn't use the RPM PGP signature?
I would like to comment that only a small amount of research is required to compare current versions of the APT and RPM package systems and discover that they are remarkably similar and each has a few features that the other could benefit from.

Comparing apt-get to RPM is inappropriate. Comparing Red Hat's, or a third party's, RPM-based autoupdate tool to apt-get would be more appropriate.
run the upgrade on a test box first by gempabumi · 2001-03-03 23:59 · Score: 3

dude, i have had far worse things happen when working with unstable. granted, unstable is a risk, but there are some nice versions in there. i usually install my base system from stable and my daemons from unstable.

anyway, if you are tempted to use unstable, do your upgrades on a test box before trying it on your production server. that's a fast and easy way to try it out with a box you are sitting next to - the alternative is to screw a box which may be thousands of miles away (depending on where you colo).

another benefit is that your test/development box always has the same state as your production server. as it should.
1. Re:run the upgrade on a test box first by martyb · 2001-03-04 00:53 · Score: 2
  
  do your upgrades on a test box before trying it on your production server
  Yes! If you're going to do ANYTHING that is potentially destructive, practice someplace else, first.
  what did you do to get your system functioning again?
  Restore from backups. People make mistakes, programs get hosed, hardware dies -- prepare for the eventuality. If I've got programs or data that are important or would take a good chunk of time to re-create, a safe copy has saved my butt many a time. Ideally, you should have daily backups of your system, and with the struggles you are going through now as well as your past experience, a good business case could be made to justify the expense. Also, you could toss in the idea of disaster recovery and off-site backups. One hurricane or earthquake could force your whole company out of business if everything resides in one [computer/site] basket.
  Even if the powers that be are so cheap that a dedicated backup system is out of the question, disks are getting pretty cheap these days. I have no idea what platform you are working on or the size of the data files or database you are using. But, for example, a $100 investment in an extra IDE drive [or even a separate partition on your existing drives] where you could store a gzipped tar of the directory tree[s] in question would come in really handy right about now, and would only take a few minutes' preparation beforehand.
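
  The gzipped-tar idea could look roughly like this. It is only a sketch: backup_tree is a made-up name, the paths in the usage note are placeholders, and it assumes a tar that accepts -z and -C.

```shell
#!/bin/sh
# Write a dated, gzipped tar of one directory tree into a backup directory
# and print the archive's path. backup_tree is a hypothetical helper name.
backup_tree() {
    src=$1
    destdir=$2
    name=$(basename "$src")
    archive="$destdir/$name-$(date +%Y%m%d-%H%M%S).tar.gz"
    mkdir -p "$destdir" || return 1
    # -C stores paths relative to the tree's parent directory, so the
    # archive unpacks as "$name/..." wherever it is restored.
    tar -czf "$archive" -C "$(dirname "$src")" "$name" || return 1
    # Listing the archive back catches a truncated or unreadable file now
    # rather than on the day it is needed.
    tar -tzf "$archive" >/dev/null || return 1
    printf '%s\n' "$archive"
}
```

  Run before an upgrade as, say, `backup_tree /etc /mnt/spare/backups` (both paths absolute), it gives you something to diff against or restore from when a package rewrites a config file.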
Correction. by winterstorm · 2001-03-04 00:02 · Score: 1

When I said, "to compare current versions of the APT and RPM package systems", I actually meant DEB and not APT. I was referring to the package system, not the update tool. My apologies for any confusion.
Backup ??? by Etyenne · 2001-03-04 01:05 · Score: 1

Isn't this kind of situation why you should regularly back up ...? Granted, reinstalling from tape to recover from a config problem is frustrating, but you should have this option. You *do* back up, right ???

--
:wq
Be conservative by blakestah · 2001-03-04 02:11 · Score: 2

This is kinda silly.

ONLY apply security updates as fully automated. ONLY point /etc/apt/sources.list to the security site, stable. If you are going to automate updating of the machine in cron, be very very conservative. And it is also a good idea to capture the standard output and standard error, and email it to the sys admin. That way you should at least be alerted in a timely manner when the machine is hosed.

Then, manually, upgrade the machine with apt-get -u dist-upgrade while /etc/apt/sources.list points to the main debian server, and debug things immediately.
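
As a sketch of the capture-and-email part: the function below runs any command with stdout and stderr going to one log file and passes the exit status through. run_logged is a made-up name, and the commented cron usage assumes a working local mail command; the address is a placeholder.

```shell
#!/bin/sh
# Run a command, capture stdout and stderr together in a log file, and
# return the command's own exit status.
# run_logged is a hypothetical helper name.
run_logged() {
    log=$1
    shift
    "$@" >"$log" 2>&1
}

# Intended use from a nightly cron script, on a machine whose
# /etc/apt/sources.list lists only the stable security site:
#
#   log=$(mktemp)
#   run_logged "$log" sh -c 'apt-get update && apt-get -y upgrade'
#   status=$?
#   mail -s "apt on $(hostname): exit $status" root@localhost <"$log"
```

Putting the exit status in the subject line means a failed run stands out in the inbox without opening the message.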

Apt-get is a great tool. So is a gun. Both can wreak havoc when used inappropriately. But remember, it is not the gun's fault. Rather, it is the jerk on the trigger.
Couple things by JediTrainer · 2001-03-04 02:43 · Score: 2

I run a production (24/7) server at work as well, which runs an app which we have been offering to our customers and are constantly improving (adding features, maintenance, customization etc).

Naturally, we have our down-time late at night or on weekends, when it's not as likely that there's anyone using it. Firstly, keep your data on a separate drive from your applications and system. Now, we make daily backups of all of our data.

As for your system, every time you do an upgrade or new install, make a disaster-recovery image of the apps/system drive or partition. Get a good piece of software (commercial if you must - even Ghost or DriveImage will do) to image the whole thing into a safe location. This can happen either to a network drive on another server, a tape, or another drive on the same system. In our case, we have a server dedicated to storing various images for our machines.

Just 3 weeks ago we had an upgrade destroy our system (render it unbootable). This happened at 2am while upgrading RAID device drivers. It took about 30 minutes to restore the system from our disaster recovery image, and then of course we analyzed what went wrong and completed the upgrade over the next hour, then tested, and created a new disaster recovery image with the updated configuration.

If you run a production system, you can't afford to not have a disaster recovery image that you could restore in a matter of minutes - naturally you can't afford to re-install everything. Also, keeping your data on another drive keeps it safe from whatever you're doing to the apps.

Oh yes - and DO test your disaster recovery procedures before you need it.

--

You can accomplish anything you set your mind to. The impossible just takes a little longer.
Look up the definition of "unstable" by NetJunkie · 2001-03-04 03:00 · Score: 2

If you run the unstable distribution, be prepared for problems every now and then.

Now..since you've hit a problem go to #debian on irc.openprojects.org. Every time I've hit a problem with an update to an unstable system the problem has been reported and usually a fix is in the channel topic.

And it should be repeated... If you trashed a web server doing an update, why were you updating a production system to new packages without testing them on another system? The ONLY source in my sources.list on my production systems is for security updates, and I apply those one at a time and test them.
Simple backup procedure. by MartyJG · 2001-03-04 05:28 · Score: 2

This is a bit basic, but I was messing about with stuff from the Red Hat Rawhide folder and screwed up my install - back into 'doze and restored the root partition from a Ghost image made a few weeks before (yes it was ext2). I stored all the images on the Samba server in the next room, so it wasn't a completely Windows solution ;-)

Ghost came with my CD re-writer, but it's cheap enough on its own and worthwhile if you're on a dual-boot system.

--
insignificant sig
I don't trust *nix auto-updaters by blackwizard · 2001-03-04 18:29 · Score: 3

I've gotten burned twice lately with *nix auto-updaters.

1. My parents have a ppc based system that I installed Yellow Dog Linux on, and installed some network cards in, so it could function as their internet NAT router. I ran the "yup" program the other day, and it updated a lot of system utilities and was very helpful. However, it also upgraded to a different version of the linux kernel at the same time. Now, this wouldn't have been so bad -- but -- ** IT DELETED THE OLD KERNEL!! ** Not good, sir, not good. So along comes my dad to reboot the thing one day when he was moving it around, and when it came back up, since the boot loader hadn't been reconfigured, the old kernel tried to boot. Whoops! (Actually, the kernel itself wasn't deleted, but several other important pieces close to the kernel, i.e. some modules that were dependencies to other modules, etc...) It sure was fun talking my dad through hours of trying to compile modules for the other kernel and getting it installed into the boot loader. (The old kernel was toast -- the "yup" program completely overwrote the old kernel source code)

2. My system got hosed last night when I used "autoslack" to blindly upgrade my system. You'd think I would have learned my lesson -- well -- I partially did, I backed up all my config files first. (Good thing too.) Anyway, I didn't figure out exactly what autoslack hosed on my system -- it either upgraded a lot of critical system components to new ones that required a version of glibc that I didn't already have, or it upgraded my glibc, somehow leaving some critical system components confused about what the hell was going on. (things like "ls" and "rm" were completely broken) I ended up doing something like "cd / ; mkdir .hosed ; mv * .hosed ; mv .hosed hosed" and reinstalling a newer version of slack on the system. Ugh.