Putting Linux Reliability to the Test

Posted by michael on Friday December 26, 2003 @05:02PM from the flying-colors dept.

Frank writes "This paper documents the test results and analysis of the Linux kernel and other core OS components, including everything from libraries and device drivers to file systems and networking, all under some fairly adverse conditions, and over lengthy durations. The IBM Linux Technology Center has just finished this comprehensive testing over a period of more than three months and shares the results of their LTP (Linux Test Project) testing."

17 of 296 comments shown

Re:USE BAD HARDWARE! by Anonymous Coward · 2003-12-26 17:20 · Score: 1, Informative

Uhh... I have a computer built entirely from individual parts that I bought, and I haven't had any stability problems. Ever.

Keep in mind that Dell's and Gateway's PCs are good because the hardware they choose to include is good. Anyone can buy the same hardware.
Re:USE BAD HARDWARE! by Anonymous Coward · 2003-12-26 17:28 · Score: 1, Informative

> I swear it's an industry conspiracy that generic
> parts struggle a boat load.

Either that, or it's part of the myth of "I can build a box that does XXXX cheaper than *big name here*"

After experiencing the reality of real-world reliability, performance and support hassles, I just have to snigger at ANYONE who comes up with those claims.

Generic cuts it when I have a server in the next room at home and I'm here all the time that it needs to be working. For anything else, it's just buying extra work for myself.
Re:s/w -vs- h/w failure? by Anonymous Coward · 2003-12-26 17:33 · Score: 5, Informative

http://www.memtest86.com/ : a freeware (GPL) bootable memory tester for PC platforms. Highly recommended for troubleshooting flaky RAM.
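To illustrate what a pattern-based memory tester does, here is a simplified sketch of a "walking ones" test in Python. It runs against simulated RAM, not real hardware: the `FaultyRAM` class, the stuck bit, and the addresses are hypothetical, and Memtest86 itself runs many more patterns directly on physical memory.

```python
# Simplified "walking ones" memory test against SIMULATED RAM.
# FaultyRAM and its stuck-at-zero bit are hypothetical, for illustration only.

class FaultyRAM:
    """Simulated byte-addressable RAM with (optionally) one bit stuck at 0."""

    def __init__(self, size, bad_addr=None, bad_bit=None):
        self.cells = bytearray(size)
        self.bad_addr = bad_addr
        self.bad_bit = bad_bit

    def write(self, addr, value):
        if addr == self.bad_addr:
            # Clear the stuck bit, so it never reads back as 1.
            value &= ~(1 << self.bad_bit) & 0xFF
        self.cells[addr] = value

    def read(self, addr):
        return self.cells[addr]


def walking_ones(ram, size):
    """Write each single-bit pattern to every cell, read it back, report mismatches."""
    failures = []
    for bit in range(8):
        pattern = 1 << bit
        for addr in range(size):
            ram.write(addr, pattern)
        for addr in range(size):
            got = ram.read(addr)
            if got != pattern:
                failures.append((addr, pattern, got))
    return failures


if __name__ == "__main__":
    ram = FaultyRAM(1024, bad_addr=700, bad_bit=3)
    for addr, expected, got in walking_ones(ram, 1024):
        print(f"address {addr}: wrote {expected:#04x}, read {got:#04x}")
```

Only the pattern that exercises the stuck bit (0x08) fails, at address 700; every other pattern passes, which is why a tester has to cycle through many patterns to catch a fault like this.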
Re:You don't trust Microsoft to evaluate Windows.. by davidstrauss · 2003-12-26 17:45 · Score: 3, Informative

Of course we would not trust IBM to evaluate Linux. That's why they used LTP for testing.
Microsoft commonly hires outside companies to perform its tests. Do you remember the evaluation of Exchange versus Notes/Domino scalability performed by Ziff-Davis but funded by Microsoft? People justifiably questioned those results, as the company hired (Ziff-Davis) has an interest in pleasing the hiring company (Microsoft) so that it gets future work.
Re:Linux 2.4.19-ull-ppc64-SMP (SLES 8 SP 1) by Anonymous Coward · 2003-12-26 18:03 · Score: 1, Informative

Do a SuSE FTP install of SuSE 9.0.

It's as close as it's going to get to a SLES release, and it's free. SuSE has too many obligations to support proprietary software to make SLES free to download.

link:
http://www.suse.com/us/private/download/suse_linux/

You can also download an AMD64 (64-bit) version now.
Re:WHAT is the failure? by be-fan · 2003-12-26 18:31 · Score: 3, Informative

These are the results from the Linux Test Project suite. A 95% success rate with zero critical failures means that 95% of the 2000 test cases completed successfully and nothing crashed the kernel. To see what that means, just take a look at what test cases are in the LTP!
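Taking be-fan's figures at face value (about 2000 test cases, 95% passing), the failure count works out as:

```latex
\text{passed} = 0.95 \times 2000 = 1900, \qquad \text{failed} = 2000 - 1900 = 100
```

"Zero critical failures" then means that none of those roughly 100 failing cases brought the kernel down.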

--
A deep unwavering belief is a sure sign you're missing something...
Re:Diagnosing software vs. hardware is easy. by puffing_billy69 · 2003-12-26 18:39 · Score: 2, Informative

> When you run the same test 5 times, and it gets to the exact same point before sig11ing, you have a software flaw.
Not necessarily. When uncompressing one of the XFree86 source tarballs, X430src-3.tgz, on my old K6-2 450, gzip would always die with a bad CRC. Nothing else at all seemed to go wrong with the machine, but I couldn't uncompress the file until I lowered the memory clock from 100 MHz to 66 MHz.
I found one other person with the same motherboard having the same problem in a Google search, and also heard there was a problem with that mainboard running RAM at 100 MHz.

--
printf("%s@yahoo.co.uk\n", uid[569754].name);
Re:Linux 2.4.19-ull-ppc64-SMP (SLES 8 SP 1) by JasonStiletto · 2003-12-26 18:53 · Score: 2, Informative

For one thing, it would be difficult to run a 3-month stress test on 2.6.0 when 2.6.0 isn't 3 months old and isn't part of a released enterprise product. If they stress-tested one of the betas and it failed, Microsoft would use it for advertising. :)
Re:Why? Here's why... by Crypto+Gnome · 2003-12-26 19:05 · Score: 5, Informative
OK, so the Reality Check in this equation amounts to:

You should not trust this evaluation at all.
- Go to the site
- download the testing tools yourself
- read the test paper
- use the test methodologies as documented
- do your best to verify their test results yourself
- go back to the site
- post your results for everyone else to see
(i.e., follow the good practices of basic science)

After all... on the Internet, nobody knows you're a dog.

Any JimBOB can write a convincing paper, with all the right buzzwords, that sounds as if X+Y=Z, especially if that was logically a likely/expected outcome in the first place.

As a well-known TV show once said (several times and loudly) Trust No-One.

Remember people, YMMV.
--
Visit CryptoGnome in his home.
Re:Diagnosing software vs. hardware is easy. by ImpTech · 2003-12-26 19:17 · Score: 3, Informative

Bleh, that's not necessarily true at all. A good race condition in a heavily multithreaded program can quite easily look very much like a hardware problem, in that it is difficult to reproduce reliably.
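ImpTech's point can be sketched with a toy simulation in Python: two simulated threads perform an unsynchronized read-modify-write on a shared counter, and a seeded random scheduler (an assumption standing in for real OS timing) decides which thread takes the next step. Different seeds, like different runs on a real machine, lose a different number of updates, so the symptom is irreproducible in much the same way flaky RAM is.

```python
# Toy model of a lost-update race. The scheduler and the two-step
# load/store model are assumptions for illustration, not real threads.
import random


def run(increments_per_thread, seed):
    """Return the final counter value for one simulated run."""
    rng = random.Random(seed)
    counter = 0
    remaining = [increments_per_thread, increments_per_thread]
    register = [None, None]  # per-thread loaded value; None = no load pending
    while any(remaining):
        # The "scheduler": pick any thread that still has work to do.
        t = rng.choice([i for i in (0, 1) if remaining[i]])
        if register[t] is None:
            register[t] = counter        # step 1: load the shared value
        else:
            counter = register[t] + 1    # step 2: store the incremented value
            register[t] = None
            remaining[t] -= 1
    return counter


if __name__ == "__main__":
    for seed in range(5):
        print(f"seed {seed}: counter = {run(1000, seed)} (expected 2000)")
```

Whenever both threads load the same value before either stores, one increment is lost, and how often that happens depends entirely on the interleaving. Guarding the load/store pair with a lock (for example `threading.Lock` in real threaded Python code) would make every run return 2000.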
Re:Here goes by Fnkmaster · 2003-12-26 19:51 · Score: 2, Informative

You mean an "unbiased" industry analyst? The problem is that everybody needs their bills paid by somebody. And these days pretty much everybody in the computer industry has some interests tied up with either Microsoft or Linux (seeing as how most of the old Unix players are becoming Linux players as well - IBM, SGI, (sometimes) HP, Sun...).

It takes a lot of time and money to do very thorough analyses of operating systems, hardware and enterprise apps. So that money has to come from somewhere. It would be all well and good to say "hi, we're an independent research and analysis lab, we'll write unbiased reports about the state of the industry", but somebody has to fund that shit. And pretty much all that money can be traced back one way or another to some of the big companies in the business who can afford to throw it around for marketing benefits - like Microsoft or IBM.

In a perfect world, all the customers and potential customers of software would get together and each chip in a little bit of money to fund good, unbiased research. But like in the world of politics, it's easier to get a few special interest groups who have a lot at stake together than to get hundreds or thousands of parties who each have a little at stake to cooperate.
Re:Diagnosing software vs. hardware is easy. by jemfinch · 2003-12-26 21:03 · Score: 2, Informative

> So when you run a test 5 times, and you get 5 results, the hardware is broken. When you run the same test 5 times, and it gets to the exact same point before sig11ing, you have a software flaw.

This isn't true. If you're running a program that uses a deterministic memory allocation algorithm (a compiler, for instance) and you have a segment of bad memory, you could easily crash at the exact same point (when a pointer in that segment is dereferenced, for instance).
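The same point can be sketched in Python with a toy bump allocator: because it hands out addresses in the same order on every run, a single bad cell is always reached at the same step. The allocator, object sizes, and `BAD_ADDR` are hypothetical; real allocators and compilers are far more complex, but the determinism argument is the same.

```python
# Toy illustration: deterministic allocation + one bad memory cell
# = the same failure point on every run. All values are hypothetical.

BAD_ADDR = 0x5A0  # hypothetical defective memory cell


class BumpAllocator:
    """Hands out addresses in a fixed order, so every run sees the same layout."""

    def __init__(self, base=0x100):
        self.next_addr = base

    def alloc(self, size):
        addr = self.next_addr
        self.next_addr += size
        return addr


def run_workload(n_objects=100, obj_size=32):
    """Allocate objects one by one; return the step that lands on bad RAM, or None."""
    heap = BumpAllocator()
    for step in range(n_objects):
        addr = heap.alloc(obj_size)
        if addr <= BAD_ADDR < addr + obj_size:
            # A pointer stored here would read back corrupted -> likely SIGSEGV.
            return step
    return None


if __name__ == "__main__":
    print([run_workload() for _ in range(5)])
```

Running it prints `[37, 37, 37, 37, 37]`: five identical failure points, even though the fault is in the (simulated) hardware rather than the program.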

I know. It's happened to me. I've even had memory so slightly bad that I could compile nearly everything I needed, but one project consistently failed. I took out a bad memory chip (actually it was simply mismatched PC100/PC133) and everything worked fine.

Jeremy

--
Looking for a Python IRC bot?
Re:USE BAD HARDWARE! by bhtooefr · 2003-12-27 00:15 · Score: 2, Informative

Try running Memtest86 on something like that. It might be the RAM, as Windows handles RAM differently from Linux, and can hit bad parts sooner than Linux (and vice versa).
My experience by MadChicken · 2003-12-27 00:32 · Score: 2, Informative

OK, so I don't have a paper, but I remember my old Linux/P166 running great for a day or so when the CPU fan had died. I only noticed when I rebooted into Windows!

My notebook has a flaky RAM connection. 32 MB comes and goes depending on how the machine is squeezed. Win 9x products crash it hard, Linux and Win2k don't even notice.

So in my experience, Linux doesn't mind a hostile platform.

--
SYS 64738 NO CARRIER
IMHO by kmichels · 2003-12-27 01:21 · Score: 4, Informative

Nice to see some numbers coming in on Linux stability, although, as someone once said: there are lies, damn lies, and statistics. Anyone who has done any stats at university will know that you can prove anything from any result set. And as someone has already pointed out, the testing was done by a company with a not insubstantial vested interest in getting Linux into as many big companies as it can, so it carries about as much credibility as a Windoze security evaluation paper done by anyone other than Linus.

The very reason Linux has already made so many inroads into corporations in the first place is its reliability and stability, not some marketing campaign churning out the words on headed paper.

Another point is that I personally expect the systems I administer to run for a darn sight longer than 30, 60 or 90 days unless I need to restart them for a kernel upgrade. When the last outfit I worked for went tits-up, our Samba file server had a 790-day uptime and had run the Samba daemons reliably throughout, as well as doing internal DNS and DHCP. That's what your average Linux sysadmin expects from a Linux server box.

A Linux desktop being used for all manner of things though is completely another story: if I muck around with the Linux install on my laptop, as I do because that's what I do, then I expect to break it from time to time, and so "reliability" is not measured in the same way on a desktop/laptop system, IMHO.

The ideal environment for Linux is as a networked server, where it can get on with doing what it was set up to do, and will continue doing so until someone pulls the power plug on it. In that context, there are few OSes playing on the same field that can rival it for reliability and stability.
Re:USE BAD HARDWARE! by Anonymous Coward · 2003-12-27 03:25 · Score: 1, Informative

> Keep in mind that Dell and Gateway's PC's are good because the hardware they choose to include is good. Anyone can buy the same hardware.
Dell and Gateway tend to use proprietary parts, usually the motherboard, PSU, and RAM, so you can't really build the same computer.
Don't Trust IBM to be objective either - failures by Anonymous Coward · 2003-12-27 06:11 · Score: 1, Informative

The 2.4 kernel has a number of unstable algorithms. Most of the corresponding algorithms in genetic UNIX are stable by design. For example, read the bug postings on Red Hat Bugzilla for this critical flaw, which has gone unfixed for more than 6 months: [Bug 89226] (VM)Kernel prefers swapping instead of releasing cache memory