Bitten By the Red Hat Perl Bug
snydeq writes "Smart coders always optimize the slowest thing. But what if 'the slowest thing' is the code supplied by your vendor? That was exactly the situation Vipul Ved Prakash discovered when he tinkered with a company Linux box on which Perl code was running at least 100 times slower than expected. The code, he found, was running on CentOS Linux, using Perl packages built by Red Hat. So Prakash got rid of the Perl executable that came with CentOS, compiled a new one from stock, and the bug disappeared. 'What's more disturbing,' McAllister writes, 'is that this Red Hat Perl performance issue is a known bug,' first documented in 2006 on Red Hat's own Bugzilla database. Folks affected by the current bug have two options: sit tight, or compile the Perl interpreter from source — effectively waiving your support contract. If a Linux vendor can't provide comprehensive maintenance and support for the open source software projects you depend on, McAllister asks, who ever will?"
Installing your own perl under /usr/local, leaving the system one alone under /usr, that waives your support contract?
Seems unlikely, and if actually true, remarkably stupid.
(However, messing with the perl under /usr, that would be a mistake. It could easily break other things that depended on that specific version ...)
Yeah, well, good on Mr. Prakash, I guess. Good thing he had the option of rebuilding from source; I can think of a few other operating systems and applications where that simply isn't an option.
So, score one for open source I guess, headline be damned.
"Total destruction the only solution" - Bob Marley
Just because Red Hat made one high-profile mistake, doesn't mean their support service is without value. Jump to conclusions much?
[Sir Garlon] is the marvellest knight that is now living, for he destroyeth many good knights, for he goeth invisible.
If it is "supplied by CentOS" then it was compiled by CentOS, not Red Hat. Red Hat Enterprise Linux had a hotfix for this weeks ago. So if Vipul had been using a Red Hat product, he would not have had this problem.
Well, I'm anything but a hardcore Perl hacker -- just use it to pragmatically list some rubbish now and then -- and I've never even heard of compiling your own Perl.
In truth, it's NOT like GCJ in the least. GCJ is a relatively immature JVM built from an entirely different codebase than the Sun JVM. "Vendor" Perl and "real" Perl ought to be substantially the same thing.
Just like all the foundation-level vendor tools, I would expect Perl to be built correctly on any official distro release. I shouldn't need to build my own GCC, my own Python, my own X, or my own Perl.
I have seen the future, and it is inconvenient.
CentOS is *not* an OS that Red Hat provides support for. So, in terms of support, you get what you pay for. The bug is fixable by recompiling Perl? Great. Submit the fix to the maintainers. End of story.
But, supposing that you *did* pay for support and you ran into this problem... It's a known bug with low priority. So get them to fix it. You're paying for support. Hold your vendor to their promises.
And if they don't fix it, find another vendor. That's the beauty of open source. If you need support and your current supplier sucks, you can find another.
But it's completely disingenuous to complain that recompiling your Perl binary will void your support contract *when you have no such contract*.
Well, I am harder core than the average schmo where Perl is concerned, so for me it's a requirement... The vendor version is always inferior. Most forums will tell you the same thing.
But like I said, if you don't really need it, it's fine. I doubt the average user would ever run into this problem.
ad logicam Claiming a proposition is false because it was presented as the conclusion of a fallacious argument.
Let's keep our eye on the ball here: this is a known bug, in Red Hat's bug tracker, since 2006. Fixes have been commonplace since 2007, and only just now did Red Hat get around to fixing the problem. The question remains: what good is Red Hat over CentOS (the only difference being logos and a support contract) if they ignore a major performance bug for two years?
Please help metamoderate.
I don't know how many projects have asked Slashdot not to link to bugzilla. It makes the system unusable for the developers trying to get work done.
Here's the text currently in the bugzilla entry (edited to meet slashdot's filter requirements):
Bug 379791 - perl bless/overload performance problem
Summary: perl bless/overload performance problem
Status: VERIFIED
Product: Red Hat Enterprise Linux 5
Component: perl
Version: 5.2
Platform: All Linux
Priority: urgent Severity: high
Target Milestone: rc
Assigned To: Marcela Maslanova
QA Contact: desktop-bugs@redhat.com
URL:
Whiteboard: GSSApproved
Keywords: ZStream
Depends on:
Blocks:
Reported: 2007-11-13 07:14 EDT by Nigel Metheringham
Modified: 2008-08-29 10:30 EDT
Fixed In Version:
Release Notes:
Description From Nigel Metheringham 2007-11-13 07:14:04 EDT
RHEL5 perl shows the same performance issues as the Fedora 7 perl did - see
Bug #196836 and Bug #253728
This has been demonstrated in the recent perl update perl-5.8.8-10.el5_0.2
Same fix needs taking across to RHEL, ideally as an update release rather than
waiting for next major release cycle.
I do not have RHEL5.1 to test against right now, but the timing of the Fedora
fixes leads me to believe these would be much too late for the 5.1 release
cycle.
-- Comment #2 From Martin Kutter 2007-11-30 05:24:01 EDT --
The issue can be observed running the benchmark script from the recent
SOAP::WSDL package.
To do so, download SOAP-WSDL-2.00_24 (and its dependencies) from CPAN, run perl
Build.PL && perl Build, cd into benchmark and run perl -I../blib/lib 01_expat.t
This is the Output from RHEL4:
perl -I../lib 01_expat.t
Name "DB::packages" used only once: possible typo at 01_expat.t line 2.
Benchmark: timing 5000 iterations of Hash (SOAP:WSDL), XML::Simple (Hash), XSD
(SOAP::WSDL)...
Hash (SOAP:WSDL): 4 wallclock secs ( 3.48 usr + 0.01 sys = 3.49 CPU) @1432.66/s (n=5000)
XML::Simple (Hash): 7 wallclock secs ( 7.19 usr + 0.03 sys = 7.22 CPU) @692.52/s (n=5000)
XSD (SOAP::WSDL): 6 wallclock secs ( 6.06 usr + 0.01 sys = 6.07 CPU) @823.72/s (n=5000)
And this (with reduced n) is from RHEL5 (different machine, perl-5.8.8-10):
Benchmark: timing 500 iterations of Hash (SOAP:WSDL), XML::Simple (Hash), XSD
(SOAP::WSDL)...
Hash (SOAP:WSDL): 1 wallclock secs ( 0.59 usr + 0.00 sys = 0.59 CPU) @847.46/s (n=500)
XML::Simple (Hash): 1 wallclock secs ( 1.06 usr + 0.00 sys = 1.06 CPU) @471.70/s (n=500)
XSD (SOAP::WSDL): 11 wallclock secs (11.34 usr + 0.01 sys = 11.35 CPU) @44.05/s (n=500)
Increasing the number of runs shows the O(n^2) nature of the performance problem
- increasing the number of runs by a factor of 10 increases the runtime for the
XSD bench by a factor of nearly 100:
Name "DB::packages" used only once: possible typo at 01_expat.t line 2.
Benchmark: timing 5000 iterations of Hash (SOAP:WSDL), XML::Simple (Hash), XSD
(SOAP::WSDL)...
Hash (SOAP:WSDL): 6 wallclock secs ( 6.19 usr + 0.03 sys = 6.22 CPU) @ 803.86/s (n=5000)
XML::Simple (Hash): 11 wallclock secs (11.20 usr + 0.02 sys = 11.22 CPU) @ 445.63/s (n=5000)
XSD (SOAP::WSDL): 851 wallclock secs (847.36 usr + 2.28 sys = 849.64 CPU) @ 5.88/s (n=5000)
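As a rough check on the "factor of nearly 100" remark, here is a small sketch that estimates the scaling exponent from the two RHEL5 runs quoted above. The CPU times are copied from that output; the helper name `scaling_exponent` and the power-law model t ~ n**k are assumptions made for illustration, and two data points only give a crude estimate.

```python
import math

# CPU seconds copied from the two RHEL5 benchmark runs quoted above,
# keyed by iteration count. "Hash" is included as a well-behaved control.
xsd_cpu = {500: 11.35, 5000: 849.64}
hash_cpu = {500: 0.59, 5000: 6.22}


def scaling_exponent(n1, t1, n2, t2):
    """Estimate k in t ~ n**k from two (iterations, time) measurements."""
    return math.log(t2 / t1) / math.log(n2 / n1)


def summarize(label, timings):
    """Return (slowdown, exponent) for a two-point timing table and print them."""
    (n1, t1), (n2, t2) = sorted(timings.items())
    slowdown = t2 / t1
    k = scaling_exponent(n1, t1, n2, t2)
    print(f"{label}: {n2 // n1}x iterations -> {slowdown:.1f}x CPU time, k ~ {k:.2f}")
    return slowdown, k


xsd_slowdown, xsd_k = summarize("XSD (SOAP::WSDL)", xsd_cpu)
hash_slowdown, hash_k = summarize("Hash (SOAP::WSDL)", hash_cpu)
```

On these figures the XSD case slows down by roughly 75x for 10x the iterations (k close to 1.9), while the Hash case stays close to linear (k close to 1), which is consistent with the quadratic behaviour the comment describes.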
-- Comment #3 From RHEL Product and Program Management 2007-12-03 15:47:35 EDT --
This request was evaluated by Red Hat Product Management for
inclusion, but this component is not scheduled to be updated in
the cur
"The vendor version is always inferior."
The vendor version in this case has a bug fixed. The bug caused incorrect behaviour. In this case the vendor version is only inferior if you prefer fast but incorrect results. There isn't anything wrong with preferring fast incorrect results over slow correct results, but most people probably want slow and correct to be the default if given the choice.
Fast and correct always wins, and the real Perl hackers are working on that. In the meantime we take what we can get.
Finally! A year of moderation! Ready for 2019?
No released version of Perl ever had this bug. Red Hat pulled a patch from a development version of Perl and maintained it over released versions of Perl which did not need it. That's the source of this bug. The Perl developers fixed this bug before releasing the next stable version of Perl.
how to invest, a novice's guide
"There isn't anything wrong with preferring fast incorrect results over slow correct results, but most people probably want slow and correct to be the default if given the choice."
Well, I'd be a bit careful about making such general statements. There is evidence that people aren't generally that intelligent.
I remember back in the 1970s, when I was at a large university that shall remain unnamed, and a bunch of CS people did a detailed study of the Fortran that accounted for fully half the runs on the campus's central mainframe (which shall also remain unnamed). They found that fully half the runs produced at least some incorrect output due to undetected integer overflows. The hardware gave interrupts for floating-point overflows, but for integers, it just set a flag bit, and you needed to test that flag to catch overflows. The compiler had an option to generate such tests, but it was off by default. The vendor said they did this because they had found that most customers preferred faster code.
The local gang didn't believe this, so they did a bit of a survey. They asked lots of users of the Fortran code whether they would prefer their programs to catch all arithmetic errors if this meant that the code ran slower, or if they would prefer faster code that sometimes didn't catch errors. Roughly 90% of the people they asked this said that they'd want the faster code. Later on, I ran across references to similar tests at other schools, with similar results.
Personally, I was shocked by this. This mainframe was used to do the computing for most of the scientific work on campus, and scientific computing was almost entirely done in Fortran. So half their data runs had undetected incorrect output. They now knew this, and they still preferred the faster speed to correct output.
Somehow, I suspect that this situation hasn't changed. I've dug into various programming languages since then, to learn how they handle this and other potential sources of erroneous results. Most current languages still ignore things like overflow flags by default. Some have no way of enabling the tests of such flags.
Yes, I know lots of ways of explicitly testing for such errors myself. I've done it a lot, because I know I can't rely on others to enable the builtin tests (when they exist) when they recompile the code. But when looking at other people's code, I almost never see anything that will detect overflows. When you're N levels deep in function calls, you usually have no way of verifying the possible range of the current function's args, so there's no way of proving that an overflow can't happen.
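For anyone wondering what an explicit test looks like, here is a minimal sketch. Python integers never overflow, so the 32-bit width and both function names (`wrapping_add32`, `checked_add32`) are assumptions chosen purely to illustrate the difference between hardware-style silent wraparound and a checked add; this is not the Fortran compiler option described above.

```python
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def wrapping_add32(a, b):
    """Add like unchecked 32-bit hardware: keep the low 32 bits, read them as signed."""
    total = (a + b) & 0xFFFFFFFF
    return total - 2**32 if total > INT32_MAX else total


def checked_add32(a, b):
    """Add two 32-bit values, raising instead of silently wrapping around."""
    total = a + b
    if not INT32_MIN <= total <= INT32_MAX:
        raise OverflowError(f"{a} + {b} does not fit in a signed 32-bit integer")
    return total


# The unchecked version quietly returns a wrong (negative) answer.
wrapped = wrapping_add32(INT32_MAX, 1)

# The checked version refuses to return a wrong answer.
try:
    checked_add32(INT32_MAX, 1)
    overflow_caught = False
except OverflowError:
    overflow_caught = True
```

The checked version costs one extra comparison per addition, which is exactly the speed-versus-correctness trade those Fortran users were asked about.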
Sometimes I'm amazed that our systems run as well as they do, given this sort of nonchalant attitude towards known sources of hardware errors. And I do a lot of paranoid, defensive programming, even though I know that my employers probably don't want it because it slows down the software.
Those who do study history are doomed to stand helplessly by while everyone else repeats it.
Reminds me of something I heard a wise hacker say once, when someone tried to convince him that their new version of some code was better than his, because it ran in 10% of the time his did but produced (slightly) wrong results in a few cases...
"If it doesn't have to produce correct results, I can make my version use no memory and run in zero time."
Why doesn't the gene pool have a life guard?