Bitten By the Red Hat Perl Bug
snydeq writes "Smart coders always optimize the slowest thing. But what if 'the slowest thing' is the code supplied by your vendor? That was exactly the situation Vipul Ved Prakash discovered when he tinkered with a company Linux box on which Perl code was running at least 100 times slower than expected. The code, he found, was running on CentOS Linux, using Perl packages built by Red Hat. So Prakash got rid of the Perl executable that came with CentOS, compiled a new one from stock, and the bug disappeared. 'What's more disturbing,' McAllister writes, 'is that this Red Hat Perl performance issue is a known bug,' first documented in 2006 on Red Hat's own Bugzilla database. Folks affected by the current bug have two options: sit tight, or compile the Perl interpreter from source — effectively waiving your support contract. If a Linux vendor can't provide comprehensive maintenance and support for the open source software projects you depend on, McAllister asks, who ever will?"
Installing your own perl under /usr/local, leaving the system one alone under /usr, that waives your support contract?
Seems unlikely, and if actually true, remarkably stupid.
(However, messing with the perl under /usr, that would be a mistake. It could easily break other things that depended on that specific version ...)
Yeah, well, good on mr Prakash I guess. Good thing he had the option of rebuilding from source, I can think of a few other operating systems and applications where that simply isn't an option.
So, score one for open source I guess, headline be damned.
"Total destruction the only solution" - Bob Marley
Just because Red Hat made one high-profile mistake, doesn't mean their support service is without value. Jump to conclusions much?
[Sir Garlon] is the marvellest knight that is now living, for he destroyeth many good knights, for he goeth invisible.
If it is "supplied by CentOS" then it was compiled by "CentOS" not Red Hat. Red Hat Enterprise Linux enterprise had a hotfix for this weeks ago. So if Vipul had been using a Red Hat product, he would not have had this problem.
...then what good is the support contract anyway? Either stop paying for it or make enough of a stink that they fix it. Of course, RedHat being an enormous company won't pay attention to anyone making a stink unless they are an enormous customer, which is a problem in both open-source and proprietary worlds. At least there's a workaround in the open-source world, one better than object patching the binary...
Well, I'm anything but a hardcore Perl hacker -- just use it to pragmatically list some rubbish now and then -- and I've never even heard of compiling your own Perl.
In truth, it's NOT like GCJ in the least. GCJ is a relatively immature JVM built from an entirely different codebase than the Sun JVM. "Vendor" Perl and "real" Perl ought to be substantially the same thing.
Just like all the foundation-level vendor tools, I would expect Perl to be built correctly on any official distro release. I shouldn't need to build my own GCC, my own Python, my own X, or my own Perl.
I have seen the future, and it is inconvenient.
Cent OS is *not* an OS that Red Hat provides support for. So, in terms of support, you get what you pay for. The bug is fixable by recompiling Perl? Great. Submit the fix to the maintainers. End of story.
But, supposing that you *did* pay for support and you ran into this problem... It's a known bug with low priority. So get them to fix it. You're paying for support. Hold your vendor to their promises.
And if they don't fix it, find another vendor. That's the beauty of open source. If you need support and your current supplier sucks, you can find another.
But it's completely disingenuous to complain that recompiling your Perl binary will void your support contract *when you have no such contract*.
here's an example of this on a mac OS 10.5 /tmp/foo
[CODE]
perl -we 'for (0..1000000) { print rand(1),"\n"}' >
time sort /tmp/foo
real 0m36.169s
user 0m35.731s
sys 0m0.264s
echo $LANG
en_US.UTF-8
setenv LANG C
time sort /tmp/foo > /tmp/foo2
0.966u 0.201s 0:01.27 91.3%
SO for a small file it's 40 times faster when not defaulting to UTF-8
Perl on leopard however appears to assume LANG=C bey default and is quite fast. As is grep.
Some drink at the fountain of knowledge. Others just gargle.
Actually, they do. yum is the default package manager in RHEL ever since version 4, I believe. In any case, you are correct that the repository that yum talks to contains the buggy Perl. It looks like this should be fixed for RHEL 5.3, but I wouldn't bet on it.
Well, I am harder core than the average schmo where Perl is concerned, so for me it's a requirement...The vendor version is always inferior. Most forums will tell you the same thing.
But like I said, if you don't really need it, it's fine. I doubt the average user would ever run into this problem.
ad logicam Claiming a proposition is false because it was presented as the conclusion of a fallacious argument.
Except that if they were paying Red Hat for support, they would have been given the hotfix when they called in to diagnose the problem.
It seems a little odd to complain about Red Hat support, when in fact they weren't paying Red Hat for support.
Let's keep our eye on the ball, here: this is a known bug, in Redhat's bug tracker, since 2006. Fixes have been commonplace since 2007, and only just now did Redhat get around to fixing the problem. The question remains: what good is Redhat over CentOS (the only difference being logos and a support contract) if they ignore a major performance bug for two years?
Please help metamoderate.
I don't know how many projects have asked Slashdot not to link to bugzilla. It makes the system unusable for the developers trying to get work done.
Here's the text currently in the bugzilla entry (edited to meet slashdot's filter requirements):
Bug 379791 - perl bless/overload performance problem
Summary: perl bless/overload performance problem
Status: VERIFIED
Product: Red Hat Enterprise Linux 5
Component: perl (Show Red Hat Enterprise Linux 5/perl bugs)
Version: 5.2
Platform: All Linux
Priority: urgent Severity: high
Target Milestone: rc
Assigned To: Marcela Maslanova
QA Contact: desktop-bugs@redhat.com
URL:
Whiteboard: GSSApproved
Keywords: ZStream
Depends on:
Blocks:
Reported: 2007-11-13 07:14 EDT by Nigel Metheringham
Modified: 2008-08-29 10:30 EDT (History)
Fixed In Version:
Release Notes:
Description From Nigel Metheringham 2007-11-13 07:14:04 EDT
RHEL5 perl shows the same performance issues as the Fedora 7 perl did - see
Bug #196836 and Bug #253728
This has been demonstrated in the recent perl update perl-5.8.8-10.el5_0.2
Same fix needs taking across to RHEL, ideally as a update release rather than
waiting for next major release cycle.
I do not have RHEL5.1 to test against right now, but the timing of the Fedora
fixes leads me to believe these would be much too late for the 5.1 release
cycle.
-- Comment #2 From Martin Kutter 2007-11-30 05:24:01 EDT --
The issue can be observed running the benchmark script from the recent
SOAP::WSDL package.
To do so, download SOAP-WSDL-2.00_24 (and its dependencies) from CPAN, run perl
Build.PL && perl Build, cd into benchmark and run perl -I../blib/lib 01_expat.t
This is the Output from RHEL4:
perl -I../lib 01_expat.t
Name "DB::packages" used only once: possible typo at 01_expat.t line 2.
Benchmark: timing 5000 iterations of Hash (SOAP:WSDL), XML::Simple (Hash), XSD
(SOAP::WSDL)...
Hash (SOAP:WSDL): 4 wallclock secs ( 3.48 usr + 0.01 sys = 3.49 CPU) @1432.66/s (n=5000)
XML::Simple (Hash): 7 wallclock secs ( 7.19 usr + 0.03 sys = 7.22 CPU) @692.52/s (n=5000)
XSD (SOAP::WSDL): 6 wallclock secs ( 6.06 usr + 0.01 sys = 6.07 CPU) @823.72/s (n=5000)
And this (with reduced n) is from RHEL5 (different machine, perl-5.8.8-10):
Benchmark: timing 500 iterations of Hash (SOAP:WSDL), XML::Simple (Hash), XSD
(SOAP::WSDL)...
Hash (SOAP:WSDL): 1 wallclock secs ( 0.59 usr + 0.00 sys = 0.59 CPU) @847.46/s (n=500)
XML::Simple (Hash): 1 wallclock secs ( 1.06 usr + 0.00 sys = 1.06 CPU) @471.70/s (n=500)
XSD (SOAP::WSDL): 11 wallclock secs (11.34 usr + 0.01 sys = 11.35 CPU) @44.05/s (n=500)
Increasing the number of runs shows the O(n^2) nature of the performance problem
- increasing the number of runs by a factor of 10 increases the runtime for the
XSD bench by a factor of nearly 100:
Name "DB::packages" used only once: possible typo at 01_expat.t line 2.
Benchmark: timing 5000 iterations of Hash (SOAP:WSDL), XML::Simple (Hash), XSD
(SOAP::WSDL)...
Hash (SOAP:WSDL): 6 wallclock secs ( 6.19 usr + 0.03 sys = 6.22 CPU) @ 803.86/s (n=5000)
XML::Simple (Hash): 11 wallclock secs (11.20 usr + 0.02 sys = 11.22 CPU) @ 445.63/s (n=5000)
XSD (SOAP::WSDL): 851 wallclock secs (847.36 usr + 2.28 sys = 849.64 CPU) @ 5.88/s (n=5000)
-- Comment #3 From RHEL Product and Program Management 2007-12-03 15:47:35 EDT --
This request was evaluated by Red Hat Product Management for
inclusion, but this component is not scheduled to be updated in
the cur
... so it's easy to see how someone might be lazy and just use whatever the vendor supplies.
Wow, I wouldn't want you managing my servers.
Home compiled software can easily be a source of security holes, as tracking what you have compiled versus what is patched using vendor updates adds significant management overhead and another point of failure. As an example, a popular open source project had its website compromised because the maintainer used a self compiled version of CVS, and forgot about it. Had the maintainer used a vendor CVS, the security hole would have been patched by the vendor update.
Lazy is good. If you can do the job and be lazy, that is a win-win.
I wonder what went wrong with the RH release?
The RH bugzilla ticket indicates this as the initial issue.
http://rt.perl.org/rt3/Public/Bug/Display.html?id=34925
So it appears RH have not applied this fix, perhaps because the patch is included with a more cutting edge perl than is considered safe for inclusion with RHEL. Certainly, it looks like it was fixed in perl 5.9, but that may be an experimental branch more akin to the old 2.[135] linux kernels (and no vendor would have touched them in an enterprise targeted distribution.)
The vendor version is always inferior.
The vendor version in this case has a bug fixed. The bug caused incorrect behaviour. In this case the vendor version is only inferior if you prefer fast but incorrect results. There isn't anything wrong with preferring fast incorrect results over slow correct results, but most people probably want slow and correct to be the default if given the choice.
Fast and correct always wins, and the real Perl hackers are working on that. In the meantime we take what we can get.
Finally! A year of moderation! Ready for 2019?
In general, that could be said of (almost) every single software package of any substance. If you want it to run well, you have got to roll your own. Well, almost. In theory, there's nothing to stop a vendor with a decent server farm gathering your system info then compiling the RPMs for you for that specific system, and/or having a set of stock ISOs rolled for, say, the five most-popular systems.
A central build has the advantage over something like Gentoo in that vendors can usually afford better horsepower and can auto-tune the options per application better than the average Joe could ever hand-tune them. It's also more supportable, as the vendor then has the exact information for each package, as they would have had they rolled the RPMs from their own default configurations, which a totally user-defined setup would deprive them of.
There's also the dependency hell and the namespace clash that "standard" distros suffer from all the time. I've never come across a distribution YET that supplies stock binaries that is capable of supplying ones that actually work together. First rule, guys, is that if package A is rebuilt, ALL packages in which A is directly or indirectly a dependency should automatically be rebuilt. It should not be possible, using JUST the stable packages (DEB or RPM), to get into a situation where packages barf or cannot resolve their own dependency requirements.
(I won't even get into the issue of broken packages in the stable updates, where you cannot complete an install or a deinstall because the flippin' scripts don't work and don't cleanly handle the case where something breaks, effectively barfing over the disk and the package database.)
The more I use Linux, the more I am convinced that the distros out there are operating on a flawed assumption. They work great, but only when that assumption holds true, but will catastrophically fail outside of those bounds. This assumption is that people will use the distro with a relatively narrow aim on relatively generic systems.
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
I disagree. You pay Red Hat to provide a baseline server with all of the provided applications and languages. You pay Red Hat to provide timely updates for their distribution.
You pay your 15+ staff to work on a custom application that may depend on something Red Hat provides.
Now you may need to custom compile something as a short-term solution, but if Red Hat can't do something as simple as keeping their Perl interpreter working correctly I would seriously reconsider paying them any more money.
Think about it. If your paid staff has to make a custom compilation of a vendor supplied application (and consequently keep it updated) then what are you paying Red Hat for? The days of paying Red Hat out of the goodness of our hearts are over. It's time for Red Hat to act like one of the big boys and earn their money.
These comments are my own and do not necessarily reflect the views or opinions of my employer or colleagues...
No released version of Perl ever had this bug. Red Hat pulled a patch from a development version of Perl and maintained it over released versions of Perl which did not need it. That's the source of this bug. The Perl developers fixed this bug before releasing the next stable version of Perl.
how to invest, a novice's guide
> and I've never even heard of compiling your own Perl.
I've been doing this for a very long time, in addition to compiling my own php, apache etc.
If it's exposed, I compile it. If a 0day hits, I don't have the luxury of leaving the fate of my production security in some vendor's hands.
At the negligence trial, where I'm being prosecuted because my box got pwned, the plaintiff's attorney is going to ask why I didn't fix it if I could. I can, so I do.
If you are moving from packages to compiled stuff, it takes a lot longer than it does if you already operate this way.
The last SSL worm beat me over the head with the importance of this.
While compiling doesn't make you more secure, it sure as hell allows you to become secure faster *when* something happens and the vendors drag their feet. It beats unplugging your servers until a fix is available, which was my other option when this happened.
-Viz
Don't kid yourself. It's the size of the regexp AND how you use it that counts.
There isn't anything wrong with preferring fast incorrect results over slow correct results, but most people probably want slow and correct to be the default if given the choice.
Well, I'd be a bit careful about making such general statements. There is evidence that people aren't generally that intelligent.
I remember back in the 1970s, when I was at a large university that shall remain unnamed, and a bunch of CS people did a detailed study of the Fortran that accounted for fully half the runs on the campus's central mainframe (which shall also remain unnamed). They found that fully half the runs produced at least some incorrect output due to undetected integer overflows. The hardware gave interrupts for floating-point overflows, but for integers, it just set a flag bit, and you needed to test that flag to catch overflows. The compiler had an option to generate such tests, but it was off by default. The vendor said they did this because they had found that most customers preferred faster code.
The local gang didn't believe this, so they did a bit of a survey. They asked lots of users of the Fortran code whether they would prefer their programs to catch all arithmetic errors if this meant that the code ran slower, or if they would prefer faster code that sometimes didn't catch errors. Roughly 90% of the people they asked this said that they'd want the faster code. Later on, I ran across references to similar tests at other schools, with similar results.
Personally, I was shocked by this. This mainframe was used to do the computing for most of the scientific work on campus, and scientific computing was almost entirely done in Fortran. So half their data runs had undetected incorrect output. They now knew this, and they still preferred the faster speed to correct output.
Somehow, I suspect that this situation hasn't changed. I've dug into various programming languages since then, to learn how they handle this and other potential sources of erroneous results. Most current languages still ignore things like overflow flags by default. Some have no way of enabling the tests of such flags.
Yes, I know lots of ways of explicitly testing for such errors myself. I've done it a lot, because I know I can't rely on others to enable the builtin tests (when they exist) when they recompile the code. But when looking at other people's code, I almost never see anything that will detect overflows. When you're N levels deep in function calls, you usually have no way of verifying the possible range of the current function's args, so there's no way of proving that an overflow can't happen.
Sometimes I'm amazed that our systems run as well as they do, given this sort of nonchalant attitude towards known sources of hardware errors. And I do a lot of paranoid, defensive programming, even though I know that my employers probably don't want it because it slows down the software.
Those who do study history are doomed to stand helplessly by while everyone else repeats it.
Reminds me something I heard a wise hacker say once, when someone tried to convince him that their new version of some code was better that his, because it ran in 10% of the time his did but produced (slightly) wrong results in a few cases...
"If it doesn't have to produce correct results, I can make my version use no memory and run in zero time."
Why doesn't the gene pool have a life guard?