Leap Second Bug Causes Crashes
An anonymous reader writes in with a Wired story about the problems caused by the leap second last night. "Reddit, Mozilla, and possibly many other web outfits experienced brief technical problems on Saturday evening, when software underpinning their online operations choked on the “leap second” that was added to the world’s atomic clocks. On Saturday, at midnight Greenwich Mean Time, as June turned into July, the Earth’s official time keepers held their clocks back by a single second in order to keep them in sync with the planet’s daily rotation, and according to reports from across the web, some of the net’s fundamental software platforms — including the Linux operating system and the Java application platform — were unable to cope with the extra second."
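To make the "extra second" concrete, here is a minimal Python sketch of why 23:59:60 is awkward for software. It is only an illustration and not taken from the Wired story. It assumes CPython's standard library behavior: `time.strptime` accepts a seconds value of 60, `datetime` rejects it, and POSIX-style epoch arithmetic has no slot for it.

```python
import calendar
import time
from datetime import datetime

# The leap second inserted at the end of June 2012, written as UTC wall-clock time.
LEAP = "2012-06-30 23:59:60"

# time.strptime tolerates a seconds field of 60.
parsed = time.strptime(LEAP, "%Y-%m-%d %H:%M:%S")

def datetime_accepts_leap_second():
    """Return True if datetime can represent 23:59:60, False otherwise."""
    try:
        datetime(2012, 6, 30, 23, 59, 60)
    except ValueError:
        return False
    return True

# POSIX time counts 86400 seconds per day, so the leap second and the
# following midnight collapse onto the same epoch value.
leap_epoch = calendar.timegm(parsed)
next_epoch = calendar.timegm((2012, 7, 1, 0, 0, 0, 0, 0, 0))

print(parsed.tm_sec, datetime_accepts_leap_second(), leap_epoch == next_epoch)
```

Since the epoch value has nowhere to put the extra second, the system clock has to repeat or stretch a second instead, and that is the moment software tends to trip.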
So far all I've heard about is affected Linux systems. Did Windows and OS X do just fine?
We will keep having these kinds of issues for as long as some people fail to understand that time of day is an arbitrary number whose main utility lies in being composed of predictable periods divided into homogeneous units. It should have no relation whatsoever to whatever time the sun happens to rise or set at any particular location, and above all it should not be changed to accommodate fluctuations in the rotation of a rock circling an arbitrary star. Abominations like leap seconds or daylight saving time make the whole system less useful merely by existing.
But personally I wouldn't be surprised if people off the equator were to get summer minutes composed of 120 seconds during daytime (or even better, a scale!) to ensure the sun rises and sets at the same time year-round. Or, hey, why not simply make the seconds longer? Or a combination of both, plus we can define pi to be 3 to make things simpler.
Why not bundle them and apply them every 10 or 20 years?
And apparently I'm not alone:
http://en.wikipedia.org/wiki/Leap_second#Proposal_to_abolish_leap_seconds
Hogwash. Astronomers can find coping mechanisms; it's either that or these ridiculous levels of stress for system admins.
I run Arch Linux with kernel 3.4.4 and it went haywire. My machine was very heavily loaded at the time, and when the leap second happened, mysqld, firefox, and ksoftirqd processes started consuming 100% CPU. The load average was well over 10 and the machine was grinding along. It didn't actually fail, but it was loaded down.
Even restarting the processes didn't fix it. The high load would go away once I stopped the processes, but as soon as I started them again the load came right back. I had Firefox open on a blank page, not doing anything, and it was pegged at 100% CPU, with a couple of ksoftirqd tasks pegged at 100% CPU each too.
I had to reboot the machine to get it back to normal.
I have Ubuntu and Debian servers that, for whatever reason, did not add the leap second, so they were fine. Their time was a second off today, though (at least until ntp slowly corrected it or I manually intervened).
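As a rough idea of what "slowly corrected" means: the back-of-the-envelope sketch below assumes the clock is slewed (gradually adjusted) rather than stepped, and assumes the commonly cited maximum slew rate of 500 ppm. Both are assumptions about the setup, not something reported for these servers, and `seconds_to_slew` is a hypothetical helper name.

```python
# Assumed maximum slew rate: 500 parts per million (0.5 ms gained or lost per second).
MAX_SLEW_PPM = 500

def seconds_to_slew(offset_seconds, slew_ppm=MAX_SLEW_PPM):
    """Wall-clock seconds needed to absorb a clock offset at a given slew rate."""
    return abs(offset_seconds) * 1_000_000 / slew_ppm

# A one-second offset, as left behind by a missed leap second.
duration = seconds_to_slew(1.0)
print(f"{duration:.0f} s, about {duration / 60:.0f} minutes")
```

Under those assumptions a missed leap second takes a bit over half an hour to disappear, which fits with clocks being visibly off for a while the next day.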
I'm managing a cluster of 2,400 nodes running FreeBSD, and AFAICS, none was tripped up by leap second NTP adjustments. On the other hand, 4 out of 180 Linux nodes crashed simultaneously at that very moment. All this is exceedingly weird, but may indeed point to a subtle bug in the Linux kernel (only?). I've never witnessed this behavior in the past.
cpghost at Cordula's Web.
Google official blog: "Time, technology and leaping seconds" (Sept. 2011)
http://googleblog.blogspot.in/2011/09/time-technology-and-leaping-seconds.html
I wonder if the leap second has anything to do with the labs Chubby paper / site currently being offline..
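For those who don't want to click through: as I recall it, the Google post describes a "leap smear", where their time servers add a tiny, growing "lie" to the time they hand out over a window before the leap second, so no machine ever sees a 61-second minute. Below is a minimal Python sketch of the cosine curve given in the post. The 20-hour window length and the function name `smear_lie` are my own assumptions for illustration, not details from the post.

```python
import math

# Hypothetical smear window; the window length here is an assumption.
WINDOW_SECONDS = 20 * 3600

def smear_lie(t, w=WINDOW_SECONDS):
    """Fraction of the extra second already applied, t seconds into a window of w seconds.

    Uses the cosine ramp lie(t) = (1 - cos(pi * t / w)) / 2, clamped to [0, 1].
    """
    if t <= 0:
        return 0.0
    if t >= w:
        return 1.0
    return (1.0 - math.cos(math.pi * t / w)) / 2.0

# Sample the curve at a few points across the window.
for fraction in (0.0, 0.25, 0.5, 0.75, 1.0):
    t = fraction * WINDOW_SECONDS
    print(f"{fraction:4.2f} of window -> {smear_lie(t) * 1000:7.2f} ms of lie")
```

The curve starts and ends flat, so the rate at which clocks are skewed changes gently, and by the end of the window the whole second has been absorbed without any step.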
Our problem was with a third-party monitoring solution: its daemon process brought every single one of our servers to a near halt by consuming all available CPU cycles at the stroke of midnight GMT.
The OS itself was fine.
This monitoring software is common enough that it likely was behind a lot of the issues seen around the 'net.
The hard system lock bug due to a leap second was patched in 2.6.29, so either you've got some weird related bug, or something is very wrong.
Well, the weird related bug would arguably count as something being wrong. Apparently there is a bug in the handling of the insertion of positive leap seconds that could cause weird behavior with futexes, and that bug appears not to have been fixed until at least July 1, 2012 (I'm guessing John Stultz has worked up a patch).
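For anyone wondering how a one-second step could peg a CPU, here is a toy model in Python. It reflects my understanding of the problem rather than the actual kernel code: if the wall clock steps back a second but the timer code keeps using a stale offset, it believes it is one second later than it really is, so any wall-clock timeout shorter than a second looks already expired and the caller spins. All names (`ToyTimerBase`, `spurious_wakeups`, `notify`) are hypothetical.

```python
class ToyTimerBase:
    """Toy model: wall-clock time = monotonic time + offset."""

    def __init__(self, offset):
        self.true_offset = offset    # what the wall clock really reads
        self.cached_offset = offset  # what the timer code believes

    def insert_leap_second(self, notify):
        # The wall clock repeats one second, so the offset drops by 1.
        self.true_offset -= 1.0
        if notify:
            # The step is propagated to the timer code.
            self.cached_offset = self.true_offset

    def wall_now(self, mono):
        return mono + self.true_offset

    def timer_expired(self, mono, wall_deadline):
        # The timer code judges expiry using its cached idea of the offset.
        return mono + self.cached_offset >= wall_deadline


def spurious_wakeups(notify, steps=1000):
    """Count sleeps of 500 ms that return immediately after a leap second."""
    base = ToyTimerBase(offset=1000.0)
    base.insert_leap_second(notify)
    mono = 0.0
    wakeups = 0
    for _ in range(steps):
        deadline = base.wall_now(mono) + 0.5  # "wake me in 500 ms"
        if base.timer_expired(mono, deadline):
            wakeups += 1  # returned at once; a real caller would loop again
        mono += 0.001
    return wakeups


print(spurious_wakeups(notify=False), spurious_wakeups(notify=True))
```

In the toy model every short sleep returns instantly while the offset is stale, and none do once the step is propagated. That would also explain why restarting processes didn't help while resetting the clock or rebooting did.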