The Leap Second Is Here! Are Your Systems Ready?
Tmack writes "The last time we had a leap second, sysadmins were taken a bit by surprise when a random smattering of systems locked up (including Slashdot itself) due to a kernel bug causing a race condition specific to the way leap seconds are handled/notified by ntp. The vulnerable kernel versions (prior to 2.6.29) are still common amongst older versions of popular distributions (Debian Lenny, RHEL/CentOS 5) and embedded/black-box style appliances (Switches, load balancers, spam filters/email gateways, NAS devices, etc). Several vendors have released patches and bulletins about the possibility of a repeat of last time. Are you/your team/company ready? Are you upgraded, or are you going to bypass this by simply turning off NTP for the weekend?"
Update: 07/01 03:14 GMT by S : ZeroPaid reports that this issue took down the Pirate Bay for a few hours.
Leap years = no problem.
Leap seconds = kernel panic.
I fear for teh internets if we try a leap millisecond.
I can see the fnords!
When NTP tries to say that it is 12:34:61 and the computer only expects 1-60.
For those wondering whether they get one more second of sleep tonight or one less, the rule is 'spring forwards, fall back, summer stand there looking confused'.
It actually goes 23h 59m 59s, 23h 59m 60s, 00h 00m 00s. See http://www.nist.gov/pml/div688/leapseconds.cfm
Poorly written software only expects seconds to go from 0-59. Positive leap seconds are counted 23:59:59 -> 23:59:60 -> 0:0:0. Leap seconds have been around since 1972, the same year Unix was rewritten in C. There's been plenty of time to get things right.
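To make the 0-59 assumption concrete, here is a minimal Python sketch. It relies only on standard-library behaviour: `datetime` refuses a seconds value of 60, while `time.strftime` accepts one. The helper name `accepts_second_60` is made up for this illustration.

```python
import datetime
import time

def accepts_second_60():
    """Return True if datetime can represent 23:59:60."""
    try:
        datetime.datetime(2012, 6, 30, 23, 59, 60)
        return True
    except ValueError:
        # datetime only allows seconds in the range 0..59
        return False

# time.strftime takes a struct_time-style tuple and tolerates a leap second.
# Fields: year, month, day, hour, minute, second, weekday, yearday, isdst
leap = (2012, 6, 30, 23, 59, 60, 5, 182, 0)
formatted = time.strftime("%H:%M:%S", leap)
```

So even within one language's standard library, one API models the leap second and another does not.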
"National Security is the chief cause of national insecurity." - Celine's First Law
what about the metric time system?
Enjoy your free operating system that was stopped by an extra second.
Yes, because we've NEVER seen Windows have problems dealing with things like Daylight Savings...
#DeleteChrome
Perchance something like this example worked with existing deployed and tested code? http://www.ucolick.org/~sla/leapsecs/right+gps.html
Hello, Some of us code our systems somewhat like a finite state machine, and we figure our machine will never operate outside it.
,Jim
If you're testing whether something that increments ever hits a number (like 10) and goes back to 0, then instead of checking whether it == 10, check whether it is > 9.
There are a lot of defensive coding mechanisms you can use. The downside of this is that when you debug, something can sneak by and put you outside of a state you want, so it makes it ever so slightly harder to debug. But if you're making software that will be used by the public that is hard to give updates, defensive programming can save the day here and there.
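A small Python sketch of the comparison trick described above. The function names and the limit of 10 are only for illustration; the point is what happens when the counter is already outside its expected range.

```python
def advance_fragile(counter, limit=10):
    """Wrap only on an exact match; a value already past the limit never wraps."""
    counter += 1
    if counter == limit:
        counter = 0
    return counter

def advance_defensive(counter, limit=10):
    """Wrap on anything at or beyond the limit, so a stray value recovers."""
    counter += 1
    if counter >= limit:
        counter = 0
    return counter

# Start from a state that "should never happen": counter is already 10.
fragile_result = advance_fragile(10)      # keeps counting upward
defensive_result = advance_defensive(10)  # snaps back into range
```

The trade-off is the one mentioned above: the defensive version hides the fact that the counter was ever out of range, so logging the unexpected value before wrapping may be worthwhile.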
God spoke to me
Because that would be the opposite of a leap second maybe?
What a fool believes, he sees, no wise man has the power to reason away.
Our servers run on octal, you insensitive clod.
Yeah it had the wrong time but did not freeze up. What's your excuse?
You're really trying that hard to troll huh?
A free operating system has a bug in it so you want to exaggerate the existence of the bug to show that free operating systems are inferior in such a condescending and acerbic way.
I guess that can work. It's not like there is any paid OS out there that has decades long histories of serious instability, security flaws, and badly implemented ideas...... so yeah, you're completely safe making such an arrogant argument.
Windows: 95. Scene: LAN party. Game: Descent. Hilarity: All the Windows users cursing loudly as their computers spontaneously reboot for DST. DOS users get to feel smug for a change.
Windows has been boning DST as long as Windows has handled your RTC.
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
Oh I do hate when that happens at work. We end up so bewildered. "Are we dead? Or is this Ohio?"
( cookies for whoever gets the reference without Googling. )
Windows Azure is DOWN AS WE SPEAK: http://blogs.msdn.com/b/windowsazure/archive/2012/03/01/windows-azure-service-disruption-update.aspx ... congrats on paying for your non working OS without any indemnity either.
NO SIG
UTC is defined to be linked to Sol. It is used for things which depend on that characteristic (like astronomy and celestial navigation). If civil time doesn't need to be linked that closely, then it doesn't need to use UTC.
"Windows Azure is DOWN AS WE SPEAK"
What OS are you running, which thinks it's February 29, AS WE SPEAK?
Looks like Reddit's systems weren't ready for the leap second. It's been down since around midnight (UTC). You'd think a site as big as that would be ready for such an event.
And which parallel universe did you crawl out of?
All my Java processes peg the CPU since the leap second, even if I restart them. Maybe a reboot will help...
So just like before, then?
Normally java is just in "waste memory" mode. Now it's "waste memory AND CPU".
Why would a Unix application ever see the :60? Any time someone checks the clock, the time should be derived from Unix time (seconds since the epoch) which doesn't account for leap seconds. So to an application it should appear as a duplicate :00 or :59.
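Here is a quick Python sketch of that point, using only the standard library. Feeding 23:59:60 into `calendar.timegm` yields the same Unix timestamp as the following midnight, and converting timestamps back with `time.gmtime` never produces a :60.

```python
import calendar
import time

# Tuples are (year, month, day, hour, minute, second, wday, yday, isdst);
# calendar.timegm only looks at the first six fields.
leap_second = (2012, 6, 30, 23, 59, 60, 0, 0, 0)
next_midnight = (2012, 7, 1, 0, 0, 0, 0, 0, 0)

ts_leap = calendar.timegm(leap_second)
ts_midnight = calendar.timegm(next_midnight)

# Unix time has no slot for the leap second, so the two collide.
same_timestamp = ts_leap == ts_midnight

# Walking across the boundary in Unix time goes straight from :59 to :00.
before = time.gmtime(ts_midnight - 1)
after = time.gmtime(ts_midnight)
```

Whether the kernel repeats :59 or :00 during the inserted second is a separate implementation detail that this sketch does not show.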
If 2012 is a leap year, doesn't that make 2012-06-30 23:59 a leap minute?
https://twitter.com/redditstatus/status/219244389044731904 just said so -- "We are having some Java/Cassandra issues related to the leap second at 5pm PST. We're working as quickly as we can to restore service." :D
Ant(Dude) @ Quality Foraged Links (AQFL.net) & The Ant Farm (antfarm.ma.cx / antfarm.home.dhs.org).
The problem is not necessarily in the time representation. The problem is that when NTP tries to insert the extra second into the kernel, the kernel gets stuck in a spinlock (basically waiting for a lock that never becomes available).
The thing is that NTP announces this adjustment to the kernel sometime during the day before, so the lockup doesn't necessarily happen at 23:59:59 GMT (although it could happen at that time too).
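To illustrate the general failure mode (not the actual kernel code, which is not shown in this thread), here is a Python sketch of a holder trying to take a non-reentrant lock it already owns. The timeout is added only so the example returns; a real spinlock would spin forever.

```python
import threading

# threading.Lock is non-reentrant: the owner cannot take it a second time.
lock = threading.Lock()

lock.acquire()
# The only party that could release the lock is the one now waiting for it,
# so this second acquire can never succeed. The timeout lets the demo finish.
got_it = lock.acquire(timeout=0.1)
lock.release()
```

Without the timeout, the second `acquire` would block indefinitely, which is the user-space analogue of the lockup described above.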
Unfortunately some GPS vendors don't get it right. I was testing conformance of an IT530 from FastRaxGPS which uses the Mediatek MT3339 receiver.
It put out the sequence 23:59:59, 23:59:59, 00:00:00 repeating second 59 instead of using second 60.
This is also an issue for software that works with GPS data and time. The GPS clocks do not "speeka da leapsecond" so the software needs to keep track of things. There was a 15 second offset, and now it's 16 seconds. This has happened often enough that most areas where this might have been a problem have been discovered, but as slashdotters know, there's new code written every second (even leap seconds), and it ain't all finest kind.
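A minimal Python sketch of the bookkeeping described above, using the 15 and 16 second offsets from the comment. The function names are hypothetical, and a real implementation would carry a full leap second table rather than this single boundary.

```python
from datetime import datetime, timedelta

# First UTC instant after the 2012-06-30 leap second (naive datetimes, UTC).
LEAP_EFFECTIVE_UTC = datetime(2012, 7, 1, 0, 0, 0)

def gps_minus_utc(utc):
    """Return the GPS-UTC offset in seconds for dates near the 2012 leap second."""
    return 16 if utc >= LEAP_EFFECTIVE_UTC else 15

def utc_to_gps(utc):
    """Convert a UTC datetime to GPS time by adding the current offset."""
    return utc + timedelta(seconds=gps_minus_utc(utc))

just_before = utc_to_gps(datetime(2012, 6, 30, 23, 59, 59))
just_after = utc_to_gps(datetime(2012, 7, 1, 0, 0, 0))
```

Software that hard-codes the offset instead of looking it up is the kind that breaks each time a leap second is added.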
/etc/init.d/ntpd stop; date; date `date +"%m%d%H%M%C%y.%S"`; date;
Fixed the issues I was having. Credit goes to https://twitter.com/SilvioSantoZ/status/219250677522767872. I didn't have to restart anything after running it. YMMV
When NTP tries to say that it is 12:34:61 and the computer only expects 1-60.
That will never happen.
Leap seconds are always inserted at the end of the last day of a month, UTC. I think the convention is only to have leap second opportunities at the end of March, June, September and December. Typically, they use the end of June or the end of December, so a mid-year leap second is not unusual.
Since the normal progression is 23:59:58, 23:59:59, 00:00:00, the extra second makes the time 23:59:60. 61 would be TWO leap seconds which won't happen any time soon. The Earth's rate of rotation would have to change by nearly two seconds in 3 months.
$ man date
[...]
%S second (00..60)
[...]
Oh, maybe when you use that (or strftime).
So the comments are confusing to me as to whether Debian "squeeze" is supposed to have a problem or not, but I have about fifty of these systems running, and as far as I can tell, they're all fine.
I got a whole bunch of these in the logs:
> Jun 30 19:59:59 kernel: [timestamp] Clock: inserting leap second 23:59:60 UTC
I have three of the machines configured as NTP peers to each other, and looking at a few tier-1 time servers. The rest of the machines all use the three local peers as time servers.
My Debian desktop systems at home also seem to be fine.
2*3*3*3*3*11*251
I listened to the leap second on WWV. It sounded like this:
tick (23:59:55)
tick (23:59:56)
tick (23:59:57)
tick (23:59:58)
(nothing) (23:59:59)
(nothing) (23:59:60)
BEEP (00:00:00)
It always sounds to me like WWV has gotten stuck or something.
...laura
anything that runs its kernel on GPS time can give correct UTC time by following this prescription http://www.ucolick.org/~sla/leapsecs/right+gps.html
I was doing leap second testing in the last month and I'm pretty sure that date returns
23:59:58
23:59:59
23:59:59
00:00:00
as you go through the leap second addition
(Un)fortunately, I'm not at work so I can't double check, but a quick look at the date source code suggests that this is indeed its behaviour on Linux.
What the heck is the opposite of a leap second?
A leap anti-second?
A day is one Earth rotation, relative to Sol. It varies slightly because of a number of factors, and is called UT1. UT2R is a smoothed version, but variations due to unpredictable events are left. UTC is based on the atomic second. The value chosen for the atomic second is such that, on average, there have been slightly more than 86400 of them in a day. So, just as a year is more than 365 days (a day is slightly shorter than 1/365 year) and an occasional leap day needs to be added, so too an occasional leap second is needed.
Contrary to what the GP said, the solar day is not too fast. It is what it is, by definition. Rather, the second is a bit too short.
On average, since the leap second was introduced in 1972, one has been needed about every 18 months. Over the long term, that rate will increase as tidal acceleration slows the earth. 1 sec/18 months ~= 2e-8, so that's how much the second has been off on average since 1972. The atomic value for the second is 9,192,631,770 periods of the radiation corresponding to the transition between the two hyperfine levels of the ground state of the caesium 133 atom. So, a better value might have been 9,192,631,967, which would make us about even to date. (Although, since leap seconds aren't distributed evenly, they would still have occurred, both positive and negative, just not as many.) The original value was based on measurements made over less than 3 years, and has worked for some shorter periods (there were no leap seconds between 1999 and 2004, for example), but the value chosen has proven to be too short over the 40 years of leap seconds.
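The arithmetic above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes a 365.25-day year and takes "one leap second every 18 months" at face value.

```python
SECONDS_PER_YEAR = 365.25 * 86400   # assumed average year length
CAESIUM_PERIODS = 9_192_631_770     # current definition of the second

# One leap second needed every 18 months on average.
interval = 1.5 * SECONDS_PER_YEAR
fractional_error = 1.0 / interval   # roughly 2e-8, as stated above

# How many extra caesium periods per second would absorb that drift.
extra_periods = CAESIUM_PERIODS * fractional_error
adjusted = CAESIUM_PERIODS + round(extra_periods)

print(f"fractional error: {fractional_error:.3e}")
print(f"adjusted period count: {adjusted:,}")
```

Under these assumptions the result lands within a few counts of the figure quoted above; the small difference depends on the exact leap second rate used.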
"National Security is the chief cause of national insecurity." - Celine's First Law
It is perfectly valid for two back to back calls to gettimeofday to return the same value. It can happen at any time if the calls are closer together than the granularity of the time.
It is unusual, but perfectly possible for the second result to be less than the first if the clock has been reset. It is bad form for a program to panic over that. However, to try to avoid problems and to make logs a LOT less confusing, NTP prefers to slew the time rather than make a hard adjustment. That is, it speeds up or slows down the system clock a bit so that system time and the reference time will converge.
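To give a feel for how gradual slewing is, here is a rough Python sketch. The 500 ppm ceiling is an assumption (a commonly cited maximum slew rate for ntpd), and the function name is made up for this example.

```python
def seconds_to_slew(offset_seconds, max_slew_ppm=500.0):
    """Estimate how long it takes to slew away a clock offset.

    max_slew_ppm is an assumed ceiling: the clock runs at most this many
    parts per million faster or slower than real time while converging.
    """
    rate = max_slew_ppm * 1e-6          # seconds corrected per second elapsed
    return abs(offset_seconds) / rate

one_second = seconds_to_slew(1.0)       # about 2000 s, a bit over half an hour
```

During that window the clock never jumps or runs backwards, which is why slewing keeps logs and interval measurements sane.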
Perhaps this is just affecting some kernel versions or specific applications which behave poorly.
One data point: both of my servers were running all night, with NTP updates, and did not appear to have any issues. Both are still running right now, several hours after the leap second. They're Synology boxes running their own version of Linux (DSM 3.1-1636 and DSM 4.0-2196). FWIW, the box running DSM 3.1 has never had a problem with leap seconds, and has endured several since we've had it running almost continuously[*] since 2007. Our desktop systems were not running because everyone was in bed, but those that have been used this morning were fine (Xubuntu 12.04, both the i386 and amd64 flavors).
[*] Since late 2007, it has been rebooted a few times for updates to the DSM system, once for upgrading its internal disks, and has been taken down several times when the length of a power outage exceeded 10 minutes, as our pathetic UPS will only keep the servers running for about 30 minutes. We're in a rural area, so the power is quite dodgy, especially in summer thunderstorms.
Those who can make you believe absurdities can make you commit atrocities. - Voltaire
...And some are not. Note to self: Do not take a holiday during a leap second!
I had 2 Debian Squeeze Blade servers in Thailand kernel panic on me at 3am (AEST). What strikes me as odd is that out of the 6 blades that we have Debian running on (all running squeeze and kernel 2.6.32 with identical packages), only 2 of them had a panic, and so much for the advisory saying it only affects kernels prior to 2.6.29. There might be more to it than the kernel but sheesh, I'm on holiday!
Sleep second?