The Leap Second Is Here! Are Your Systems Ready?
Tmack writes "The last time we had a leap second, sysadmins were taken a bit by surprise when a random smattering of systems locked up (including Slashdot itself) due to a kernel bug causing a race condition specific to the way leap seconds are handled/notified by ntp. The vulnerable kernel versions (prior to 2.6.29) are still common amongst older versions of popular distributions (Debian Lenny, RHEL/CentOS 5) and embedded/black-box style appliances (Switches, load balancers, spam filters/email gateways, NAS devices, etc). Several vendors have released patches and bulletins about the possibility of a repeat of last time. Are you/your team/company ready? Are you upgraded, or are you going to bypass this by simply turning off NTP for the weekend?"
Update: 07/01 03:14 GMT by S : ZeroPaid reports that this issue took down the Pirate Bay for a few hours.
Leap years = no problem.
Leap seconds = kernel panic.
I fear for teh internets if we try a leap millisecond.
I can see the fnords!
NTP says it's 12:34:56pm, then it's 12:34:56pm. Why would a leap second lock things up any more than a clock that's one second slow and is corrected by NTP?
I am okay with it !!
For those wondering whether they get one more second of sleep tonight or one less, the rule is 'spring forwards, fall back, summer stand there looking confused'.
Will this affect desktop distros such as Ubuntu? Seems like a few Debian based servers have crashed. http://serverfault.com/questions/403732/anyone-else-experiencing-high-rates-of-linux-server-crashes-today
what about the metric time system?
Leap seconds have no place in Unix time. They make the assumption that time moves forward invalid (a subsequent call to gettimeofday may return a number smaller than the previous one), which is the cause of uncountable bugs. Raise your hand if you didn't know this is how it works.
Why don't they put leap second handling in the layer that converts from Unix time to user-visible representations, like timezones, DST and, oh I don't know, leap DAYS?
Enjoy your free operating system that was stopped by an extra second.
Yes, because we've NEVER seen Windows have problems dealing with things like Daylight Savings...
#DeleteChrome
Hello, Some of us code our systems somewhat like a finite state machine, and we figure our machine will never operate outside it.
,Jim
If you're testing whether something that increments ever hits a number (like 10) and goes back to 0, instead of checking if it == 10, check if it is > 9.
There are a lot of defensive coding mechanisms you can use. The downside of this is that when you debug, something can sneak by and put you outside of a state you want, so it makes it ever so slightly harder to debug. But if you're making software that will be used by the public that is hard to give updates, defensive programming can save the day here and there.
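A minimal sketch of the > 9 check suggested above (written here as >= limit, which is equivalent for integers). The function name `advance` and the default limit of 10 are illustrative assumptions, not anything from the original comment:

```python
def advance(counter: int, limit: int = 10) -> int:
    """Increment counter and wrap to 0 once it reaches or passes limit."""
    counter += 1
    # Defensive check: '>=' still wraps if a bug pushed the counter past
    # the limit, whereas '== limit' would let it keep growing forever.
    if counter >= limit:
        counter = 0
    return counter


print(advance(8))   # 9
print(advance(9))   # 0
print(advance(42))  # 0, an out-of-range state recovers instead of running away
```

The trade-off is the one described above: the out-of-range state gets silently repaired, so the bug that caused it is harder to spot while debugging.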
God spoke to me
If the ITU never inserted another leap second again, just letting UTC sloooowly diverge from solar time, it would not create any real problem for hundreds of years. And by that time we'll either have developed the technology to adjust the Earth's rotation itself to correct the discrepancy, or else civilization will have been destroyed by nuclear war/global warming/etc. Either way we wouldn't have to worry about leap seconds.
Our servers run on octal, you insensitive clod.
At my shop, we did a dry run a couple of weeks ago. Things went very well and we had no issues. That said, I still unplugged the time server from the network this weekend, just in case. I really don't want to get a call from one of the far flung server admins, telling me something went wrong. The risk of issues caused by losing a few seconds across the network is much less than the potential damage from even one server doing the wrong thing.
Being UTC+10 means that I read this story and get excited about watching the leap second, only to discover it happened last night. I guess the fact that I didn't notice means that I was ready.
What happens when you catch stock tickers at the wrong moment?
Yeah it had the wrong time but did not freeze up. What's your excuse?
You're really trying that hard to troll huh?
A free operating system has a bug in it so you want to exaggerate the existence of the bug to show that free operating systems are inferior in such a condescending and acerbic way.
I guess that can work. It's not like there is any paid OS out there that has decades long histories of serious instability, security flaws, and badly implemented ideas...... so yeah, you're completely safe making such an arrogant argument.
Windows: 95. Scene: LAN party. Game: Descent. Hilarity: All the Windows users cursing loudly as their computers spontaneously reboot for DST. DOS users get to feel smug for a change.
Windows has been boning DST as long as Windows has handled your RTC.
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
Oh I do hate when that happens at work. We end up so bewildered. "Are we dead? Or is this Ohio?"
( cookies for whoever gets the reference without Googling. )
When daylight savings got shifted to the current longer format - which was within this past decade, mind you - millions of Outlook users discovered that many of their already-existing appointments were shifted by one hour. And, unless you knew when the appointment was created, you had no way of knowing the reported time was correct, or if it was off by an hour.
People had to deal with this for three bloody weeks, until the calendar ticked over to the "old" daylight savings start date. That's a bit more hassle than rebooting a Unix machine ever was.
"I have no sense of humor and will butthurt for Linux." - EdIII (1114411)
Windows Azure is DOWN AS WE SPEAK: http://blogs.msdn.com/b/windowsazure/archive/2012/03/01/windows-azure-service-disruption-update.aspx ... congrats on paying for your non working OS without any indemnity either.
NO SIG
All my Java processes peg the CPU since the leap second, even if I restart them. Maybe a reboot will help...
"Windows Azure is DOWN AS WE SPEAK"
What OS are you running, which thinks it's February 29, AS WE SPEAK?
"National Security is the chief cause of national insecurity." - Celine's First Law
Looks like Reddit's systems weren't ready for the leap second. It's been down since around midnight (UTC). You'd think a site as big as that would be ready for such an event.
And which parallel universe did you crawl out of?
...The difference is that Slashdot and its users have spent the last decade crucifying Windows for things like that while exalting Linux as superior.
My server inserted the leap second around 90 minutes ago. Google Chrome 20.x, MySQL 5.5, and MythTV 0.25 on systems NTP synced to the server all did strange and wonderful things simultaneously. Yes, every system was completely up to date.
Anyone care to make this priceless moment from dmesg into a MC commercial?
Clock: inserting leap second 23:59:60 UTC
chrome[19727]: segfault at 0 ip (null) sp 00007fff32ef9b88 error 14
chrome[31567]: segfault at 27a00000000 ip 0000027a00000000 sp 00007fff32ef9b78 error 14
that thinks this whole thing is stupid... Go back a second by making one of the minutes have 61 seconds? WTF kind of solution is that? Why TF don't they just go 57...58...59...59...00. Or just stop the clock 57...58...5900
Don't even get me started on "Oh it's best to do it at midnight, in June or August". What what WHAT???? It's a fucking second! A second! Do it any time! Who decided "midnight" was best? Midnight GMT is fucking "wake up and go to work" in Australia and the middle of the afternoon in California - WHOSE midnight is supposed to be the best time to do it...
Fuck me, we have leap years with +1 extra days, we have +/-1 leap hours (DST, BST, whatever you want to call it), and we cope - yet these people are befuddled by a leap second.
Get a fucking grip...
If 2012 is a leap year, doesn't that make 2012-06-30 23:59 a leap minute?
Try and buy an app from your iphone NOW.
https://twitter.com/redditstatus/status/219244389044731904 just said so -- "We are having some Java/Cassandra issues related to the leap second at 5pm PST. We're working as quickly as we can to restore service." :D
Ant(Dude) @ Quality Foraged Links (AQFL.net) & The Ant Farm (antfarm.ma.cx / antfarm.home.dhs.org).
Congrats on your epic reading comprehension/what-day-is-today fail.
Finally Windows users have something to feel smug over! Maybe in another decade you'll have a second!
wrong blog post, but it is down. The apple appstore runs on it and it doesnt work.
...at least not on any of my servers, so what's a leap second between friends?
Yes, and Netflix was out for a lot of folks yesterday and today even though a lot of folks pay for it, because of massive power outages due to storms. Are you going to bitch about that too because, "OMG, an Internet service went down!"
What's that got to do with a bug in an OS? Your story comprehension is as good as your reading comprehension.
Looks like 4chan was affected by the leap second too. https://twitter.com/#!/4chan
shut up bitch
After leap second, all programs doing pthread_cond_timedwait() were turned into busy wait loops (google chrome, mozilla thunderbird, others). Restarting programs didn't help.
Actually, nothing has ever had a problem with Daylight Savings Time. Many systems are, however, affected by Daylight Saving Time.
They've been "Down for Emergency Maintenance" for quite some time now.
One would think they'd have seen this coming, because I'm pretty sure there's a /r/leapsecond subreddit that covers this.
West by about 800'
damaged by dogma
http://lkml.indiana.edu/hypermail/linux/kernel/1206.3/03186.html
This is also an issue for software that works with GPS data and time. The GPS clocks do not "speeka da leapsecond" so the software needs to keep track of things. There was a 15 second offset, and now it's 16 seconds. This has happened often enough that most areas where this might have been a problem have been discovered, but as slashdotters know, there's new code written every second (even leap seconds), and it ain't all finest kind.
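The bookkeeping described above can be sketched as a small lookup table. This is only an illustration: the table holds just the two offsets mentioned in the comment (15 seconds, then 16 seconds from this leap second on), and the names `GPS_UTC_OFFSETS` and `gps_to_utc` are made up for the example. Real software would need the full leap second history.

```python
from datetime import datetime, timedelta

# (UTC instant from which the offset applies, GPS minus UTC in seconds).
# Assumption: only the two offsets discussed above are listed here.
GPS_UTC_OFFSETS = [
    (datetime(2009, 1, 1), 15),
    (datetime(2012, 7, 1), 16),
]


def gps_to_utc(gps_time: datetime) -> datetime:
    """Convert a naive GPS-scale datetime to a naive UTC datetime."""
    offset = None
    for effective_utc, seconds in GPS_UTC_OFFSETS:
        # On the GPS scale, an entry takes effect at effective_utc + seconds.
        if gps_time >= effective_utc + timedelta(seconds=seconds):
            offset = seconds
    if offset is None:
        raise ValueError("GPS time predates the offset table")
    return gps_time - timedelta(seconds=offset)


print(gps_to_utc(datetime(2012, 6, 30, 12, 0, 15)))  # 2012-06-30 12:00:00
print(gps_to_utc(datetime(2012, 7, 1, 12, 0, 16)))   # 2012-07-01 12:00:00
```

Note that Python's datetime cannot represent 23:59:60, so the leap second itself and the second after it both map to 00:00:00 in this sketch.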
/etc/init.d/ntpd stop; date; date `date +"%m%d%H%M%C%y.%S"`; date;
Fixed the issues I was having. Credit goes to https://twitter.com/SilvioSantoZ/status/219250677522767872. I didn't have to restart anything after running it. YMMV
So the comments are confusing to me as to whether Debian "squeeze" is supposed to have a problem or not, but I have about fifty of these systems running, and as far as I can tell, they're all fine.
I got a whole bunch of these in the logs:
> Jun 30 19:59:59 kernel: [timestamp] Clock: inserting leap second 23:59:60 UTC
I have three of the machines configured as NTP peers to each other, and looking at a few tier-1 time servers. The rest of the machines all use the three local peers as time servers.
My Debian desktop systems at home also seem to be fine.
2*3*3*3*3*11*251
Better yet, check if it is >= 10.
POSIX has several clocks. The issue here, is that as usual programmers are mostly code monkeys that know little of the insides of whatever their shit will run on. They don't know jack shit about the OS, they don't even know the libc and the C standard well, let alone the hardware. Wimps. Let's not go into the programmers that kid themselves by hiding behind "high level" stuff, which could only work if a very competent team of proper engineers was behind that "high level stuff", which is almost never the case.
POSIX has something called the MONOTONIC clock (which as you can well guess from the name, never does anything surprising, and it is the ONLY clock you should ever use to compute time deltas). But most people out there will use the wall clock, i.e. something that ends up being equivalent to gettimeofday() instead of something that maps to clock_gettime(CLOCK_MONOTONIC). Which is idiotic at best.
That still won't save you from the kernel going bonkers when it has to process a leap second, or sudden outbreaks of imbecility in an otherwise sane land (some of the futex userspace libraries, which _are_ written by people that DO know their shit, still managed to get leap seconds wrong!).
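A minimal sketch of the "use the monotonic clock for time deltas" advice above, using Python's `time.monotonic()`. The helper name `timed` is an assumption made for this example; on Linux, CPython is expected to back `time.monotonic()` with `clock_gettime(CLOCK_MONOTONIC)`, but check `time.get_clock_info("monotonic")` on your platform rather than taking that on faith.

```python
import time


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed_seconds).

    The delta comes from time.monotonic(), which never goes backwards,
    rather than time.time(), which follows the wall clock and can be
    stepped by NTP, an admin, or a leap second.
    """
    start = time.monotonic()
    result = fn(*args)
    elapsed = time.monotonic() - start
    return result, elapsed


result, elapsed = timed(sum, range(1000))
print(result, elapsed >= 0)
```

As the comment says, this only protects the delta computed in your own code; it does nothing about bugs in the kernel or in libraries underneath.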
such a low UID and still trolling.... there's something to be said there
have you seen my sig? there are many others like it but none that are the same
unless it gets lept ;)
My Exede satellite internet service was out from 8:00 EDT to 9:50 EDT... I have no way to verify it was caused by the leap second, but it seems a little coincidental.
There's a reason for leap seconds, and Astronomers will go postal on you if you try to make your clocks too dumb so they no longer track the stars. If you can't implement leap seconds out of respect, do it out of fear, okay?
Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
Windows: 95. Scene: LAN party. Game: Descent. Hilarity: All the Windows users cursing loudly as their computers spontaneously reboot for DST. DOS users get to feel smug for a change.
Windows has been boning DST as long as Windows has handled your RTC.
If Win95 automatically rebooted after becoming completely unusable that would have been a "feature".
Seriously, folks would have paid money to have it skip the BSOD and just reboot and startup some programs that were running before.
There are different layers where you can run into problems. One of them is the ASCII value a time server hands a time client - if it's 23h 59m 60s and the client chokes, that's a client problem. If the client tries to set an ASCII clock in the kernel to 23h 59m 60s, and the kernel chokes, that's a kernel bug. If a Unix application library can't cope with the interesting values, that's a library problem.
One obvious workaround is for the NTP server to never answer 23h 59m 60s, and for NTP clients to never tell the Unix kernel that that's what time it is. The way you implement this is simply to detect that it's about to be a leap second and Don't Respond until after it's over. (If your client can't cope with not getting a response from the server, the client's broken anyway.)
On the other hand, if the Unix kernel can't cope with a timeclock being set to 23h 59m 60s, that's a bug that should have been fixed years ago - it's not like leap seconds are a new thing.
It was also fixed by applying a patch that did not require a reboot. So the only hassle was ignorance and not knowing that a simple patch available from MS would fix the problem.
Your example is only 'more hassle' because you don't know what you're doing and instead of taking the normal sane route, you just bitch about it.
Ironically, you talk about patching Linux and rebooting but entirely ignore doing the same on a windows machine. Do your desktops need 24x7 uptime and your servers don't or something?
Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
wrong blog post, but it is down. The apple appstore runs on it and it doesnt work.
Bwahhahhaha
Did you seriously just claim the Apple app store runs on Azure? Are you fucking retarded?
Perhaps you mean Amazon.
And in addition, if you are dealing with time, you MUST prepare for the case where the clock moves around randomly, both forward and backward. Because if your code runs for a long time, it will.
"First they came for the slanderers and i said nothing."
I listened to the leap second on WWV. It sounded like this:
tick (23:59:55)
tick (23:59:56)
tick (23:59:57)
tick (23:59:58)
(nothing) (23:59:59)
(nothing) (23:59:60)
BEEP (00:00:00)
It always sounds to me like WWV has gotten stuck or something.
...laura
Simple version:
"Don't kill the messenger", except when the messenger is going to kill you. It's the printk sending notice that the leap second happened that deadlocks against the timer doing the leap second (both vying for xtime_lock). Call it a "feature" of the NTP code. Hence the "turn off NTPD" workaround: if ntpd doesn't get notified from somewhere upstream that it should implement the leap second, it won't notify the kernel about it, and the printk shouldn't happen.
-T
Support TBI Research: http://www.raisinhope.org
Hey, Sergey Brin: maybe you should take this as a reminder that it sure would be nice if Android devices actually took leap-seconds into account instead of setting themselves to GPS time. My phone now thinks it's 16 seconds in the future compared to every sane electronic system. Sooner or later, that's going to cause problems for certain types of encryption.
date --set `date | awk '{print $4}'`
I can't say if this is just a coincidence or connected with the leap-second, just yet.
Bruce
Bruce Perens.
Has been for quite some time already. This is ancient stuff. Now routers on the other hand...
The timing was coincident with localtime leap-second, anyway (system clock is UTC). After rebooting the machine things look fine.
My God, it's Full of Source!
OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
All leap seconds are "back", i.e. one more second in the day. This is because the earth's rotation is slowing down (mostly due to tidal friction). When they defined the SI second in terms of atomic vibrations they used an average of day lengths over several centuries. But that meant that it was "wrong" from the moment they set it in 1961. The actual solar day was already longer than 86400 SI seconds by 1-2ms and keeps getting longer by 1.7ms per century.
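As a rough worked example of the figures above: if the solar day runs 1-2 ms longer than 86400 SI seconds, the excess adds up to a full second in 500 to 1000 days. The sketch below assumes the excess stays constant over the interval, and the function name is made up for the example.

```python
def days_until_one_second(excess_ms_per_day: float) -> float:
    """Days for a constant daily excess (in ms) to accumulate to 1 second."""
    return 1000.0 / excess_ms_per_day


for excess in (1.0, 2.0):
    days = days_until_one_second(excess)
    print(f"{excess} ms/day -> {days:.0f} days (~{days / 365.25:.1f} years)")
# 1.0 ms/day -> 1000 days (~2.7 years)
# 2.0 ms/day -> 500 days (~1.4 years)
```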
Perhaps this is just affecting some kernel versions or specific applications which behave poorly.
One data point: both of my servers were running all night, with NTP updates, and did not appear to have any issues. Both are still running right now, several hours after the leap second. They're Synology boxes running their own version of Linux (DSM 3.1-1636 and DSM 4.0-2196). FWIW, the box running DSM 3.1 has never had a problem with leap seconds, and has endured several since we've had it running almost continuously[*] since 2007. Our desktop systems were not running because everyone was in bed, but those that have been used this morning were fine (Xubuntu 12.04, both the i386 and amd64 flavors).
[*] Since late 2007, it has been rebooted a few times for updates to the DSM system, once for upgrading its internal disks, and has been taken down several times when the length of a power outage exceeded 10 minutes, as our pathetic UPS will only keep the servers running for about 30 minutes. We're in a rural area, so the power is quite dodgy, especially in summer thunderstorms.
Those who can make you believe absurdities can make you commit atrocities. - Voltaire
Some of us work with low budget, crap management, and tight deadlines. Sometimes you just give the code and don't give a shit. Unfortunately.
And do remember bitchecker who can not possibly have ping timeouts because he even has dst!
After setting up NTP on Linux systems, I forget about it. Completely. They are all working fine.
OTOH, I check the time on a Win7 TV recording box weekly, just to ensure it hasn't drifted 20 minutes. I'm serious - it used to drift over an hour. I ended up setting up an internal NTP server (on Linux), then forcing Win7, Vista, XP machines to sync with it every 15 minutes. I tried 1 hour and it was still drifting 5+ minutes. I've seen this problem with Windows since 1997 and it hasn't gotten better. The exact same PCs deal with time perfectly when running Linux. Odd?
Back when I was in corporate IT with about 5 onsite Microsoft engineers assigned to the company (we had over 120K desktops), they told me that desktop clocks within 5 minutes was acceptable. I'm 100% serious. I could tell when the PC was off - the time on the phone would be correct, set by a UNIX NTP box "somewhere", but I'd be late to meetings because my PC didn't "ping!!!!" a reminder until 5 minutes late.
Anyway, last night I was recording a longish movie on the Win7 box. It appears to have been confused - recorded 10 minutes of Seinfeld and missed the last 10 minutes of the movie. This morning, I checked that system's clock - it was fine.
well, it's in response to the trolling linsux guys have done against windows since the beginning. now you get it!
...And some are not. Note to self: Do not take a holiday during a leap second!
I had 2 Debian Squeeze blade servers in Thailand kernel panic on me at 3am (AEST). What strikes me as odd is that out of the 6 blades that we have Debian running on (all running squeeze and kernel 2.6.32 with identical packages), only 2 of them panicked, and so much for the advisory saying it only affects kernels prior to 2.6.29. There might be more to it than the kernel but sheesh, I'm on holiday!
I am stunned that there are coders out there who write code like i==10. Didn't you get taught in your studies to always think about boundary conditions and write i>9 EVERY TIME?
Watch out we got a bad ass over here... There are situations that i==10 is necessary. Wow you can't think of any?
Windows: 95. Scene: LAN party. Game: Descent.
Off-topic, but I wish I had had more friends with computers back when Descent wasn't ancient. Great game that I would've loved to play multiplayer on more often.
This was covered well on ServerFault yesterday as someone noticed sudden system instability across their servers.
Edmund White
http://flickr.com/ewwhite
I really enjoyed getting to sleep in for that extra second this morning :)
and possibly other airlines http://www.news.com.au/travel/news/leap-second-crashes-qantas-and-leaves-passengers-stranded/story-e6frfq80-1226413961235
As an online discussion grows longer, the probability of a reference to Godwin's Law approaches 1
We have a problem with all Novell SLES 11 SP1 servers (both patched and unpatched) that run Java
We experienced very high load from exactly 30 minutes before UTC midnight 30/06/2012. Only virtual servers with a low vCPU core count seem to be affected by the bug, i.e. load averages of up to 50-100. We only have one physical server with 8 cores and it's still behaving. The symptoms seem to be that one java process running at ~75% and ksoftirqd at about 25% is enough to peg the load above 50.
As someone above suggested, setting the date by executing the date command to set the time to the current time seems to resolve the issue and things return to normal:
date `date +"%m%d%H%M%C%y.%S"`
I found stopping/starting ntp daemon has no effect on the efficacy of the work-around.
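For anyone puzzling over the format string in the workaround above: `%m%d%H%M%C%y.%S` expands to MMDDhhmmCCYY.ss, the form `date` accepts for setting the clock, so the command simply sets the clock to the time it already shows. The sketch below only builds that string and does not touch the system clock; the function name is hypothetical, and `%Y` stands in for `%C%y`, which is the same thing for four-digit years.

```python
from datetime import datetime


def date_set_argument(now: datetime) -> str:
    """Build the MMDDhhmmCCYY.ss string that the shell one-liner generates."""
    return now.strftime("%m%d%H%M%Y.%S")


print(date_set_argument(datetime(2012, 7, 1, 0, 5, 30)))  # 070100052012.30
```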
and your code for "if seconds > 58" would run twice....
:-(
BOOM
JUST STOP LEAP SECONDS
Astronomers can adjust 1/60 of an arc minute every year or so.
The rest of us don't care where the sun is at noon. And if you live in the UK, you don't care where the sun is at noon, you just say "Oooo, look above at the ... the ... uh, bright yellow ball"
Wayne
Was Descent peer-to-peer or hosted? Or was it one of the few games back then that could handle a crashed client in peer-to-peer mode?
But very funny nonetheless.
The backup generator for my building uses an embedded controller. One of its functions is to initiate a self test which starts and exercises the engine on the generator for a few minutes once a week.
Yesterday, it went into a loop of starting itself up for one minute and then shutting down. It did this continuously for one hour. I strongly suspect the leap second was the cause.