Slashdot Mirror


The Leap Second Is Here! Are Your Systems Ready?

Tmack writes "The last time we had a leap second, sysadmins were taken a bit by surprise when a random smattering of systems locked up (including Slashdot itself) due to a kernel bug causing a race condition specific to the way leap seconds are handled by the kernel and announced to it by NTP. The vulnerable kernel versions (prior to 2.6.29) are still common amongst older versions of popular distributions (Debian Lenny, RHEL/CentOS 5) and embedded/black-box style appliances (switches, load balancers, spam filters/email gateways, NAS devices, etc.). Several vendors have released patches and bulletins about the possibility of a repeat of last time. Are you/your team/company ready? Are you upgraded, or are you going to bypass this by simply turning off NTP for the weekend?" Update: 07/01 03:14 GMT by S: ZeroPaid reports that this issue took down the Pirate Bay for a few hours.

53 of 284 comments

  1. Irony by bughunter · · Score: 5, Funny

    Leap years = no problem.

    Leap seconds = kernel panic.

    I fear for teh internets if we try a leap millisecond.

    --
    I can see the fnords!
    1. Re:Irony by at10u8 · · Score: 4, Interesting

      The time service bureaus used to insert leap milliseconds at almost any time. See the bottom plot at http://www.ucolick.org/~sla/leapsecs/amsci.html where there were 29 leaps in 3 years.

    2. Re:Irony by ericloewe · · Score: 2

      It was not the largest of timer overflows that killed them, but the tiniest leap second...

    3. Re:Irony by 0123456 · · Score: 4, Interesting

      > I'm interested to hear what the excuse is - because it will probably sound a lot like the things you all flame Windows users for...

      A lot of Linux systems are on private networks and have to be up 24/7. Dealing with a known bug is considered less problematic than installing a new OS version and invalidating all the testing which has proven that the system can run 24/7 over the last few years.

      So I guess you're right, it is very similar to the reasons why there are many unpatched Windows systems out there.

    4. Re:Irony by fuzzyfuzzyfungus · · Score: 2

      Jokes aside, the issue is that leap seconds are not deterministic.

      TAI, based on whatever flavor of atomic clock is currently the going standard, is markedly more regular than the observed time based on the movement of the Earth (UT1). UTC ticks at the same rate as TAI, but is supposed to correspond to UT1, which ticks at an unpredictable rate. So, whenever UT1 and UTC drift too far apart (as DUT1 approaches 0.9 seconds), UTC gets a leap second. The UTC tick rate is constant, but leap-second days are 1 second longer.

      Obviously, we should just abandon this UTC nonsense.
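A toy model of the DUT1 rule described above, offered as a sketch only. It assumes a constant excess length of day (the 2 ms/day figure is a made-up illustrative value; the real excess wanders unpredictably, which is why leap seconds can't be scheduled far in advance) and inserts a positive leap second whenever the next day's drift would push DUT1 = UT1 - UTC past -0.9 s. The function name `leap_second_days` is hypothetical.

```python
def leap_second_days(lod_excess_ms: float, days: int, limit: float = 0.9) -> list[int]:
    """Return the day numbers on which a positive leap second is inserted.

    DUT1 = UT1 - UTC, in seconds. A constant excess length of day is a
    simplifying assumption; the real excess varies unpredictably.
    """
    drift_per_day = lod_excess_ms / 1000.0  # UT1 falls this far behind UTC each day
    dut1 = 0.0
    leap_days = []
    for day in range(1, days + 1):
        dut1 -= drift_per_day
        # Insert a leap second before DUT1 would cross -limit on the next day.
        if dut1 - drift_per_day < -limit:
            dut1 += 1.0  # UTC repeats one second, so UT1 - UTC jumps by +1 s
            leap_days.append(day)
    return leap_days


print(leap_second_days(2.0, 2000))  # roughly one leap second every 500 days
```

With an assumed 2 ms/day excess the model inserts a leap second about every 500 days; halving the excess roughly doubles the interval, and with no excess at all it never inserts one.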

    5. Re:Irony by pla · · Score: 4, Insightful

      > I mean really - if this was fixed a year ago, what is the excuse for still running the old problematic version?

      The excuse? I have a file server and a router that run 24/7/365.24 (+1/86400, on occasion), and they just work. I have no interest in even logging into them, and they will remain "stock" systems until either a critical SSL vulnerability turns up (in the case of the router) or I absolutely need a feature not possible with that old of a system. And when I say "old", I mean I was running "Slackware 4" here until about a year ago.

      One of the nice things about Linux: it just works. You don't get random reboots every two weeks when Microsoft decides you must install this particular update; it doesn't get "crufty" the way the Windows registry does; it doesn't suddenly fail to boot one morning (though in fairness, the fact that we never shut them down probably leads to a bias in that regard). It just works, day after day, year after year. If it worked yesterday and no hardware failed overnight, it will work today.

      Now... if you want to call that something we complain about in Windows, hey, I'll admit it: I want my software to "just work". Whether that means a Linux server that never goes down or an XP desktop that everything supports (for the 18-24 months between pukings), I just want my hammer to pound nails and my crowbar to pull them back out, and I don't care if my screwdriver believes in Buddha or Jesus or Xenu.

    6. Re:Irony by binarylarry · · Score: 2

      Ziggy must be on the sauce again.

      --
      Mod me down, my New Earth Global Warmingist friends!
    7. Re:Irony by 0123456 · · Score: 4, Insightful

      And what do you do when the kernel change causes your system to start crashing, when it had previously operated for years with no failures?

      An acquaintance supports a system which has been in operation for years and breaks every time there's a leap second (not because of the Linux kernel but other software and hardware issues). That means every few years he spends a couple of hours rebooting the servers and verifying that it's up and running again afterwards. Fixing the software would mean a substantial amount of development work followed by weeks of testing.

    8. Re:Irony by techno-vampire · · Score: 2

      > And what do you do when the kernel change causes your system to start crashing, when it had previously operated for years with no failures?

      I'd really, really love to respond by saying that That Just Doesn't Happen. Alas, I know better. My laptop is currently running Fedora 16 with a PAE kernel from Fedora 14 because every 3.x kernel I've tried on it hangs during boot while trying to do something with my card reader. And, if I have a card in the reader, it ends up rebooting itself. About all I can say is that the situation you describe is (mercifully) rare, and if you do find yourself stuck with it, the best you can do is turn off ntpd before the leap second and turn it back on afterward.

      --
      Good, inexpensive web hosting
    9. Re:Irony by grcumb · · Score: 3, Insightful

      > And what do you do when the kernel change causes your system to start crashing, when it had previously operated for years with no failures?

      Er, you restart with the older kernel, which is right there on the grub boot menu.

      ... Sorry, was this a trick question or something...?

      --
      Crumb's Corollary: Never bring a knife to a bun fight.
    10. Re:Irony by Hognoxious · · Score: 3, Interesting

      > Microsoft doesn't force anyone to update. You are in control of your updates in Windows, just like in Linux.

      Nope. I have Windows Update set to "check but ask", and occasionally I find that it has restarted for updates without even informing me.

      --
      Confucius say, "Find worm in apple - bad. Find half a worm - worse."
  2. Re:How is this an issue? by swalve · · Score: 4, Informative

    When NTP tries to say that it is 12:34:61 and the computer only expects 1-60.

  3. More important than kernel issues by Anonymous Coward · · Score: 4, Funny

    For those wondering whether they get one more second of sleep tonight or one less, the rule is 'spring forwards, fall back, summer stand there looking confused'.

  4. Re:How is this an issue? by mgscheue · · Score: 5, Informative

    It actually goes 23h 59m 59s, 23h 59m 60s, 00h 00m 00s. See http://www.nist.gov/pml/div688/leapseconds.cfm

  5. Re:How is this an issue? by msauve · · Score: 4, Insightful

    Poorly written software only expects seconds to go from 0-59. Positive leap seconds are counted 23:59:59 -> 23:59:60 -> 0:0:0. Leap seconds have been around since 1972, the same year Unix was rewritten in C. There's been plenty of time to get things right.
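To see the 0-59 assumption in a mainstream library, here is a small Python check (the helper name `parse_both_ways` is made up for this example): `time.strptime` accepts a seconds field of 60, as its documentation allows, while `datetime` refuses to represent 23:59:60 at all and raises `ValueError`.

```python
import time
from datetime import datetime

# What a UTC clock shows across the positive leap second of 30 June 2012.
LEAP_SEQUENCE = ["2012-06-30 23:59:59", "2012-06-30 23:59:60", "2012-07-01 00:00:00"]
FMT = "%Y-%m-%d %H:%M:%S"


def parse_both_ways(stamp: str) -> tuple[int, str]:
    """Parse with time.strptime (allows :60) and datetime.strptime (does not)."""
    tm_sec = time.strptime(stamp, FMT).tm_sec
    try:
        datetime.strptime(stamp, FMT)
        dt_result = "ok"
    except ValueError:  # datetime only allows seconds 0..59
        dt_result = "rejected"
    return tm_sec, dt_result


for stamp in LEAP_SEQUENCE:
    print(stamp, *parse_both_ways(stamp))
```

The :59 and :00 stamps parse both ways; the :60 stamp comes back as tm_sec 60 from `time.strptime` and is rejected by `datetime`.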

    --
    "National Security is the chief cause of national insecurity." - Celine's First Law
  6. what about the metric time system? by Joe_Dragon · · Score: 2

    what about the metric time system?

  7. Re:Haha by 93+Escort+Wagon · · Score: 4, Funny

    > Enjoy your free operating system that was stopped by an extra second.

    Yes, because we've NEVER seen Windows have problems dealing with things like Daylight Savings...

    --
    #DeleteChrome
  8. Re:The leap second is done horribly wrong by at10u8 · · Score: 2

    Perchance something like this example would work with existing, deployed, and tested code? http://www.ucolick.org/~sla/leapsecs/right+gps.html

  9. as of about a year ago, I started defensive coding by GoodNewsJimDotCom · · Score: 4, Interesting

    Hello, some of us code our systems somewhat like a finite state machine, and we figure our machine will never operate outside it.

    If you're testing whether something that increments ever hits a number (like 10) and goes back to 0, instead of checking if it == 10, check if it is > 9.

    There are a lot of defensive coding mechanisms you can use. The downside of this is that when you debug, something can sneak by and put you outside of a state you want, so it makes debugging ever so slightly harder. But if you're making software that will be used by the public and is hard to update, defensive programming can save the day here and there.

    Jim
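The == 10 versus > 9 advice, transplanted to a seconds counter as a hypothetical example (both function names are made up): if some outside source ever sets the counter to 60, as a leap second can, the strict version never wraps while the defensive one recovers on the next tick.

```python
def next_second_strict(second: int) -> int:
    """Fragile: wraps only if the counter lands exactly on 60."""
    second += 1
    if second == 60:
        second = 0
    return second


def next_second_defensive(second: int) -> int:
    """Defensive: wraps for anything past the last expected value (59)."""
    second += 1
    if second > 59:
        second = 0
    return second


# Feed both a counter that has just been set to a leap second (:60).
print(next_second_strict(60))     # 61: walks off into a state that shouldn't exist
print(next_second_defensive(60))  # 0: back in range at the next tick
```

Note the trade-off the comment mentions: the defensive version silently absorbs the unexpected value instead of reporting it, which is exactly what makes the out-of-range state harder to notice while debugging.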

  10. Re:How is this an issue? by bunratty · · Score: 3, Informative

    Because that would be the opposite of a leap second maybe?

    --
    What a fool believes, he sees, no wise man has the power to reason away.
  11. Re:as of about a year ago, I started defensive cod by JustNiz · · Score: 4, Funny

    Our servers run on octal, you insensitive clod.

  12. Re:Haha by EdIII · · Score: 4, Insightful

    > Yeah it had the wrong time but did not freeze up. What's your excuse?

    You're really trying that hard to troll, huh?

    A free operating system has a bug in it, so you want to exaggerate the existence of the bug to show, in such a condescending and acerbic way, that free operating systems are inferior.

    I guess that can work. It's not like there is any paid OS out there that has decades-long histories of serious instability, security flaws, and badly implemented ideas... so yeah, you're completely safe making such an arrogant argument.

  13. Re:Haha by drinkypoo · · Score: 4, Funny

    Windows: 95. Scene: LAN party. Game: Descent. Hilarity: All the Windows users cursing loudly as their computers spontaneously reboot for DST. DOS users get to feel smug for a change.

    Windows has been boning DST as long as Windows has handled your RTC.

    --
    "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
  14. out of a state you want by QuasiSteve · · Score: 2

    > The downside of this is that when you debug, something can sneak by and put you outside of a state you want

    Oh I do hate when that happens at work. We end up so bewildered. "Are we dead? Or is this Ohio?"

    (Cookies for whoever gets the reference without Googling.)

  15. Re:Haha by alexborges · · Score: 2, Informative

    Windows Azure is DOWN AS WE SPEAK: http://blogs.msdn.com/b/windowsazure/archive/2012/03/01/windows-azure-service-disruption-update.aspx ... congrats on paying for your non-working OS without any indemnity either.

    --
    NO SIG
  16. Re:Leap seconds are an idiotic idea by msauve · · Score: 2

    UTC is defined to be linked to Sol. It is used for things which depend on that characteristic (like astronomy and celestial navigation). If civil time doesn't need to be linked that closely, then it doesn't need to use UTC.

  17. Re:Haha by msauve · · Score: 2

    "Windows Azure is DOWN AS WE SPEAK"

    What OS are you running, which thinks it's February 29, AS WE SPEAK?

  18. Leap second got Reddit? by daff2k · · Score: 4, Interesting

    Looks like Reddit's systems weren't ready for the leap second. It's been down since around midnight (UTC). You'd think a site as big as that would be ready for such an event.

    --
    And which parallel universe did you crawl out of?
    1. Re:Leap second got Reddit? by smasch · · Score: 3, Informative

      Yup, Reddit got nailed.
      Here's the tweet on @redditstatus. According to them "We are having some Java/Cassandra issues related to the leap second at 5pm PST."

    2. Re:Leap second got Reddit? by painandgreed · · Score: 5, Funny

      > Looks like Reddit's systems weren't ready for the leap second. It's been down since around midnight (UTC). You'd think a site as big as that would be ready for such an event.

      Have you tried turning it off and turning it on again?

  19. Re:Goddamned Java by Anonymous Coward · · Score: 3, Funny

    > All my Java processes peg the CPU since the leap second, even if I restart them. Maybe a reboot will help...

    So just like before, then?

  20. Re:Goddamned Java by Anonymous Coward · · Score: 3, Informative

    Normally java is just in "waste memory" mode. Now it's "waste memory AND CPU".

  21. Re:How is this an issue? by mcavic · · Score: 3, Insightful

    Why would a Unix application ever see the :60? Any time someone checks the clock, the time should be derived from Unix time (seconds since the epoch) which doesn't account for leap seconds. So to an application it should appear as a duplicate :00 or :59.
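The point about Unix time can be checked with Python's `calendar.timegm`, which converts a UTC calendar tuple to POSIX seconds by plain arithmetic. In this sketch (the helper `posix_seconds` is hypothetical), the leap second 23:59:60 and the following midnight map to the same POSIX value, so the leap second has no timestamp of its own. Which label a particular kernel or library shows during that collision, a repeated :59 or a repeated :00, is implementation-dependent.

```python
import calendar


def posix_seconds(year, month, day, hour, minute, second):
    """POSIX seconds since the epoch for a UTC calendar time.

    calendar.timegm is plain arithmetic (days * 86400 + h * 3600 + m * 60 + s),
    so it has no notion of leap seconds.
    """
    return calendar.timegm((year, month, day, hour, minute, second, 0, 0, 0))


labels = [
    (2012, 6, 30, 23, 59, 59),
    (2012, 6, 30, 23, 59, 60),  # the leap second itself
    (2012, 7, 1, 0, 0, 0),
]
for t in labels:
    print(t, posix_seconds(*t))  # the last two share 1341100800
```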

  22. Terminology? by Anonymous Coward · · Score: 2, Interesting

    If 2012 is a leap year, doesn't that make 2012-06-30 23:59 a leap minute?

  23. Yes! by antdude · · Score: 5, Informative

    https://twitter.com/redditstatus/status/219244389044731904 just said so -- "We are having some Java/Cassandra issues related to the leap second at 5pm PST. We're working as quickly as we can to restore service." :D

    --
    Ant(Dude) @ Quality Foraged Links (AQFL.net) & The Ant Farm (antfarm.ma.cx / antfarm.home.dhs.org).
  24. Re:How is this an issue? by guruevi · · Score: 2

    The problem is not necessarily in the time representation; the problem is that when NTP tries to insert the extra second into the kernel, the kernel gets stuck in a spinlock (basically waiting for a lock that never becomes available).

    The thing is that NTP announces this adjustment to the kernel sometime during the day before, so the hang doesn't necessarily happen at 23:59:59 GMT (although it could happen at that time too).

    --
    Custom electronics and digital signage for your business: www.evcircuits.com
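A toy model of what "announcing" the leap second means, under stated assumptions: ntpd arms the leap second ahead of time, and the kernel only acts on it at the next UTC midnight by replaying a second. The state names TIME_INS, TIME_OOP and TIME_WAIT are the ones the Linux adjtimex interface reports, but `tick` and `run` are hypothetical helpers, not kernel code, and the model says nothing about where the real lockup occurred.

```python
MIDNIGHT_2012_07_01 = 1341100800  # POSIX time of 2012-07-01 00:00:00 UTC


def tick(sec: int, state: str) -> tuple[int, str]:
    """Advance a simplified kernel clock by one second.

    State names follow the adjtimex convention, but this is a toy model.
    """
    sec += 1
    if state == "TIME_INS" and sec % 86400 == 0:
        # Armed earlier by ntpd; at UTC midnight, replay the previous second.
        # A real kernel logs "Clock: inserting leap second 23:59:60 UTC" here.
        sec -= 1
        state = "TIME_OOP"
    elif state == "TIME_OOP":
        state = "TIME_WAIT"  # the inserted second is over
    return sec, state


def run(sec: int, state: str, ticks: int) -> list[tuple[int, str]]:
    """Tick `ticks` times and record (POSIX second, state) after each tick."""
    trace = []
    for _ in range(ticks):
        sec, state = tick(sec, state)
        trace.append((sec, state))
    return trace


for sec, state in run(MIDNIGHT_2012_07_01 - 3, "TIME_INS", 5):
    print(sec, state)
```

In this model the POSIX value 1341100799 (23:59:59) appears twice, which lines up with the repeated 23:59:59 reported later in this thread for `date` on Linux.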
  25. Re:How is this an issue? by certsoft · · Score: 3, Informative

    Unfortunately some GPS vendors don't get it right. I was testing conformance of an IT530 from FastRaxGPS, which uses the Mediatek MT3339 receiver.

    It put out the sequence 23:59:59, 23:59:59, 00:00:00 repeating second 59 instead of using second 60.
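A conformance check like the one described can be sketched as follows. The function `classify_leap_handling` is hypothetical, and it assumes the seconds field has already been extracted from each once-per-second output sentence; no NMEA parsing is done here.

```python
def classify_leap_handling(seconds: list[int]) -> str:
    """Classify a receiver's once-per-second output around a positive leap
    second, given only the seconds field of each consecutive UTC timestamp."""
    if 60 in seconds:
        return "labelled :60 (correct)"
    if any(a == b == 59 for a, b in zip(seconds, seconds[1:])):
        return "repeated :59"
    return "no leap second visible"


print(classify_leap_handling([58, 59, 60, 0]))  # labelled :60 (correct)
print(classify_leap_handling([58, 59, 59, 0]))  # repeated :59
```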

  26. GPS Time by jasnw · · Score: 2

    This is also an issue for software that works with GPS data and time. The GPS clocks do not "speeka da leapsecond", so the software needs to keep track of things. There was a 15-second offset, and now it's 16 seconds. This has happened often enough that most areas where this might have been a problem have been discovered, but as slashdotters know, there's new code written every second (even leap seconds), and it ain't all finest kind.
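A minimal sketch of the bookkeeping this comment describes, assuming the caller supplies the current GPS-UTC offset (15 s before 2012-07-01, 16 s after; receivers get the current value from the broadcast navigation message). The helper `gps_to_utc` is hypothetical, and because `datetime` cannot represent 23:59:60, the leap second itself cannot be expressed this way.

```python
from datetime import datetime, timedelta

GPS_EPOCH = datetime(1980, 1, 6)  # GPS week 0 began 1980-01-06 00:00:00 UTC


def gps_to_utc(week: int, seconds_of_week: float, leap_offset: int) -> datetime:
    """Convert GPS week / time-of-week to UTC.

    GPS time has no leap seconds, so it runs ahead of UTC by leap_offset
    seconds. The offset must be kept up to date by the caller.
    """
    return GPS_EPOCH + timedelta(weeks=week, seconds=seconds_of_week - leap_offset)


print(gps_to_utc(1695, 16, 16))  # 2012-07-01 00:00:00
print(gps_to_utc(1695, 16, 15))  # 2012-07-01 00:00:01, one second off with the stale offset
```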

  27. Having issues with java/systems? try this by mwhahaha · · Score: 4, Informative

    /etc/init.d/ntpd stop; date; date `date +"%m%d%H%M%C%y.%S"`; date;

    Fixed the issues I was having. Credit goes to https://twitter.com/SilvioSantoZ/status/219250677522767872. I didn't have to restart anything after running it. YMMV

  28. Re:How is this an issue? by __aaltlg1547 · · Score: 3, Informative

    > When NTP tries to say that it is 12:34:61 and the computer only expects 1-60.

    That will never happen.

    Leap seconds are always asserted at UTC midnight on the last day of a month. I think the convention is only to have leap second opportunities at the end of March, June, September and December. Typically, they try to assert it at midnight December 31. It's unusual to have a mid-year leap-second.

    Since the normal progression is 23:59:58, 23:59:59, 00:00:00, the extra second makes the time 23:59:60. 61 would be TWO leap seconds which won't happen any time soon. The Earth's rate of rotation would have to change by nearly two seconds in 3 months.
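The quarter-end schedule described in this comment can be written down as a small sketch; `leap_second_opportunities` is a hypothetical helper, and it encodes only that convention, not any announcement of which opportunity will actually be used.

```python
import calendar
from datetime import date


def leap_second_opportunities(year: int) -> list[date]:
    """Last day of March, June, September and December of `year`.

    Under this convention, a scheduled leap second is the final
    second (23:59:60 UTC) of one of these days.
    """
    return [date(year, m, calendar.monthrange(year, m)[1]) for m in (3, 6, 9, 12)]


print(leap_second_opportunities(2012))
```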

  29. Re:How is this an issue? by batrick · · Score: 2

    $ man date
    [...]
                                %S second (00..60)
    [...]

    Oh, maybe when you use that (or strftime).

  30. Alleged issue did not appear... by Urban+Garlic · · Score: 2

    So the comments are confusing to me as to whether Debian "squeeze" is supposed to have a problem or not, but I have about fifty of these systems running, and as far as I can tell, they're all fine.

    I got a whole bunch of these in the logs:
    > Jun 30 19:59:59 kernel: [timestamp] Clock: inserting leap second 23:59:60 UTC

    I have three of the machines configured as NTP peers to each other, and looking at a few tier-1 time servers. The rest of the machines all use the three local peers as time servers.

    My Debian desktop systems at home also seem to be fine.

    --
    2*3*3*3*3*11*251
    1. Re:Alleged issue did not appear... by Fryth · · Score: 2

      I manage a network of a few hundred or so CentOS 4 and 5 as well as Debian Etch and Lenny machines. I'm on call this weekend and my phone is quiet. Seems like this issue really didn't impact everyone. This looks to be a non-issue for me, anyway. I see this too:

      Jun 30 19:59:59 kernel: [timestamp] Clock: inserting leap second 23:59:60 UTC

  31. What it sounds like on WWV by spaceyhackerlady · · Score: 2

    I listened to the leap second on WWV. It sounded like this:

    tick (23:59:55)
    tick (23:59:56)
    tick (23:59:57)
    tick (23:59:58)
    (nothing) (23:59:59)
    (nothing) (23:59:60)
    BEEP (00:00:00)

    It always sounds to me like WWV has gotten stuck or something.

    ...laura

  32. Re:Android by at10u8 · · Score: 3, Interesting

    Anything that runs its kernel on GPS time can give correct UTC time by following this prescription: http://www.ucolick.org/~sla/leapsecs/right+gps.html

  33. Re:How is this an issue? by hyperfine+transition · · Score: 2

    I was doing leap second testing in the last month, and I'm pretty sure that date returns
    23:59:58
    23:59:59
    23:59:59
    00:00:00
    as you go through the leap second addition. (Un)fortunately, I'm not at work, so I can't double-check, but a quick look at the date source code suggests that this is indeed its behaviour on Linux.

  34. Re:How is this an issue? by keeboo · · Score: 4, Funny

    What the heck is the opposite of a leap second?
    A leap anti-second?

  35. Re:How is this an issue? by msauve · · Score: 5, Informative

    A day is one Earth rotation, relative to Sol. It varies slightly because of a number of factors, and the resulting time scale is called UT1. UT2R is a smoothed version, but variations due to unpredictable events remain. UTC is based on the atomic second. The value chosen for the atomic second is such that, on average, there have been slightly more than 86400 of them in a day. So, just as a year is slightly more than 365 days (a day is slightly shorter than 1/365 of a year) and an occasional leap day needs to be added, so too an occasional leap second is needed.

    Contrary to what the GP said, the solar day is not too fast. It is what it is, by definition. Rather, the second is a bit too short.

    On average, since the leap second was introduced in 1972, one has been needed about every 18 months. Over the long term, that rate will increase as tidal acceleration slows the earth. 1 sec/18 months ~= 2e-8, so that's how much the second has been off on average since 1972. The atomic value for the second is 9,192,631,770 periods of the radiation corresponding to the transition between the two hyperfine levels of the ground state of the caesium 133 atom. So, a better value might have been 9,192,631,967, which would make us about even to date. (Although, since leap seconds aren't distributed evenly, they would still have occurred, both positive and negative, just not as many.) The original value was based on measurements made over less than 3 years, and has worked for some shorter periods (there were no leap seconds between 1999 and 2004, for example), but the value chosen has proven to be too short over the 40 years of leap seconds.
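The arithmetic in the paragraph above, redone as a quick Python sketch. The helper names are made up, and treating "18 months" as 1.5 Julian years is an assumption. This version lands at about 9,192,631,964, within a few periods of the comment's 9,192,631,967; the exact figure depends on how the average leap-second rate is measured.

```python
CS_PERIODS_PER_SECOND = 9_192_631_770  # SI definition of the second (caesium-133)
SECONDS_PER_YEAR = 365.25 * 86400      # Julian year; an assumption for "18 months"


def fractional_rate(leap_seconds: float, years: float) -> float:
    """Average fractional mismatch between the SI second and mean solar time."""
    return leap_seconds / (years * SECONDS_PER_YEAR)


def adjusted_period_count(rate: float) -> int:
    """Caesium periods per 'second' that would have absorbed that mismatch."""
    return round(CS_PERIODS_PER_SECOND + CS_PERIODS_PER_SECOND * rate)


rate = fractional_rate(1, 1.5)       # one leap second every 18 months
print(f"{rate:.2e}")                 # 2.11e-08
print(adjusted_period_count(rate))   # 9192631964
```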

  36. Re:The leap second is done horribly wrong by sjames · · Score: 2

    It is perfectly valid for two back to back calls to gettimeofday to return the same value. It can happen at any time if the calls are closer together than the granularity of the time.

    It is unusual but perfectly possible for the second result to be less than the first if the clock has been reset. It is bad form for a program to panic over that. However, to try to avoid problems and to make logs a LOT less confusing, NTP prefers to slew the time rather than make a hard adjustment. That is, it speeds up or slows down the system clock a bit so that system time and the reference time converge.
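A sketch of slewing versus stepping, assuming ntpd's documented 500 ppm maximum slew rate; the function names are hypothetical. It models ntpd absorbing an ordinary offset: the clock is slowed, never stepped, so readings keep increasing. The kernel's own leap-second insertion discussed in this thread is a step, not a slew, and is not what this models.

```python
MAX_SLEW_RATE = 500e-6  # ntpd's slew limit: 500 parts per million


def slew_duration(offset_s: float, rate: float = MAX_SLEW_RATE) -> float:
    """Real time needed to absorb `offset_s` seconds purely by slewing."""
    return abs(offset_s) / rate


def slewed_readings(ref_times: list[float], offset_s: float,
                    rate: float = MAX_SLEW_RATE) -> list[float]:
    """Local clock readings at each reference time, for a clock that starts
    `offset_s` seconds ahead (offset_s >= 0) and is slowed by at most `rate`
    until it agrees with the reference. The clock is never stepped."""
    readings = []
    remaining = offset_s
    prev = ref_times[0]
    for t in ref_times:
        remaining -= min(remaining, (t - prev) * rate)  # absorb what the rate allows
        readings.append(t + remaining)
        prev = t
    return readings


times = [100.0 * i for i in range(31)]  # reference time 0 s .. 3000 s
clock = slewed_readings(times, 1.0)     # local clock starts 1 s fast
print(slew_duration(1.0))               # about 2000 s, i.e. roughly 33 minutes
```

The cost of never stepping is time: at 500 ppm, a one-second error takes over half an hour to disappear.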

  37. WTF is the issue? by AliasMarlowe · · Score: 2

    Perhaps this is just affecting some kernel versions or specific applications which behave poorly.

    One data point: both of my servers were running all night, with NTP updates, and did not appear to have any issues. Both are still running right now, several hours after the leap second. They're Synology boxes running their own version of Linux (DSM 3.1-1636 and DSM 4.0-2196). FWIW, the box running DSM 3.1 has never had a problem with leap seconds, and has endured several since we've had it running almost continuously[*] since 2007. Our desktop systems were not running because everyone was in bed, but those that have been used this morning were fine (Xubuntu 12.04, both the i386 and amd64 flavors).

    [*] Since late 2007, it has been rebooted a few times for updates to the DSM system, once for upgrading its internal disks, and has been taken down several times when the length of a power outage exceeded 10 minutes, as our pathetic UPS will only keep the servers running for about 30 minutes. We're in a rural area, so the power is quite dodgy, especially in summer thunderstorms.

    --
    Those who can make you believe absurdities can make you commit atrocities. - Voltaire
    1. Re:WTF is the issue? by TheRaven64 · · Score: 3, Insightful

      The issue is that a lot of software was written on the assumption that there are 60 seconds in a minute. If something happens at the 61st second of a minute, stuff gets confused. Either it rejects the event, or incorrectly marks the time. Leap seconds are an incredibly expensive idiocy designed to make a few astronomers happy.

      --
      I am TheRaven on Soylent News
  38. Some folks are lucky... by jampola · · Score: 2

    ...And some are not. Note to self: Do not take a holiday during a leap second!

    I had 2 Debian Squeeze blade servers in Thailand kernel panic on me at 3am (AEST). What strikes me as odd is that out of the 6 blades we have running Debian (all running Squeeze and kernel 2.6.32 with identical packages), only 2 of them panicked, and so much for the advisory saying it only affects kernels prior to 2.6.29. There might be more to it than the kernel, but sheesh, I'm on holiday!

  39. Re:How is this an issue? by NewYork · · Score: 2

    Sleep second?