Slashdot Mirror


The Leap Second Is Here! Are Your Systems Ready?

Tmack writes "The last time we had a leap second, sysadmins were taken a bit by surprise when a random smattering of systems locked up (including Slashdot itself) due to a kernel bug causing a race condition specific to the way leap seconds are handled/notified by ntp. The vulnerable kernel versions (prior to 2.6.29) are still common amongst older versions of popular distributions (Debian Lenny, RHEL/CentOS 5) and embedded/black-box style appliances (Switches, load balancers, spam filters/email gateways, NAS devices, etc). Several vendors have released patches and bulletins about the possibility of a repeat of last time. Are you/your team/company ready? Are you upgraded, or are you going to bypass this by simply turning off NTP for the weekend?" Update: 07/01 03:14 GMT by S : ZeroPaid reports that this issue took down the Pirate Bay for a few hours.

20 of 284 comments

  1. Irony by bughunter · · Score: 5, Funny

    Leap years = no problem.

    Leap seconds = kernel panic.

    I fear for teh internets if we try a leap millisecond.

    --
    I can see the fnords!
    1. Re:Irony by at10u8 · · Score: 4, Interesting

      The time service bureaus used to insert leap milliseconds at almost any time. See the bottom plot at http://www.ucolick.org/~sla/leapsecs/amsci.html where there were 29 leaps in 3 years.

    2. Re:Irony by 0123456 · · Score: 4, Interesting

      I'm interested to hear what the excuse is - because it will probably sound a lot like the things you all flame Windows users for...

      A lot of Linux systems are on private networks and have to be up 24/7. Dealing with a known bug is considered less problematic than installing a new OS version and invalidating all the testing which has proven that the system can run 24/7 over the last few years.

      So I guess you're right, it is very similar to the reasons why there are many unpatched Windows systems out there.

    3. Re:Irony by pla · · Score: 4, Insightful

      I mean really - if this was fixed a year ago, what is the excuse for still running the old problematic version?

      The excuse? I have a file server and a router that run 24/7/365.24 (+1/86400, on occasion), and they just work. I have no interest in even logging into them, and they will remain "stock" systems until either a critical SSL vulnerability (in the case of the router) or I absolutely need a feature not possible with that old of a system. And when I say "old", I mean, talking "Slackware 4" here until about a year ago.

      One of the nice things about Linux: it just works. You don't get random reboots every two weeks when Microsoft decides you must install this particular update, it doesn't get "crufty" the same way the Windows registry does, and it doesn't suddenly fail to boot one morning (though in fairness, the fact that we never shut them down probably leads to a bias in that regard). It just works, day after day, year after year. If it worked yesterday and no hardware failed overnight, it will work today.

      Now... If you want to call that something that we complain about in Windows... Hey, I'll admit it, I want my software to "just work". Whether that means a Linux server that never goes down, or an XP desktop environment that (for the 18-24 months between puking) everything supports, I just want my hammer to pound nails and my crowbar to pull them back out, and I don't care if my screwdriver believes in Buddha or Jesus or Xenu.

    4. Re:Irony by 0123456 · · Score: 4, Insightful

      And what do you do when the kernel change causes your system to start crashing, when it had previously operated for years with no failures?

      An acquaintance supports a system which has been in operation for years and breaks every time there's a leap second (not because of the Linux kernel but other software and hardware issues). That means every few years he spends a couple of hours rebooting the servers and verifying that it's up and running again afterwards. Fixing the software would mean a substantial amount of development work followed by weeks of testing.

  2. Re:How is this an issue? by swalve · · Score: 4, Informative

    When NTP tries to say that it is 12:34:61 and the computer only expects 1-60.

  3. More important than kernel issues by Anonymous Coward · · Score: 4, Funny

    For those wondering whether they get one more second of sleep tonight or one less, the rule is 'spring forwards, fall back, summer stand there looking confused'.

  4. Re:How is this an issue? by mgscheue · · Score: 5, Informative

    It actually goes 23h 59m 59s, 23h 59m 60s, 00h 00m 00s. See http://www.nist.gov/pml/div688/leapseconds.cfm
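
To make the 23:59:60 label concrete, here is a small Python sketch (an illustration added here, not something posted in the thread). It leans on two documented standard-library behaviors: `time.strptime` accepts a seconds field above 59, while `datetime.datetime` rejects it. The helper name `datetime_accepts` is made up for this example.

```python
import time
from datetime import datetime

# POSIX-style broken-down time allows a seconds field of 60 (the leap second).
parsed = time.strptime("2012-06-30 23:59:60", "%Y-%m-%d %H:%M:%S")
leap_sec = parsed.tm_sec

def datetime_accepts(second):
    """Return True if datetime can represent 2012-06-30 23:59:<second>."""
    try:
        datetime(2012, 6, 30, 23, 59, second)
        return True
    except ValueError:
        # datetime only allows seconds in the range 0..59.
        return False
```

Code that only models seconds 0-59, like `datetime` here, has to decide what to do with the extra label: reject it, clamp it, or repeat 23:59:59.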

  5. Re:How is this an issue? by msauve · · Score: 4, Insightful

    Poorly written software only expects seconds to go from 0-59. Positive leap seconds are counted 23:59:59 -> 23:59:60 -> 0:0:0. Leap seconds have been around since 1972, the same year Unix was rewritten in C. There's been plenty of time to get things right.

    --
    "National Security is the chief cause of national insecurity." - Celine's First Law
  6. Re:Haha by 93+Escort+Wagon · · Score: 4, Funny

    Enjoy your free operating system that was stopped by an extra second.

    Yes, because we've NEVER seen Windows have problems dealing with things like Daylight Savings...

    --
    #DeleteChrome
  7. as of about a year ago, I started defensive coding by GoodNewsJimDotCom · · Score: 4, Interesting

    Hello, Some of us code our systems somewhat like a finite state machine, and we figure our machine will never operate outside it.

    If you're testing whether something that increments ever hits a number (like 10) and goes back to 0, instead of checking if it == 10, check if it is > 9.
    There are a lot of defensive coding mechanisms you can use. The downside of this is that when you debug, something can sneak by and put you outside of a state you want, so it makes it ever so slightly harder to debug. But if you're making software that will be used by the public that is hard to give updates, defensive programming can save the day here and there. ,Jim
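
The "> 9 instead of == 10" advice can be sketched as follows. This is a minimal illustration, not code from the poster; the function names `advance_fragile` and `advance_defensive` and the default limit of 10 are assumptions chosen to match the example in the comment.

```python
def advance_fragile(counter, limit=10):
    """Wrap only on an exact match; a value already past the limit never wraps."""
    counter += 1
    return 0 if counter == limit else counter

def advance_defensive(counter, limit=10):
    """Wrap on anything at or past the limit (the '> 9' style check)."""
    counter += 1
    return 0 if counter >= limit else counter
```

Starting from a normal value such as 9, both versions wrap to 0. Starting from a state that should never occur, such as 10, the fragile version moves on to 11 and keeps climbing, while the defensive version recovers to 0. That recovery is also what makes the bad state harder to notice while debugging, as the comment points out.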

  8. Re:as of about a year ago, I started defensive cod by JustNiz · · Score: 4, Funny

    Our servers run on octal, you insensitive clod.

  9. Re:Haha by EdIII · · Score: 4, Insightful

    Yeah it had the wrong time but did not freeze up. What's your excuse?

    You're really trying that hard to troll huh?

    A free operating system has a bug in it so you want to exaggerate the existence of the bug to show that free operating systems are inferior in such a condescending and acerbic way.

    I guess that can work. It's not like there is any paid OS out there that has decades long histories of serious instability, security flaws, and badly implemented ideas...... so yeah, you're completely safe making such an arrogant argument.

  10. Re:Haha by drinkypoo · · Score: 4, Funny

    Windows: 95. Scene: LAN party. Game: Descent. Hilarity: All the Windows users cursing loudly as their computers spontaneously reboot for DST. DOS users get to feel smug for a change.

    Windows has been boning DST as long as Windows has handled your RTC.

    --
    "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
  11. Leap second got Reddit? by daff2k · · Score: 4, Interesting

    Looks like Reddit's systems weren't ready for the leap second. It's been down since around midnight (UTC). You'd think a site as big as that would be ready for such an event.

    --
    And which parallel universe did you crawl out of?
    1. Re:Leap second got Reddit? by painandgreed · · Score: 5, Funny

      Looks like Reddit's systems weren't ready for the leap second. It's been down since around midnight (UTC). You'd think a site as big as that would be ready for such an event.

      Have you tried turning it off and turning it on again?

  12. Yes! by antdude · · Score: 5, Informative

    https://twitter.com/redditstatus/status/219244389044731904 just said so -- "We are having some Java/Cassandra issues related to the leap second at 5pm PST. We're working as quickly as we can to restore service." :D

    --
    Ant(Dude) @ Quality Foraged Links (AQFL.net) & The Ant Farm (antfarm.ma.cx / antfarm.home.dhs.org).
  13. Having issues with java/systems? try this by mwhahaha · · Score: 4, Informative

    /etc/init.d/ntpd stop; date; date `date +"%m%d%H%M%C%y.%S"`; date;

    Fixed the issues I was having. Credit goes to https://twitter.com/SilvioSantoZ/status/219250677522767872. I didn't have to restart anything after running it. YMMV
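
For anyone puzzling over the nested `date` call: the inner command prints the current time in the MMDDhhmmCCYY.ss form that `date` accepts for setting the clock, so the outer command sets the clock to the time it already shows. The Python sketch below only reproduces that string format and does not touch the system clock. The function name `date_set_argument` is hypothetical, and `%Y` is used in place of the `%C%y` pair on the assumption that the two yield the same four digits.

```python
from datetime import datetime

def date_set_argument(now):
    """Format a datetime as MMDDhhmmCCYY.ss, the form date(1) accepts for setting the clock."""
    # %Y is the four-digit year, standing in for %C%y from the shell command.
    return now.strftime("%m%d%H%M%Y.%S")

# Five seconds past midnight UTC on 1 July 2012, just after the leap second.
example = date_set_argument(datetime(2012, 7, 1, 0, 0, 5))
```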

  14. Re:How is this an issue? by keeboo · · Score: 4, Funny

    What the heck is the opposite of a leap second?
    A leap anti-second?

  15. Re:How is this an issue? by msauve · · Score: 5, Informative

    A day is one Earth revolution, relative to Sol. It varies slightly because of a number of factors, and is called UT1. UT2R is a smoothed version, but variations due to unpredictable events remain. UTC is based on the atomic second. The value chosen for the atomic second is such that, on average, there have been slightly more than 86400 of them in a day. So, just as a year is more than 365 days (a day is slightly shorter than 1/365 year), so an occasional leap day needs to be added, so too an occasional leap second is needed.

    Contrary to what the GP said, the solar day is not too fast. It is what it is, by definition. Rather, the second is a bit too short.

    On average, since the leap second was introduced in 1972, one has been needed about every 18 months. Over the long term, that rate will increase as tidal acceleration slows the earth. 1 sec/18 months ~= 2e-8, so that's how much the second has been off on average since 1972. The atomic value for the second is 9,192,631,770 periods of the radiation corresponding to the transition between the two hyperfine levels of the ground state of the caesium 133 atom. So, a better value might have been 9,192,631,967, which would make us about even to date. (Although, since leap seconds aren't distributed evenly, they would still have occurred, both positive and negative, just not as many.) The original value was based on measurements made over less than 3 years, and has worked for some shorter periods (there were no leap seconds between 1999 and 2004, for example), but the value chosen has proven to be too short over the 40 years of leap seconds.
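
    The back-of-the-envelope arithmetic above can be checked with a short Python sketch. This is an illustration, not the poster's own calculation; the 365.25-day year is an assumption, and the variable names are made up for the example.

```python
SECONDS_PER_DAY = 86_400
DAYS_PER_YEAR = 365.25           # assumption: average Julian year
CAESIUM_PERIODS = 9_192_631_770  # periods per SI second, as quoted in the comment

# One leap second every 18 months, expressed as a fractional error.
interval_s = 1.5 * DAYS_PER_YEAR * SECONDS_PER_DAY
fractional_error = 1 / interval_s

# Lengthening the second by that fraction means counting this many more periods.
extra_periods = CAESIUM_PERIODS * fractional_error
adjusted_periods = round(CAESIUM_PERIODS + extra_periods)
```

    Under these assumptions the fractional error comes out near 2e-8 and the adjustment is roughly 194 periods, which lands within a few periods of the 9,192,631,967 suggested above. The exact figure depends on the average interval assumed between leap seconds.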

    --
    "National Security is the chief cause of national insecurity." - Celine's First Law