Slashdot Mirror


June 30th Leap Second Could Trigger Unexpected Issues

dkatana writes: On January 31, 2013, approximately 400 milliseconds before the official release of the EIA Natural Gas Report, trading activity exploded in Natural Gas Futures. It is believed that this was the result of some fast computer trading systems being programmed to act on the report while having one-second advance access to it. On June 30th a leap second will be added to Coordinated Universal Time (UTC), the time scale distributed by the Network Time Protocol (NTP), to keep it synchronized with the slowly lengthening solar day. In this article, Charles Babcock gives a detailed account of the issues, and some disturbing possibilities: The last time a second needed to be added to the day was on June 30, 2012. For Qantas Airlines in Australia, it was a memorable event. Its systems, including flight reservations, went down for two hours as internal system clocks fell out of synch with external clocks.

The original author of the NTP protocol, Prof. David Mills at the University of Delaware, set a direct and simple way to add the second: Count the last second of June 30 twice, using a special notation on the second count for the record. Google will use a different approach: Over a 20-hour period on June 30, Google will add a couple of milliseconds to each of its NTP servers' updates. By the end of the day, a full second will have been added. As the NTP protocol and Google timekeepers enter the first second of July, their methods may differ, but they both agree on the time.

But that could also be problematic. In adding a second to its NTP servers in 2005, Google ran into timekeeping problems on some of its widely distributed systems. The Mills sleight-of-hand was confusing to some of its clusters, as they fell out of synch with NTP time. Does Google's smear approach make more sense to you, or does Mills's idea of counting the last second twice work better? Do you have a better idea of how to handle this?
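
A minimal sketch of how the two approaches differ, assuming a linear ramp over the 20-hour window (the summary does not say what shape Google's smear follows) and using hypothetical function names. It only models how much of the extra second each approach has absorbed at a given moment:

```python
SMEAR_WINDOW_S = 20 * 3600  # 20-hour window, as described in the summary
LEAP_S = 1.0                # one inserted second


def smear_offset(elapsed_s: float) -> float:
    """Portion of the leap second absorbed elapsed_s seconds into the window.

    Assumption: the correction grows linearly; a real smear may use another curve.
    """
    clamped = min(max(elapsed_s, 0.0), SMEAR_WINDOW_S)
    return LEAP_S * clamped / SMEAR_WINDOW_S


def step_offset(elapsed_s: float) -> float:
    """Count-the-second-twice approach: nothing until the end, then everything."""
    return LEAP_S if elapsed_s >= SMEAR_WINDOW_S else 0.0


if __name__ == "__main__":
    for hours in (0, 5, 10, 15, 20):
        t = hours * 3600
        print(f"{hours:2d} h  smear={smear_offset(t):.3f} s  step={step_offset(t):.3f} s")
```

Both functions end at the same one-second total; while the window is open they disagree by up to a second, which is why mixing the two approaches in one system is the risky case.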

48 of 233 comments

  1. Doesn't matter by StormShaman · · Score: 5, Informative

    The only problem mentioned is that they fall out of sync with each other. If they're both otherwise fine, just pick one. Sounds like the disadvantages of either one aren't as big as the disadvantage of them not working well together.

  2. Google is right by phishybongwaters · · Score: 5, Interesting

    Typically when dealing with NTP you do not want big swings. In fact, a system using NTP that's too far out of sync won't sync back up correctly. One that is slightly out of sync will slowly come back in sync over a period of time, hours or even days. Both approaches could work, they really could, but I think adding a few milliseconds here and there is a better way to get this done, as long as the systems don't fall too far behind. I work with Avaya voice equipment and we've been warning people about this for months and months. We've provided instructions on several methods to ensure this doesn't cripple your system, but it all depends on how your NTP is set up. I also foresee issues with just adding an extra second to the day: this is not going to work for a bunch of systems and will actually throw them out of sync, compared to Google's approach. One of the solutions we've "provided" is to disable NTP shortly before the time rollover, then enable it once it's July. That's a pain in the butt, but if you can afford the few minutes of service interruption, it solves all of the issues right there: you turn it off when it's synced, turn it back on, and it syncs to the new time. The real issues come in, for my field at least, with logging. This is going to throw a wrench into syslogs if it's not taken care of, and with some of the platforms, it will literally cripple the system.

    1. Re:Google is right by Penguinisto · · Score: 2

      Typically when dealing with NTP you do not want big swings.

      This is a solved problem, though (sibling points out the reason why: slew). In practice, this is also a known condition, especially with virtual machines (doubly so with VMware-hosted VMs). This is because VMs time-slice the physical CPU, so timekeeping on the VM's OS clock is very imperfect anyway.

      --
      Quo usque tandem abutere, Nimbus, patientia nostra?
  3. Re:Buggy software is buggy by pla · · Score: 2

    Also, most if not all languages have libraries that can handle accurate timing very well.

    I would consider T-SQL and *.NET pretty major languages that completely fail to handle leap seconds.

  4. Re:Buggy software is buggy by CaptainJeff · · Score: 2

    But...it's not.
    Because you have different approaches to it. If the community could agree on how to address the (growing) difference in time as measured by Earthborn measures with solar/Earth/rotation measures, then it would be. But, there are legitimate and valid disagreements with how time should be kept.

  5. choose what standard to violate by at10u8 · · Score: 4, Informative

    A problem for sysadmins is that the status quo of the standards requires that we choose which standard we want to violate. We can violate the specification of UTC by not counting 23:59:60, or we can violate POSIX by counting it, or we can violate both POSIX and the SI second by not actually keeping the system clock on UTC, using smeared seconds that are not suitable for tracking projectiles and other real-time applications. This problem is old, 50 years old, as seen in the 3 plots on this web page.

    1. Re:choose what standard to violate by mbone · · Score: 3, Insightful

      If the POSIX standards people had bothered to actually follow the existing SI and ITU standards back in 1988 when they were setting up their standard, this would not be an issue.

    2. Re:choose what standard to violate by suutar · · Score: 2

      One of the links from that page talks about how, using custom timezone files, you can use non-leap seconds and still translate to accurate real-world values. I'm not terribly familiar with timekeeping protocols; installing ntp and pointing it at a server is about as far as I can manage. Do you see a problem with the approach laid out at "Correct precision handling of leap seconds using code already on POSIX systems"?

  6. Dice: Please restore the Read More link. Thanks. by Anonymous Coward · · Score: 5, Insightful

    I understand the desire to change things, but putting some social media Share link in place of the Read More link goes against the kind of website Slashdot is.

    Please restore the original layout. Thanks.

  7. their approach is called: SLEW by Jizzbug · · Score: 2

    Their method has a name in NTP parlance: it is called slew.

    See man page ntpd(8).

    --

    -=/\- Jizzbug -/\=-
  8. Sync by Espectr0 · · Score: 3, Informative

    We have 600 machines in my company's network distributed over 20 cities in our country. The servers are all located at our main branch and are connected through slow WAN frame relay links (up to 4 Mbps).

    We have time differences between machines, sometimes up to 3 or 4 minutes, and we don't seem to have issues. I find it strange that a possible 1-second difference could cause so many issues.

    Perhaps the Google method is better because the adjustment will take place during the day and not at the last second.

    1. Re:Sync by 0123456 · · Score: 4, Informative

      I find it strange that a possible 1-second difference could cause so many issues.

      It's not the time difference that causes problems per se, it's time going backwards. You presumably missed the fact that many Java servers crashed over the last leap second because of a kernel bug that screwed up their internal timers?

      We had problems last time due to faults reported by external hardware when it saw the time jump backwards. I'll be at my desk when it happens this time to deal with any problems that come up.

      And, given the chaos every leap second causes, hopefully we can finally convince the 'experts' to stop fiddling with time.

    2. Re:Sync by DigiShaman · · Score: 2

      Windows Active Directory I presume? You have a slack of about 5 seconds between DC (Domain Controllers) and member machines. Otherwise, you might get the following error in the event logs.

      Event ID 50: The time service detected a time difference of greater than 5000 milliseconds for 900 seconds. The time difference might be caused by synchronization with low-accuracy time sources or by suboptimal network conditions. The time service is no longer synchronized and cannot provide the time to other clients or update the system clock. When a valid time stamp is received from a time service provider, the time service will correct itself.

      --
      Life is not for the lazy.
    3. Re:Sync by Espectr0 · · Score: 2

      We experience these issues when the motherboard battery dies and resets the computer's date to the year 2000 or such. Since most users aren't admins, the machines can't receive the correct time under their accounts, so we log on with our admin accounts and the time corrects itself.

      But for 3-4 minutes we don't have issues.

    4. Re:Sync by ceoyoyo · · Score: 2

      Slew can be used in NTP for any clock adjustment, not just leap seconds. Linux does use slew (as opposed to step) to make clock adjustments. In the special case of leap seconds, it uses step, rather than slew.

    5. Re:Sync by ceoyoyo · · Score: 4, Informative

      I'm not sure exactly what arguments each Linux distribution uses, but this is from the man page on ntpd:

      -x
      Normally, the time is slewed if the offset is less than the step threshold, which is 128 ms by default, and stepped if above the threshold. This option forces the time to be slewed in all cases. If the step threshold is set to zero, all offsets are stepped, regardless of value and regardless of the -x option. In general, this is not a good idea, as it bypasses the clock state machine which is designed to cope with large time and frequency errors. Note: Since the slew rate is limited to 0.5 ms/s, each second of adjustment requires an amortization interval of 2000 s. Thus, an adjustment of many seconds can take hours or days to amortize. This option can be used with the -q option.

      My reading of that is that the normal adjustment uses slew. Step is used only when there's a big discrepancy, and you can use -x to use slew even in that case.
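
      The arithmetic in that man page excerpt can be checked with a short sketch. The constants come from the quoted text (128 ms threshold, 0.5 ms/s slew limit); the function names are hypothetical and this is not how ntpd itself is structured:

```python
STEP_THRESHOLD_S = 0.128   # default step threshold from the quoted man page
MAX_SLEW_RATE = 0.0005     # 0.5 ms of correction per elapsed second


def default_action(offset_s: float) -> str:
    """Slew small offsets, step large ones (behaviour without the -x option)."""
    return "slew" if abs(offset_s) < STEP_THRESHOLD_S else "step"


def amortization_seconds(offset_s: float) -> float:
    """Time needed to absorb offset_s when slewing at the maximum rate."""
    return abs(offset_s) / MAX_SLEW_RATE


if __name__ == "__main__":
    for offset in (0.05, 1.0, 30.0):
        print(offset, default_action(offset), amortization_seconds(offset))
```

      A one-second offset is above the threshold, so it would be stepped by default; forced to slew, it takes about 2000 s, matching the note in the excerpt.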

  9. Re:How Will The Naval Observatory Clock Handle Thi by ledow · · Score: 2

    That's not the problem.

    Leap seconds are inserted by pretending that there's a 61st second in a minute. Everything not designed to handle that will fall flat on its face.

    It's not a question of not knowing what time it is, it's a question of whether your software was built with certain (I would say not unreasonable at first glance) assumptions, or whether it follows the actual specification of the functions it uses and the data structures it handles.

    58, 59, 60, 0, 1 tends to blow a lot of stuff up that was never built to handle such instances.
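
    As one concrete illustration (Python's standard library, chosen only as an example; other runtimes differ), the `datetime` type refuses a seconds value of 60, while the lower-level `time.strptime` accepts it:

```python
import time
from datetime import datetime


def datetime_accepts_second_60() -> bool:
    """Return True if datetime can represent 23:59:60, False if it raises."""
    try:
        datetime(2015, 6, 30, 23, 59, 60)
    except ValueError:
        return False
    return True


# time.strptime documents %S as covering leap seconds, so this parses.
parsed = time.strptime("2015-06-30T23:59:60", "%Y-%m-%dT%H:%M:%S")

if __name__ == "__main__":
    print("datetime accepts :60?", datetime_accepts_second_60())
    print("struct_time seconds field:", parsed.tm_sec)
```

    Code that passes a timestamp from the second path into the first is exactly the kind of thing that was never built for a 61st second.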

  10. Re:Dice: Please restore the Read More link. Thanks by enigma32 · · Score: 4, Informative

    +1 - Mod parent up.

  11. Re:How Will The Naval Observatory Clock Handle Thi by 0123456 · · Score: 2

    Linux (at least the kernel we run) handles a leap second as 23:58, 23:59, 23:59, 00:00. Code that has to do something specific at 23:59 then ends up doing it twice, unless you detect that and deal with it.
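
    One way to "detect that and deal with it" is to remember the clock reading a job last fired on and skip a repeat. This is only a sketch with a hypothetical class name; it assumes the reading used as the key includes the date, so the same time on a later day still runs:

```python
class RunOncePerReading:
    """Suppress a job when the wall clock shows the same reading twice in a row."""

    def __init__(self) -> None:
        self._last_reading = None

    def should_run(self, reading: str) -> bool:
        # A repeated reading means the clock replayed that tick.
        if reading == self._last_reading:
            return False
        self._last_reading = reading
        return True


if __name__ == "__main__":
    guard = RunOncePerReading()
    # The sequence described above: the final tick shows up twice.
    for tick in ["23:58", "23:59", "23:59", "00:00"]:
        print(tick, "run" if guard.should_run(tick) else "skip")
```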

  12. We need a long-term solution by ErikTheRed · · Score: 3, Funny

    even if it means re-defining the second or decoupling official time measurements from planetary movement. Leap days, leap seconds, etc., are silly hacks that belong in a bygone era.

    --

    Help save the critically endangered Blue Iguana
  13. just a second by frovingslosh · · Score: 5, Funny

    At least it is just a second. That sudden extra hour of daylight in the spring is really bad for my rose bushes.

    --
    I'm an American. I love this country and the freedoms that we used to have.
    1. Re:just a second by goombah99 · · Score: 2

      The cows hate it too.

      --
      Some drink at the fountain of knowledge. Others just gargle.
    2. Re:just a second by ceoyoyo · · Score: 2

      It's disturbing that you're modded informative.

  14. Massive stupidity by mbone · · Score: 2

    There is exactly one correct way to do this.

    2015-06-30T23:59:59
    2015-06-30T23:59:60
    2015-07-01T00:00:00

    David Mills's approach is not correct, but will generally work and limits the pain to 1 second.

    Anything else is just stupid. We've only been doing this since 1972. You would think people would get with the program by now.

    1. Re:Massive stupidity by NoOneInParticular · · Score: 2

      There's another exactly one correct way to do it. Lengthen the nanosecond to be in tune with the Earth's revolution around the sun instead of counting periods of the radiation corresponding to the transition between the two hyperfine levels of the ground state of the cesium 133 atom.

  15. Re:We should do what GPS does by heypete · · Score: 2

    I recently took a private tour of the time and frequency lab at METAS (the Swiss Federal Institute of Metrology) and got to observe their atomic clocks, ask the people there some questions, etc.

    The scientist in charge of the lab wishes everyone would use TAI for time distribution. TAI has no leap seconds and differs from GPS time by a constant 19 seconds. If TAI was used, computers would never have to worry about leap seconds internally and things would be greatly simplified.

    Computers don't care what time is used internally, and it's easy for computers to get a table of leap seconds and use that data to display UTC to users so the displayed time matches solar time.
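
    A rough sketch of that idea: keep TAI internally and consult a leap second table only when producing a UTC label. The table below holds just the two most recent entries (TAI-UTC of 35 s after the June 2012 leap second, 36 s after the June 2015 one); the names and the table layout are assumptions for illustration, and a real system would load the full published list:

```python
from datetime import datetime, timedelta

# (TAI instant at which the offset takes effect, TAI - UTC in seconds)
LEAP_TABLE = [
    (datetime(2012, 7, 1, 0, 0, 35), 35),
    (datetime(2015, 7, 1, 0, 0, 36), 36),
]


def tai_to_utc_label(tai: datetime) -> str:
    """Format a TAI instant as a UTC string, including the 23:59:60 case."""
    in_effect = [row for row in LEAP_TABLE if row[0] <= tai]
    if not in_effect:
        raise ValueError("instant predates this illustrative table")
    offset = in_effect[-1][1]

    upcoming = [row for row in LEAP_TABLE if row[0] > tai]
    if upcoming and upcoming[0][0] - tai <= timedelta(seconds=1):
        # Inside the inserted second: label it :60 of the preceding minute.
        previous = tai - timedelta(seconds=offset + 1)
        return previous.strftime("%Y-%m-%dT%H:%M:") + "60"

    return (tai - timedelta(seconds=offset)).strftime("%Y-%m-%dT%H:%M:%S")


if __name__ == "__main__":
    for sec in (34, 35, 36):
        print(tai_to_utc_label(datetime(2015, 7, 1, 0, 0, sec)))
```

    The internal counter never repeats or skips; only the display step knows about the leap second.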

  16. Re:Dice: Please restore the Read More link. Thanks by Art3x · · Score: 5, Insightful

    I understand the desire to change things, but putting some social media Share link in place of the Read More link goes against the kind of website Slashdot is.

    Please restore the original layout. Thanks.

    +1 - Mod parent up.

    +2. In a Slashdot comment, we must add links and formatting by typing HTML by hand. You would therefore think we know how to copy and paste a web address from Slashdot to Facebook, if that's what we really want to do. We don't need an icon to do it for us.

    If you're going to add icons, switch the places for Share and Comments. Put the Share link to the right of the heading. Put the Comments link at the bottom. To me it seems more logical that way, because it puts the Comments link back where it was.

  17. Re:Wrong solution, wrong problem by mcelrath · · Score: 4, Informative

    Also this is an awesome graph, and illustrates that the Earth is a horrible clock: https://upload.wikimedia.org/w...

    --
    1^2=1; (-1)^2=1; 1^2=(-1)^2; 1=-1; 1=0.
  18. Re:Buggy software is buggy by petermgreen · · Score: 4, Interesting

    Leap years and leap seconds are handled very differently.

    The rules for leap years follow a formula that has been fixed for hundreds of years. Computers typically handle them as part of their conversion from internal "time elapsed since epoch" data formats to "human" date formats and otherwise don't care much about them. Even the simplified formula of "leap year every 4 years" gives the right answer for every year from 1901 through 2099.

    Leap seconds OTOH cannot be predicted in advance, so you cannot reliably convert "time elapsed since epoch including leap seconds" to "time elapsed since epoch excluding leap seconds" or "human datetime" for future datetimes, and to do it for past datetimes requires an up-to-date list of leap seconds.

    Then there is the problem that "time elapsed since epoch excluding leap seconds", which is a common way to represent time (presumably due to the difficulty in converting "time elapsed since the epoch including leap seconds" to "human datetime"), simply cannot correctly represent the times around a leap second.

    The testcase is also anything but simple. To test the code you have to inject fake leap seconds, but for a correct test leap seconds can only be injected at specific times (NTP, for example, increases its update rate around possible leap seconds), so either you can only run the test at specific times or your entire test environment needs to run on "fake time". This is a big problem if your tests need to interact with a system outside the test environment in a way that depends on time within the test environment being in sync with time outside the test environment.

    --
    note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
  19. 1s > 128ms, therefore slew by Jizzbug · · Score: 2

    NTP would typically slew a 1-second difference, so Google is not out-of-line to add the second at the beginning of the day and slew their systems over the course of the day. Google uses lots of vector clocks in their distributed systems, they may have calculated that slewing over the course of the day introduces fewer time differences between machines than counting the final second twice (due to drift, which is inevitable on any NTP slave, corrected by "frequency discipline" and error estimates).

    --

    -=/\- Jizzbug -/\=-
  20. Re:How Will The Naval Observatory Clock Handle Thi by mbone · · Score: 2

    That's not the problem.

    Leap seconds are inserted by pretending that there's a 61st second in a minute.

    Pretend, nothing. Those minutes do have a 61st second.

  21. beware the Digg effect by ei4anb · · Score: 2

    When you change a forum against the wishes of the users you risk the Digg effect. Please undo the "Share" change.

  22. Re:Buggy software is buggy by at10u8 · · Score: 3, Informative

    The ITU-R has outlined 4 methods for the future of UTC. Methods A1, A2, B, C1, C2, and D are from various delegations of the international assembly, and they are in serious disagreement with each other.

  23. Re:We should do what GPS does by mbone · · Score: 2

    I recently took a private tour of the time and frequency lab at METAS (the Swiss Federal Institute of Metrology) and got to observe their atomic clocks, ask the people there some questions, etc.

    The scientist in charge of the lab wishes everyone would use TAI for time distribution. TAI has no leap seconds and differs from GPS time by a constant 19 seconds.

    Yes, because the Air Force people setting up GPS time didn't understand why that was a fundamental difference between UTC and TAI (GPS - UTC was zero when the time scale was established).

  24. Re:Dice: Please restore the Read More link. Thanks by GoodNewsJimDotCom · · Score: 4, Informative

    I thought Slashdot was dead. I thought they killed the comments until someone told me where to look.

  25. Re:Buggy software is buggy by 93+Escort+Wagon · · Score: 4, Insightful

    The ITU-R has outlined 4 methods for the future of UTC. Methods A1, A2, B, C1, C2, and D are from various delegations of the international assembly, and they are in serious disagreement with each other.

    That's silly. There's no reason for it. Let's just sit down and come up with a new standardized method that covers all of these use cases.

    --
    #DeleteChrome
  26. Re:Dice: Please restore the Read More link. Thanks by war4peace · · Score: 5, Insightful

    The way they changed the design is clickbait of sorts.
    People trained their muscle memory to click that area to load more of the story or comments. Now they click and yell in frustration.
    That's a really shitty way of luring people. Shame on you, Dice!

    --
    ...gis sdrawkcab (usually not responding to ACs; don't bother posting as AC)
  27. Re:Dice: Please restore the Read More link. Thanks by 93+Escort+Wagon · · Score: 2

    I understand the desire to change things, but putting some social media Share link in place of the Read More link goes against the kind of website Slashdot is.

    Not only that, but even though they've added a new numeric post count inside of a little speech bubble... if you click on that, you don't get taken to the comments! You still get taken to the top of the page, and have to scroll down to get to the comments.

    I realize Taco and the others are long gone, but doesn't anyone on the Slashdot staff even bother to look at the pages after a design change has been made?

    --
    #DeleteChrome
  28. Re:Doesn't matter, so why do it? by AthanasiusKircher · · Score: 2, Interesting

    Why do we even bother with this? Why can't we just let noon move a second? Even after a hundred years it won't make any difference. Time zones on average vary in the sun's position by a whole hour, so a 1 sec variation of the solar zenith makes no difference. Astronomers will still be able to find their stars.

    Agreed. This is all nonsense. Even NIST admits that it's basically for legacy astronomical equipment. But any astronomer who needs real precision needs to deal with fractional-second corrections all the time now anyway, and there are published tables that allow one to do this. (For the current correction to convert from UTC to UT1, see here, which gives values accurate to +/-5 milliseconds.)

    If we ever got maybe a minute or more off, I could possibly see the reason for a correction. But a second? Who cares? As I said, the very small number of people who actually need to use UT1 mostly do fractional-second conversions all the time anyway, as leap seconds aren't precise enough to keep up with the continuous variation.

  29. Re:Buggy software is buggy by RavenLrD20k · · Score: 4, Funny

    The ITU-R has outlined 5 methods for the future of UTC [acma.gov.au]. Methods A1, A2, B, C1, C2, D, and E are from various delegations of the international assembly, and they are in serious disagreement with each other.

  30. Re:Buggy software is buggy by alexhs · · Score: 3, Informative

    The ITU-R has outlined 4 methods for the future of UTC

    Only method A1(*) proposes to redefine UTC. All other methods are keeping UTC just as it is.

    To sum up the methods :
    A1: No more leap seconds, UTC will drift from UT1.
    A2: Come up with a new name for "UTC without leap seconds" as the broadcast universal time, UTC becomes legacy.
    B: Keep UTC as it is, also broadcast a TAI-based reference time on an equal basis.
    C1: Keep UTC as it is, also broadcast a delta between UTC and TAI.
    C2: Same as C1, with more verbose recommendations.
    D: Keep UTC as it is.

    (*) With A2, UTC is not broadcast anymore, so it has the same implications as A1, but mbone was going with the definition of UTC, so there's room for nitpicking :)

    --
    I have discovered a truly marvelous proof of killer sig, which this margin is too narrow to contain.
  31. Re:How Will The Naval Observatory Clock Handle Thi by Xylantiel · · Score: 2

    Exactly, just as February may have 28 or 29 days, the 23:59 minute may have 60 or 61 seconds. If your software time system was not built this way, it is technically wrong.

  32. The problem, and the IMHO correct solution. by arcade · · Score: 4, Interesting

    First off, the problem with leap seconds and unix is that unix time isn't UTC. Unix time is defined as seconds since epoch, ignoring leapseconds. Unix time is 'lossy' in that the moment a leapsecond occurs can't be differentiated from the second before it. More information about that here: https://en.wikipedia.org/wiki/...
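
    The lossiness is easy to demonstrate. The sketch below uses Python's `calendar.timegm`, which applies the plain seconds-since-epoch arithmetic without a leap second table; under that arithmetic the leap second lands on the same number as the second that follows it (which neighbour it collides with in practice depends on how a given system steps its clock):

```python
import calendar

# 2015-06-30T23:59:60 UTC, the inserted leap second
leap_second = calendar.timegm((2015, 6, 30, 23, 59, 60, 0, 0, 0))

# 2015-07-01T00:00:00 UTC, the second right after it
following = calendar.timegm((2015, 7, 1, 0, 0, 0, 0, 0, 0))

if __name__ == "__main__":
    print(leap_second, following, leap_second == following)
    # prints: 1435708800 1435708800 True
```

    Two distinct instants, one timestamp: nothing stored as unix time can tell them apart afterwards.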

    The problem is that POSIX.1 is plain stupid when it comes to leapsecond.

    The correct solution to this problem would be as follows:
    1. Fix POSIX.1 to define unix time as TAI.
    2. Implement conversion routines in gettimeofday and other relevant functions.
    3. Use a handy store for leapseconds.

    Now, number 3 here is a bit tricky. Purists would probably want this in the TZ database or somesuch. This is well and good, but has the problem that the TZ files need to be packaged and updated on all the servers. If I remember correctly (please correct me if I'm wrong) Java is shipped with its own TZ files, and might also need them updated separately. Due to this, I think the most maintainable and portable way to do this across unixes would be to simply have an /etc/leapseconds file which lists the leapseconds since epoch. It does, however, depend on unix time being defined as TAI first.

    --
    "Rune Kristian Viken" - http://www.nwo.no - arca
    1. Re:The problem, and the IMHO correct solution. by at10u8 · · Score: 4, Informative

      Please look at this tzdist internet draft which is close to becoming an RFC. The tzdist protocol can communicate the list of leap seconds along with the list of time zones.

  33. alternatives by denbesten · · Score: 2

    There have been 35 leap seconds in the past 42 years. In very round numbers, we could have....

    1 leap millisecond 3 times per day,
    1 leap second every year or so,
    1 leap minute every 50 years or so,
    1 leap hour every 3000 years or so.
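
    For anyone who wants to redo the estimate, here is a small sketch that takes the figures quoted above at face value (35 seconds accumulated over 42 years, both carried over from this comment rather than checked) and assumes the drift stays constant, which it does not in reality:

```python
ACCUMULATED_SECONDS = 35.0  # figure quoted in the comment above
ELAPSED_YEARS = 42.0        # figure quoted in the comment above
DAYS_PER_YEAR = 365.25


def drift_per_year() -> float:
    """Average seconds of correction needed per year under these figures."""
    return ACCUMULATED_SECONDS / ELAPSED_YEARS


def years_between_corrections(step_seconds: float) -> float:
    """How often a correction of step_seconds would be needed, in years."""
    return step_seconds / drift_per_year()


def milliseconds_per_day() -> float:
    """The same drift expressed as milliseconds per day."""
    return drift_per_year() * 1000.0 / DAYS_PER_YEAR


if __name__ == "__main__":
    print(f"{milliseconds_per_day():.2f} ms per day")
    for label, step in [("second", 1), ("minute", 60), ("hour", 3600)]:
        print(f"1 leap {label} every {years_between_corrections(step):.1f} years")
```

    With these inputs the minute and hour intervals come out at roughly 72 and 4320 years, the same order of magnitude as the very round numbers in the list.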

  34. Re:Dice: Please restore the Read More link. Thanks by weilawei · · Score: 5, Interesting

    I'm willing to accept that layouts change and I'll need to look in a new place--but the new location is actually terrible usability. Here's why:

    First, I read the headline. Then, I read the summary. I'm moving down the page, and I'm scrolling the page, too. So, now I'm at the end of the summary, and the headline for any story with a long summary is now out of the window. Now, I need to scroll back up to see how many comments or to click to view those comments. Extra work, even if the summary isn't long.

    Fitts' Law applies here. They've made the target smaller in diameter, and effectively placed it further away. That makes clicking to view comments noticeably harder.
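
    To put rough numbers on that, here is a sketch using the Shannon formulation of Fitts' Law, where the index of difficulty is log2(D/W + 1) for a target of width W at distance D. The pixel values are made up for illustration and are not measurements of the actual page:

```python
import math


def index_of_difficulty(distance: float, width: float) -> float:
    """Fitts' index of difficulty in bits (Shannon formulation)."""
    return math.log2(distance / width + 1)


if __name__ == "__main__":
    # Hypothetical layouts: a wide link close by vs. a small icon far away.
    old_layout = index_of_difficulty(distance=300, width=100)
    new_layout = index_of_difficulty(distance=700, width=100 / 4)
    print(f"old: {old_layout:.2f} bits, new: {new_layout:.2f} bits")
```

    Shrinking the target and moving it away both push the index up, and predicted movement time grows with it.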

  35. Chicken Little, the sky is not falling by msobkow · · Score: 2

    Every single time a leap second comes up in the future, we have these panic-stricken articles predicting doom and gloom for some services.

    If you haven't figured out how to deal with leap seconds that have been an issue since the '70s, I say your service DESERVES to crash and burn, and you DESERVE to spend long and stressful hours dealing with the mess.

    Leap seconds aren't a surprise to ANYONE with a functioning brain cell.

    --
    I do not fail; I succeed at finding out what does not work.
  36. Re:Doesn't matter, so why do it? by AthanasiusKircher · · Score: 2

    What's nonsense is locking civil time to atomic time. There would be no need for leap seconds if civil time simply remained linked to astronomical time, as it was for millennia.

    Sorry, but what the heck are you talking about? Your "solution" makes no sense given the need for accurate timekeeping today. Astronomical time varies significantly with the earth's rotation all the time by various amounts of milliseconds (see here for an illustration of that variance since modern UTC standards were adopted).

    The "length of a day" is simply nowhere near precise enough for modern applications. It worked to lock civil time to astronomical time when an error of a few milliseconds here and there wouldn't make a difference -- you could just reset all your clocks. But now much of our timekeeping software dealing with civil time works on machines where a few milliseconds here and there will screw things up all over the place.

    Are you at all aware of the mess things were before the modern UTC standards were adopted? They tried to make corrections on an order of milliseconds on a regular basis, and it was annoying as all hell. That's why they proposed only altering the standard clocks when the collective error accumulated to closer to a second -- the shift could then easily take place.

    What exactly do you think you're proposing here? That seconds will just be arbitrary lengths for civil time, varying on a daily or weekly basis to track the earth's variance in rotation? Or we keep the second constant, but that we make daily or weekly corrections somehow? Or what?

    Modern technology needs civil time to be consistent. And it needs to be precise because there are far too many machines which depend on it not varying by random little increments all the time. There are various ways of solving this problem, but just waving your hands and getting out your sundial to mark noon every day (as they did for millennia) is simply not possible in the modern world.